Classification of DNA structural mutational profiles in pancreatic adenocarcinomas

Pancreatic ductal adenocarcinoma (PDAC) remains the third leading cause of cancer-related death in the United States, with a five-year relative survival rate below ten percent. Better detection and classification tools are urgently needed to improve the survival of patients with PDAC. Current methods for categorizing PDAC rely on the mutational status of genes involved in DNA repair, homologous recombination, or other pathways, and PDACs carrying these mutations generally respond better to specific therapies. However, conventional mutational analysis may not capture all perturbations of these pathways, especially when a large number of genes are involved. Mutations arising from structural DNA aberrations have been shown to be a better measure of DNA damage. We therefore aim to use this measure to classify PDACs more accurately and to predict therapeutic response. Our goal is to advance PDAC classification by developing computational tools that use structural (conformational) descriptors of the human genome.


Figure. DNA sequence-structure model for mutational predictions in PDAC.

(A) Mutational profiles within preliminary PDAC samples: MSI (signature SBS6; SBS stands for single base substitution) and APOBEC (SBS2/13). (B) The APOBEC SBS2/13 mutational signature from preliminary data. (C) Mapping of structural DNA parameters onto the DNA sequence, where N can be any of the four DNA bases. Structural features: minor groove width (mgw), base-pair parameters (bp; here, propeller twist), base-pair step parameters (bp step; here, roll), and their positions within the 5-nt DNA motif. (D) Mutational signature SBS2/13 derived from DNA structure. (E) Contribution of each signature to a given cancer type (two signatures shown). (F) Contribution of the detected signatures to individual tumor samples (PDAC samples show high heterogeneity).
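Panel C describes structural parameters attached to a 5-nt motif centered on the mutated base. A minimal Python sketch of that featurization step is shown here, under stated assumptions: `featurize`, `orient`, and the pentamer-to-feature table `shape_table` are hypothetical names, and the numbers in `toy_table` are made-up placeholders, not real DNA-shape estimates. Each substitution is oriented so the reference base is a pyrimidine (the standard SBS convention), flagged for the TCW motif (W = A or T) at which the APOBEC signatures SBS2 (C>T) and SBS13 (C>G) are enriched, and annotated with whatever structural features the table provides.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orient(pentamer: str, alt: str) -> tuple[str, str]:
    """Put a pyrimidine (C or T) at the center, as in the standard SBS convention."""
    if len(pentamer) != 5:
        raise ValueError("expected a 5-nt context NNXNN")
    if pentamer[2] in "CT":
        return pentamer, alt
    # Purine reference: describe the same event on the opposite strand.
    return revcomp(pentamer), alt.translate(COMPLEMENT)


def featurize(pentamer: str, alt: str, shape_table: dict) -> dict:
    """Combine sequence-context and structural descriptors for one SBS.

    shape_table is a hypothetical mapping from an oriented pentamer to
    structural features (e.g. mgw, propeller, roll); unknown pentamers
    simply contribute no structural features.
    """
    ctx, alt = orient(pentamer.upper(), alt.upper())
    ref = ctx[2]
    tcw = ctx[1] == "T" and ref == "C" and ctx[3] in "AT"
    features = {
        "substitution": f"{ref}>{alt}",
        "trinucleotide": ctx[1:4],
        "tcw_context": tcw,
        # SBS2 (C>T) and SBS13 (C>G) are enriched at TCW motifs.
        "apobec_like": tcw and alt in "TG",
    }
    features.update(shape_table.get(ctx, {}))
    return features


# Placeholder numbers for illustration only, not real DNA-shape values.
toy_table = {"TTCAA": {"mgw": 5.0, "propeller": -10.0, "roll": 2.0}}
print(featurize("TTGAA", "A", toy_table))
```

Here the G>A change in TTGAA is reported on the opposite strand as C>T in a TCA context, so it is flagged as APOBEC-like and picks up the placeholder structural values for TTCAA. Keying the table by the oriented pentamer is a simplifying assumption; strand-asymmetric step parameters such as roll would need care when the context is reverse-complemented.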

Non-canonical DNA, mutagenesis, and drug resistance

Non-canonical DNA structures detected in gene promoters and coding regions during transcription and/or replication suggest that they are important for cancer development. During DNA replication, switching from error-free to error-prone specialized polymerases helps avoid replication stalling at G-quadruplexes but is also a primary source of mutations. Furthermore, bypass of damaged or non-canonical DNA is recognized as a major source of mutagenesis in many cancer types and may lead to drug resistance. Our group is interested in detecting overlapping non-canonical secondary structures in the genome, including but not limited to G-quadruplexes and R-loops (DNA-RNA hybrids). Despite their mutagenic potential, these DNA structures can also serve as therapeutic targets, and targeting them may suppress cancer growth.
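As a concrete starting point for detecting one of these structures, putative G-quadruplexes are often located with a simple sequence heuristic: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The Python sketch below applies that regular expression on both strands; the function name `find_putative_g4` is hypothetical, and this is not presented as the group's detection method. It reports only non-overlapping matches, ignores structural context, and does not address R-loops.

```python
from __future__ import annotations

import re

# Four runs of >= 3 guanines separated by loops of 1-7 nt (a common G4 heuristic).
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
# A G4 on the reverse strand appears as C runs on the given strand.
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def find_putative_g4(seq: str) -> list[tuple[int, int, str, str]]:
    """Return (start, end, strand, match) for non-overlapping putative G4 motifs.

    Coordinates are 0-based and end-exclusive on the input strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", G4_PLUS), ("-", G4_MINUS)):
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.end(), strand, m.group()))
    return sorted(hits)


print(find_putative_g4("aaGGGTTAGGGTTAGGGTTAGGGaa"))
```

On this telomere-like repeat the function reports a single plus-strand motif spanning positions 2 to 23. Detecting overlapping or competing structures would require an overlapping search (for example, rescanning from each match start) and a scoring scheme, which this sketch deliberately leaves out.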

From DNA mutations to RNA splicing

We are interested in developing novel machine learning approaches for capturing splicing factor regulatory networks from RNA-sequencing splicing features, non-canonical DNA secondary structures (including various forms of single-stranded DNA), and mechanisms involving error-prone polymerases. The interest is two-fold. On the one hand, we aim to apply concepts from information theory (entropy and mutual information) to infer splicing regulatory networks. Alternative splicing is a stochastic process that generates transcript configurations according to a probability distribution; thus, the mutual dependence between splicing factors and their target sites across samples can be quantified by mutual information, while the relationship between splicing factors and individual loci can be estimated with configuration-specific mutual information. On the other hand, we aim to integrate multiple data types using deep learning algorithms. By applying neural networks to structured multi-omics data, we seek to: i) identify genes and proteins affected by pre- and post-translational modifications, ii) construct regulatory network architectures, and iii) predict cancer phenotypes and treatment targets.
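The information-theoretic step can be made concrete with a plug-in (frequency-count) estimate of mutual information between a discretized splicing-factor level and a target exon's inclusion state across samples. This is a sketch under our own assumptions: discretization happens beforehand, the function name `mutual_information` and the toy data are hypothetical, and we read "configuration-specific mutual information" as the individual terms p(x,y) log2[p(x,y) / (p(x) p(y))], whose sum is I(X;Y). The group's actual estimator may differ.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations.

    Returns the total and a dict of per-configuration terms
    p(x,y) * log2(p(x,y) / (p(x) * p(y))), which sum to I(X;Y).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    terms = {}
    for (a, b), count in joint.items():
        p_ab = count / n
        terms[(a, b)] = p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return sum(terms.values()), terms


# Hypothetical toy data: discretized splicing-factor expression and the
# inclusion/exclusion state of one target exon across six samples.
sf_level = ["high", "high", "high", "low", "low", "low"]
exon_state = ["inc", "inc", "exc", "exc", "exc", "exc"]
mi, per_config = mutual_information(sf_level, exon_state)
print(round(mi, 3), per_config)
```

Individual configuration terms can be negative (here the high/exc configuration, which occurs less often than independence would predict) even though their sum is always non-negative. Plug-in estimates are biased upward for small sample sizes, so in practice a permutation test or bias correction would accompany them.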

Interacting with interactomes

We apply existing tools and develop novel ones, drawing on concepts from protein structural biology, computational chemistry, engineering, biophysics, and machine learning, to identify and design therapeutic candidates that display improved bimolecular interaction patterns and can act as molecular glue degraders or their hosts.