AGENDA

Link to SSACB25 program (last updated Aug 20 6am)

Aggarwal, Manu - NIH

"A Spatially-Aware and Interpretable Deep Learning Framework to Predict Gene Expression from Histopathology Images"

Deep learning models that predict gene expression from routine H&E (hematoxylin and eosin) stained images are enabling new paradigms for precision oncology, including the prediction of patient response to therapy. However, the biological interpretation of these predictive models remains a significant challenge. We propose a computational pipeline that leverages tissue organization to predict bulk gene expression with enhanced biological interpretability. We developed a robust, data-driven stain separation technique to computationally isolate hematoxylin and eosin signals, and we use a Graph Neural Network (GNN) to model the spatial relationships within the tissue architecture. The GNN integrates information across multiple spatial scales—from individual tiles to whole tissue fragments—to predict bulk RNA-sequencing profiles. A crucial aspect of our framework is its design to scale biological interpretability to giga-pixel H&E images. By leveraging SensX, a model-agnostic explainable AI (XAI) technique, we can identify specific tissue microenvironments that drive gene expression predictions. Applying this XAI method to the separated stain channels further elucidates the relative importance of nuclear versus non-nuclear morphological patterns. This work moves beyond "black-box" predictions by linking deep learning models to tissue morphological features, paving the way for AI-driven biomarker discovery and the development of more biologically validated, clinically trusted predictive models.

Bio:

Dr. Manu Aggarwal (Research Fellow) obtained his Ph.D. in mathematics, specializing in mathematical modeling of biological dynamical systems. During his postdoc at NIH, he developed scalable algorithms for topological data analysis of large data sets. He has used these algorithms to analyze the topology of chromatin folding, the arrangement of endocrine cells in pancreatic islets, the topology of protein folding, and the arrangement of galaxies in the universe. As a research fellow at NIH, he analyzed large deviation trajectories in HIV drug-resistance evolution and developed a model-agnostic framework to explain deep learning models using global sensitivity analysis. He applied the latter to deep learning models trained to annotate cell types using single-cell RNA data, validating that the models learned functionally relevant genes as important for classifying different cell types.

Carja, Oana - Carnegie Mellon University

"The shape of evolving systems: How spatial arrangement drives evolutionary dynamics"

Living systems, from cancer cells and microbial colonies to human populations, are inherently spatial. Individuals interact within complex networks of relationships, and we are finally starting to have unprecedented access to datasets that reveal this spatial complexity and that inform on the “shape” of biological systems.

Despite this, most evolutionary modeling and inference still rely on the simplifying assumption of well-mixed populations, an approximation that was reasonable when spatial data were scarce, but now limits our ability to extract meaningful insights.

In this talk, I will discuss early efforts to understand how spatial structure shapes eco-evolutionary dynamics and explain why accurately interpreting novel spatial data and making robust evolutionary inferences require new methods that explicitly incorporate spatial constraints and heterogeneity.

I will also present recent work from my group that tackles this challenge through novel mathematical and computational approaches. Our results offer tractable, generalizable, and unifying models that bridge theory with data, opening new avenues for understanding how spatial structure influences evolution across biological systems.

Bio:

Dr. Carja has an undergraduate degree in Mathematics and a PhD in Biological Sciences from Stanford University, advised by Marc Feldman, and is now an assistant professor in the School of Computer Science at Carnegie Mellon University. Her research program at Carnegie Mellon focuses on developing models and computational approaches to understand the evolutionary properties of a population's spatial arrangement, with the goal of designing populations with desired evolutionary properties and outcomes.

Datta, Vishaka

"Applying principal geodesic analysis to discover gene expression patterns co-varying with nuclear shape from in situ transcriptomics data"

Cells exhibit variation in their transcriptional states, as captured by single-cell RNA-sequencing. While these states are presumed to have distinct phenotypic properties, their functional and clinical relevance can be hard to ascertain. However, since changes in cellular and nuclear morphology are associated with phenotypic changes, we propose that coupling a cell’s transcriptional state with its morphology may help distinguish clinically relevant cell states from more minor cell state changes. The Xenium in situ transcriptomics platform provides a unique opportunity to infer genes whose expression co-varies with nuclear and cellular morphology. We present a novel application of a shape analysis technique, called principal geodesic analysis (PGA), to a sample Xenium dataset of melanoma. PGA permits a computationally efficient embedding of shapes given the positions of points on their boundaries, which then enables the inference of genes whose expression co-varies with shape. We discover several modes of nuclear shape variation across cells: pinching, elongation, and a disc-to-triangle squeezing mode. Sets of genes that co-vary with malignant nuclear morphology show an association with targeted therapy resistance and recapitulate known aspects of melanoma progression. Our eventual goal is to embed cells in a joint gene expression and nuclear shape space, as a clustering in this space will highlight clinically relevant cell states.

El-Kebir, Mohammed - UIUC

"Comparing and Summarizing Cancer Phylogenies"

Cancer phylogenies are key to understanding tumor evolution. However, due to the uncertainty in phylogenetic estimation, one typically infers many equally plausible phylogenies from DNA sequencing data of tumors, hindering downstream analyses that rely on correct phylogenies. In this talk, we will discuss several distance metrics and techniques to compare and summarize tumor phylogenies.

Bio:

Dr. Mohammed El-Kebir is an Associate Professor of Computer Science at the University of Illinois Urbana-Champaign. Prior to joining Illinois, he was a postdoctoral research associate at Princeton University and Brown University. He received his PhD in 2015 at Centrum Wiskunde & Informatica (CWI) and VU University Amsterdam in the Netherlands. His research is in combinatorial optimization algorithms for problems in computational biology, with a particular focus on cancer evolution.

Ghersi, Dario - University of Nebraska at Omaha

"Studying Cancer Dynamics with Agent-Based Modeling"

In my research talk, I will show that pancreatic ductal adenocarcinoma (PDAC) progression is accompanied by a striking and previously unrecognized level of geometric and spatial organization. After building a large-scale, AI-assisted and expert-curated PDAC atlas with over 140,000 structures from H&E-stained slides, we developed SHAPE, a novel computational tool to quantify tumor morphology and architecture. We found that morphological and architectural traits correlate with genomic instability, specific molecular subtypes, and spatially localized invasive programs. Our findings offer new morphological and architectural hallmarks of PDAC aggressiveness and demonstrate that computational analysis of tumor geometry can inform prognosis, therapeutic response, and clinical decision-making.

In my tutorial, I will provide an introduction to agent-based modeling (ABM) as a powerful, intuitive framework for studying cancer’s dynamics, using time-dependent response to therapy as an example. I will briefly cover ABM fundamentals, then highlight current challenges in scalability, parameterization, and validation. Next, I will survey leading ABM platforms and discuss cancer models that showcase emergent tumor properties. I will conclude with an overview of our language-agnostic pipeline that converts high-level YAML specifications into executable code, advancing transparency and reproducibility in ABM.

Bio:

Dario Ghersi earned his M.D. from the University of Genoa, where he focused on studying immune system dynamics with agent-based modeling, and his Ph.D. in Computational Biology from Mount Sinai / New York University, designing algorithms to identify and characterize protein-binding sites. As a postdoctoral fellow at Princeton University, he combined network-based and structural approaches to explore protein function, with a special focus on cancer mutations. He is now Associate Professor of Biomedical Informatics and Graduate Program Chair at the University of Nebraska at Omaha. Dr. Ghersi’s current research centers on developing and applying computational tools that integrate multi-omics, clinical, and imaging data to study tumor progression.

Hach, Faraz - University of British Columbia

“A Computational Framework for Identifying Alternative Splicing from Single-Cell Long-Read RNA-Seq”

Alternative splicing (AS) increases protein diversity by producing different isoforms from the same gene. In cancer, AS plays a role in oncogenesis, and tumour-specific isoforms can serve as drug targets. Detecting AS isoforms at cell-type resolution is crucial for cancer research and treatment. Single-cell RNA-seq (scRNA-seq) relies on short-read (SR) sequencing and thus faces limitations in identifying full AS isoforms. Long-read (LR) RNA-seq can detect full-length AS isoforms but lacks cell-type resolution. A hybrid approach using LR sequencing in scRNA-seq shows promise but requires the development of specialized computational methods beyond existing scRNA-seq pipelines. In this talk, I will present a computational framework for detecting AS isoforms at the cell-type level using LR scRNA-seq data, without relying on a transcript annotation database. This method advances our ability to study transcriptomic complexity in cancer and other heterogeneous diseases.

Bio:

Dr. Faraz Hach is an Associate Professor in the Department of Urologic Sciences at the University of British Columbia and a Senior Research Scientist at the Vancouver Prostate Centre. He also serves as Associate Director of the UBC Bioinformatics Graduate Program. His research focuses on developing algorithms and machine learning models to enhance the diagnosis and understanding of diseased genomes, with an emphasis on cancer and male infertility. In recent years, his work has concentrated on advancing computational methods for analyzing the cancer transcriptome at single-cell resolution, with a specific emphasis on detecting alternative splicing events using long-read sequencing technologies.

Hannenhalli, Sridhar - NCI

"Identifying Prognostic Cell State Interactions in the Tumor Microenvironment of IDH-Mutant Gliomas"

Intercellular communication between distinct transcriptional states of various cell types in the tumor microenvironment (TME) influences the progression and clinical outcome of tumors. Details of such clinically relevant cellular state interactions (CSIs) in IDH-mutant gliomas remain obscure. We will discuss CSI-TME, a computational pipeline that deconvolves cell type-specific gene expression from bulk transcriptomic data, identifies distinct cell states, and uncovers prognostic cell state interactions by modelling the clinical data based on the joint activity of cell state pairs. Time permitting, we will discuss an additional project where we explore the extent to which long-term drug resistance is an inherent early response to cytotoxic stress, conserved across evolution.

Bio:

Dr. Hannenhalli obtained a B.Tech. from the Indian Institute of Technology and his Ph.D. in Computer Science from the Pennsylvania State University. After a postdoctoral fellowship at the University of Southern California (1996-1997), he worked as a Senior Scientist at GlaxoSmithKline and then at Celera Genomics, where he was involved in the work reporting the first human genome sequence. He was a faculty member in the Department of Genetics at the University of Pennsylvania, and then at the University of Maryland (2010-2019). Dr. Hannenhalli served as Interim Director of the Center for Bioinformatics and Computational Biology and as director of the Computational Biology Ph.D. program at UMD. He was a Fulbright Scholar (2017-2018) and a visiting faculty member at the Indian Institute of Science and the National Centre for Biological Sciences. Since 2019 he has been the head of the Gene Regulation Section in the Cancer Data Science Lab at the NCI. The Hannenhalli lab is broadly interested in developing computational approaches that harness multi-omics data to understand the regulatory underpinnings of cancer.

Hormozdiari, Fereydoun - UC Davis

"Computational and AI Models for Analyzing Non-Coding Regions and Cell-Free RNA in Cancer Detection"

Recent advances in cancer genomics have underscored the pivotal role of non-coding regions and cell-free RNA (cfRNA) in both tumorigenesis and non-invasive diagnostics. I will first present Dr.Nod, a computational framework for identifying non-coding cis-regulatory driver elements associated with gene dysregulation, leveraging tissue-matched enhancer-gene maps. We demonstrate that somatic mutations in these regulatory elements can disrupt transcription factor binding, leading to oncogene activation and tumor suppressor silencing, highlighting the functional significance of distal non-coding regions in cancer.

Next, I will introduce two AI models for cfRNA analysis in liquid biopsies. First, Orion, a multi-task variational autoencoder, demonstrates high sensitivity and specificity in detecting non-small cell lung cancer using serum-derived orphan non-coding RNAs. Second, I will introduce a transformer-based foundation model that integrates RNA sequence and abundance data to capture biologically meaningful cfRNA representations, improving the robustness and accuracy of cancer detection. Together, these approaches demonstrate how advanced computational methods and AI models can decode the regulatory landscape of cancer and accelerate the development of cfRNA-based cancer diagnostics.

Bio:

Dr. Fereydoun Hormozdiari, an associate professor at the University of California, Davis (UC Davis), leads a lab focused on computational biology and genomics. He holds a BSc in Computer Engineering from Sharif University of Technology, and MSc and PhD degrees in Computing Science from Simon Fraser University, where his doctoral thesis earned him the Governor General's Academic Medal. Dr. Hormozdiari's significant contributions have been recognized with the Sloan Research Award and the NSF CAREER Award. He has also played a key role in influential consortia such as the 1000 Genomes Project and the Great Ape Genome Project. His current research focuses on developing computational algorithms and designing machine learning and AI approaches to study human health and diseases, with a particular emphasis on autism and cancer.

Huang, Haiyan - UC Berkeley

"A Deep Learning Approach to Contextual Gene Prioritization in RNA-Seq Data"

Biological systems function through the interactions of numerous molecules that influence a wide range of biochemical reactions. However, many of these systems remain only partially understood. In my presentation, I will introduce a deep learning–based semi-supervised approach that uses bulk or single-cell RNA-Seq gene expression data, along with a set of “bait” genes (i.e., those already known to be relevant to a given biological process), to identify additional context-specific genes involved in that process. This approach has the potential to highlight the functional importance of known but poorly studied genes, and its underlying framework can also be applied to study cancer-specific pathways.

Bio:

Haiyan Huang received her Ph.D. in Applied Mathematics from the University of Southern California in 2001. She was a postdoctoral fellow at Harvard University from 2001 to 2003. Currently, she is a Professor in the Department of Statistics at UC Berkeley. She served as the Director of the Center for Computational Biology at UC Berkeley from July 2019 to June 2022, and as Chair of the Department of Statistics from July 2022 to June 2025.

As an applied statistician, her research lies at the interface between statistics and data-rich scientific disciplines such as biology. In recent decades, advances in biological technologies have generated vast amounts of high-dimensional, complex, and noisy data, requiring innovative statistical and computational methods for meaningful analysis. Her group is dedicated to addressing various modeling and analysis challenges arising from such data.

Jiang, Peng - CDSL

"Cancer immunology data engine reveals secreted AOAH as a potential immunotherapy"

Secreted proteins are central mediators of intercellular communications and can serve as therapeutic targets in diverse diseases. The ~1903 human genes encoding secreted proteins are difficult to study through common genetic approaches. To address this hurdle and, more generally, to discover cancer therapeutics, we developed the Cancer Immunology Data Engine (CIDE, https://cide.ccr.cancer.gov), which incorporates 90 omics datasets spanning 8575 tumor profiles with immunotherapy outcomes from 17 solid tumor types. CIDE systematically identifies all genes associated with immunotherapy outcomes. Then, we focused on secreted proteins prioritized by CIDE without known cancer roles and validated regulatory effects on immune checkpoint blockade for AOAH, CR1L, COLQ, and ADAMTS7 in mouse models. The top hit, AOAH (Acyloxyacyl Hydrolase), potentiates immunotherapies in multiple tumor models by sensitizing T-cell receptors to weak antigens and protecting dendritic cells through depleting immunosuppressive arachidonoyl phosphatidylcholines and oxidized derivatives.

Bio:

Dr. Peng Jiang started his research program at the National Cancer Institute (NCI) in July 2019 and was awarded tenure in July 2025. His lab focuses on developing big-data and artificial intelligence frameworks to identify biomarkers and new therapeutic approaches for cancer immunotherapies in solid tumors. Before joining NCI, he finished his postdoctoral training at the Dana-Farber Cancer Institute and Harvard University. Dr. Jiang finished his Ph.D. at the Department of Computer Science & Lewis Sigler Genomics Institute at Princeton University, and his undergraduate study with the highest national honors at the Department of Computer Science at Tsinghua University, Beijing, China (GPA rank 1st in his year). He is a recipient of the NCI K99 Pathway to Independence Award, the Scholar-In-Training Award of the American Association for Cancer Research, the Technology Innovation Award of the Cancer Research Institute, and the NCI Director's Award of Data Science.

Kim, Youngwook - National Cancer Center, Republic of Korea

“Comprehensive Proteogenomic Characterization Reveals Clinically Relevant Molecular Subtypes Associated with Medulloblastoma Progression”

Current treatment strategies for medulloblastoma remain ineffective due to extensive tumor heterogeneity. Methods: We generated five platforms of omics data, including an LC-MS/MS-based proteome, and performed integrated multi-omic characterization to improve the conventional molecular classification of medulloblastoma. Results: We identified seven refined, distinct subtypes. The SHH group was reclassified into two subgroups, SHHα and SHHβ, while Group 4 was divided into three subgroups, G4α, G4β, and G4γ. SHH and Group 4 subtypes exhibit two distinct neuronal differentiation trajectories: granular neuron (GN) and unipolar brush cell (UBC) differentiation (SHHβ and G4γ, respectively), both of which are associated with more favorable clinical outcome. Furthermore, we uncovered unique proteomic and kinomic properties that conferred increased treatment vulnerabilities to targeted therapeutic interventions against each of the three medulloblastoma subtypes associated with poor clinical outcome. We demonstrated the therapeutic potential of exploiting these vulnerabilities by utilizing a proteasome inhibitor and subtype-specific agents, including CDK1/2, PARP, CLK1, and MET inhibitors. Mechanistic insights were further elucidated through in-depth proteome analyses. Conclusions: Our study qualifies the use of proteomic signatures and activation of neuronal differentiation trajectories to tailor selective therapeutic opportunities for distinct subgroups of medulloblastoma patients.

Bio: coming soon

Kolmogorov, Mikhail - NCI

"Harmonizing phased structural variation and copy number profiles from long-read tumor sequencing"

Cancer genomes are often characterized by complex, unbalanced karyotypes, which can be described through a set of somatic copy number alterations (CNAs): large-scale amplifications or deletions, ranging from several megabases to whole chromosomes in size. Long-read analysis of tumor genomes offers advantages over short-read methods, but most current CNA inference methods were developed for short reads. Here we present Wakhan, a method for CNA detection in cancer using long reads. Instead of modeling mixtures of two alleles using biallelic SNV frequencies, Wakhan uses long-read phasing to separate haplotypes and incorporates somatic SV breakpoints to improve CNA profile reconstruction and achieve chromosome-level phasing. We benchmark Wakhan against popular short- and long-read tools using a comprehensive set of cancer cell lines and demonstrate more accurate purity/ploidy estimation and higher precision for small-to-medium CNA events. Further, we show that in combination with accurate somatic SV calls, Wakhan can reconstruct CNA profiles of complex amplifications, such as breakage-fusion-bridge or seismic amplification.

Bio:

Mikhail is currently a Stadtman investigator at the National Cancer Institute, where he leads a group focusing on computational and cancer genomics. Prior to that, Mikhail was a postdoctoral fellow at UC Santa Cruz, supervised by Dr. Benedict Paten. Mikhail completed his Ph.D. in Computer Science at UC San Diego in September 2019, under the mentorship of Dr. Pavel Pevzner. Mikhail received his M.S. in bioinformatics from St. Petersburg Academic University, Russia.

Landi, Tere - NCI

"Deciphering the mutational landscape and evolutionary processes in lung cancer"

Lung cancer is the second most common cancer type and the leading cause of cancer death worldwide. About 10-25% of lung cancers occur among never smokers. We evaluated the mutagenic processes shaping the genome landscape of lung cancers by examining deep whole-genome sequencing from 1217 samples (of which 871 were from never smokers). Samples were collected from 28 geographical locations and centers across the world within the Sherlock-Lung study.

Understanding the mutagenic processes provides insight into the genesis of these tumors, elucidating the diversity of mutational processes. We identified novel mutational signatures associated with lung cancers in never smokers, which differed by ancestry and geographical location. Moreover, in a subset of 542 lung adenocarcinomas (LUAD, the most common histological subtype) that displayed diverse clonal architecture, we observed divergent evolutionary trajectories based on tobacco smoking exposure, ancestry, and sex. Importantly, we found that the mutational signature ID2 is a marker of a previously unrecognized mechanism for LUAD evolution, implicating the important role of L1 retrotransposition-induced mutagenesis. The complex nature of lung cancer evolution creates both challenges and opportunities for screening and treatment plans.

Bio:

Dr. Landi is an M.D., Ph.D. with training in clinical oncology and molecular epidemiology. She is Senior Advisor for Genomic Epidemiology, Trans-Divisional Research Program, and Senior Investigator, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH. She focuses her research on the genetic and environmental determinants of lung cancer and melanoma, and on the genomic characterization of these tumors. She is the Principal Investigator of both EAGLE and Sherlock-Lung, two landmark studies of lung cancer in smokers and never smokers, respectively, which identified subtypes with distinct genome features, mutational signatures, and evolutionary trajectories. She is also the leader of the MelaNostrum consortium, which examines melanoma risk and clinical features in Mediterranean populations. She leads a very large melanoma family study and a melanoma case-control study, which identified both high-penetrance and susceptibility genes and multiple common genetic variants influencing the risk of melanoma.

Luna, Augustin - NLM, NCI

"Integrative Modeling of Drug Response & Resistance Using Big Data & Network Pharmacology"

This talk presents ongoing efforts that are part of network-focused projects leveraging both machine learning and artificial intelligence (i.e., large language models) to advance the use of data-driven decisions with explainable rationale for cancer care. I will describe our work in structuring and annotating 1) biological pathway information and 2) data from experimental models (e.g., cell lines). I will then highlight applications where we seek to better understand the biology of particular cancers, including both their drug response and resistance to therapy, as well as our efforts to map results between model systems and patients. We will discuss limitations of current technologies in biology and healthcare data, along with future directions for AI in medicine.

Bio:

Augustin Luna is a Distinguished Scholar and Stadtman Tenure-track Investigator with a primary appointment at the National Library of Medicine and a secondary appointment at the National Cancer Institute. Augustin has a Ph.D. in Bioinformatics from Boston University as part of a joint program with the National Cancer Institute. He earned a B.Sc. in Biomedical Engineering from the Georgia Institute of Technology and was previously a Research Associate at Harvard Medical School with an affiliation at the Broad Institute of MIT/Harvard. Before this, Dr. Luna was a post-doctoral researcher at the Memorial Sloan-Kettering Cancer Center in New York City and the Dana-Farber Cancer Institute in Boston. Dr. Luna has received the Fund for Innovation in Cancer Informatics (ICI) Discovery Grant, BroadIgnite Award, Ruth L. Kirschstein National Research Service Award (NRSA), and, earlier, a Dissertation Fellowship from the Ford Foundation.

Ma, Lichun - NCI

"Resolving the spatial organization of cellular communities in liver cancer"

The development of liver cancer involves an intricate interplay among various cell types within the liver. Unraveling the orchestration of these cells may hold the key to deciphering the underlying mechanisms of this complex disease. The advancement of single-cell and spatial technologies has revolutionized our ability to determine cellular neighborhoods and understand their crucial roles in disease pathogenesis. Here, we apply these approaches to determine the spatial landscape of liver cancer, which may offer insights into the molecular mechanisms underlying tumor heterogeneity and tumor evolution and pave the way for effective therapeutic interventions.

Bio:

Dr. Ma received her Ph.D. degree in Electronic Engineering at the City University of Hong Kong in 2016. After a one-year postdoctoral fellowship at Nanyang Technological University, Singapore, she joined NCI in 2017 as a postdoctoral fellow where she studied cancer biology using single-cell techniques. She initiated her independent research program at NCI as a Stadtman Investigator in 2022. Dr. Ma has a strong background in mathematics, information theory and machine learning. She received many awards during her training, including the NCI CCR Excellence in Postdoctoral Research Transition award. Her recent work on tumor cell biodiversity and microenvironmental reprogramming in liver cancer was showcased in the 2019-2020 NCI Center for Cancer Research Milestones publication.

Przytycka, Teresa - NLM, NIH

"Mutational signatures in cancer: interplay between DNA damage and repair"

Cancer genomes accumulate a large number of somatic mutations resulting from various endogenous and exogenous causes, including normal DNA damage and repair, cancer-related aberrations of the DNA maintenance machinery, and mutations triggered by carcinogenic exposures. Different mutagenic processes lead to different patterns of somatic mutations called mutational signatures. We can think of mutational signatures as fingerprints of the corresponding mutagenic processes that occur. Analysis of these signatures has emerged as an important approach to understanding the mutagenic processes and their repair mechanisms.

Bio:

Teresa Przytycka is a Senior Investigator at the National Library of Medicine. The research in her group focuses on computational methods advancing the understanding of biomolecular systems and the emergence of complex phenotypes, including cancer. Her group also develops new computational approaches to study gene regulation, including methods to reconstruct Gene Regulatory Networks, DNA conformational dynamics, single cell analysis, mutational processes in cancer, and drug response. Teresa Przytycka serves as an editor of several computational biology journals and is a member of the steering committee of RECOMB, a prestigious computational biology conference bridging the areas of computational, mathematical, statistical, and biological sciences. She is an elected fellow of the International Society for Computational Biology (ISCB) and the American Institute for Medical and Biological Engineering (AIMBE).

Popic, Victoria - Broad Institute

"Graph representation learning for complex structural variant discovery"

Structural variants (SVs), which encompass the broad class of large-scale genome rearrangements, are key drivers of tumor initiation and progression. Their accurate discovery, however, is challenging, due to the considerable diversity and complexity of SV signatures (typically derived from read alignments to a reference genome). This talk will cover our latest work on graph representation learning approaches for SV discovery, which aim to accurately capture both known and novel classes of complex structural variation in the genome.

Bio:

Victoria Popic is a Director of Computational Research and Development at Broad Clinical Labs, where she leads a lab focused on the development of deep learning approaches for the characterization and interpretation of the genome and the mechanisms that drive disease. Dr. Popic earned her Ph.D. in computer science from Stanford University. She also holds a B.S. in computer science, a B.S. in mathematics, and an M.Eng. in computer science from MIT. Prior to the Broad, she spent several years working in industry, conducting research on DNA sequencing at Illumina and working on compilers at SambaNova Systems.

Raphael, Ben - Princeton

"Mapping tumor heterogeneity across space and time"

Tumors are heterogeneous mixtures of cancerous cells that evolve over time and non-cancerous cells that interact in distinct spatial niches within the tumor microenvironment. Spatial transcriptomics technologies measure RNA expression at thousands of locations in a 2D tumor slice, quantifying important features of tumor heterogeneity such as the spatial distribution of cell types and spatial variation in gene expression. However, due to technical limitations, these measurements are typically sparse with high rates of missing data. I will describe computational approaches that overcome these limitations by modeling the geometry of individual tumor slices and integrating measurements from multiple slices. We use these algorithms to derive gene expression gradients in the tumor microenvironment, construct 3D tumor atlases, and infer spatial tumor evolution across multiple cancer types.

Bio:

Department of Computer Science, Princeton University

Ruppin, Eytan - CDSL, NCI, NIH

"Towards fast and accessible precision oncology directly from the good old histopathology slides"

Precision oncology has made significant advances, mainly based on costly, time- and labor-intensive DNA and RNA sequencing. Here I will describe four new approaches for fast and low-cost prediction of patient response to cancer treatments directly from tumor pathology H&E slides: (1) DeepPT/Enlight: inferring bulk tumor transcriptomics and patient response [Nat Cancer 2024], (2) TIME-ACT: inferring the immune activation levels of the tumor microenvironment (TME ‘hotness’) and ICB response [bioRxiv 2025], (3) Path2Space: inferring spatial transcriptomics to identify spatially grounded biomarkers of treatment response [bioRxiv 2025], and finally, (4) Path2Cell: inferring spatial transcriptomics at single-cell resolution [ongoing work]. As time permits, I will describe new approaches to predict ICB response and toxicity from the blood. Finally, I will discuss the challenges and the roadmap lying ahead towards democratizing precision oncology.

Bio:

Eytan Ruppin received his M.D. and Ph.D. from Tel-Aviv University, where he has served as a professor of Computer Science & Medicine since 1995, conducting computational multi-disciplinary research spanning computational neuroscience, natural language processing, machine learning and systems biology. In 2014 he joined the University of Maryland as director of its Center for Bioinformatics and Computational Biology (CBCB). In 2018 he moved to the NCI, where he founded and is chief of its Cancer Data Science Lab (CDSL, https://ccr.cancer.gov/cancer-data-science-laboratory). His research is focused on developing new computational approaches for advancing precision oncology, leading to a few ongoing clinical trials. Eytan is a fellow of the International Society for Computational Biology (ISCB), a recipient of the NCI Director award for his work on precision oncology (2022), the DeLano award for computational biosciences for his work on synthetic lethality (2023) and the NIH director award for developing new computational paradigms for precision oncology (2024). He is a member of the GSK Oncology, ProCan and WIN consortium scientific advisory boards and a co-founder of a few precision medicine startup companies.

Sashittal, Palash - Virginia Tech

"Inferring Cell Lineage Trees and Differentiation Maps from Lineage Tracing Data"

Reconstructing the cell lineage tree and differentiation map of cellular populations is essential not only for understanding normal development but also for studying complex diseases such as cancer. In tumors, lineage relationships and dysregulated cell-type transitions drive critical processes including clonal evolution, treatment resistance, and cancer cell plasticity. Recent advances in lineage tracing technologies have enabled simultaneous measurement of heritable barcodes and transcriptional states at single-cell resolution. However, these technologies do not capture every cell division or differentiation event, creating a need for computational methods to reconstruct the developmental and evolutionary history of cells from the lineage tracing data. In this talk, I will present two methods, Startle and Carta, to infer cell lineage trees and differentiation maps from single-cell lineage tracing data, respectively. I will demonstrate the application of these tools to developmental systems and discuss their potential to study tumor evolution and differentiation in cancer.

Bio:

Dr. Palash Sashittal is an Assistant Professor in the Department of Computer Science at Virginia Tech. Prior to joining Virginia Tech, he was a Postdoctoral Research Associate with Prof. Ben Raphael in the Computer Science Department at Princeton University. He received a Ph.D. in Aerospace Engineering and an M.S. in Computer Science from the University of Illinois Urbana-Champaign (UIUC), and a B.Tech. in Aerospace Engineering from the Indian Institute of Technology Bombay (IIT Bombay). His research focuses on the design of combinatorial and statistical algorithms to analyze and interpret sequencing data. Recent areas of emphasis include infectious disease evolution and transmission, cancer genome evolution, and cell fate mapping of developmental systems.

Sahinalp, S. Cenk - CDSL, NCI

"Identifying Robust Subclonal Structures via Tumor Progression Tree Alignment"

Understanding and comparing tumor evolutionary histories is fundamental to cancer genomics, with direct implications for tracking subclonal population dynamics, treatment resistance, and tumor heterogeneity. Clonal trees, widely used to model tumor progression, are rooted, unordered trees in which each node represents a subclone labeled by a set of distinct mutations. However, no existing computational approach offers efficient or principled means to align clonal trees and compare their subclonal architectures, limiting the robustness of downstream inferences.

We will introduce OMLTED, the optimal multi-label tree edit distance between two clonal trees, defined as the minimum number of mutation labels that must be deleted so that the remaining trees become isomorphic, thus representing their optimal alignment. Computing OMLTED and the corresponding clonal tree alignment is NP-hard; we will present a fixed-parameter tractable algorithm that is exponentially faster than the state-of-the-art classical tree edit distance algorithm by Akutsu et al.

We have implemented our algorithm to provide the first practical computational tool for determining the optimal alignment between clonal trees. An application of our tool to both simulated and real single-cell tumor sequencing datasets demonstrates that it can identify common evolutionary trajectories among clonal trees representing (i) distinct tumors, (ii) distinct samples from the same tumor, (iii) distinct sequencing data from the same sample, or (iv) outputs of distinct analysis tools.

Bio:

S. Cenk Sahinalp has been a Senior Investigator at the Cancer Data Science Laboratory in CCR, NCI, NIH since 2019. Sahinalp completed his B.Sc. in Electrical Engineering at Bilkent University, Ankara, Turkey, and his Ph.D. in Computer Science at the University of Maryland, College Park. His 1997 Ph.D. thesis introduced the first work-optimal parallel algorithm for suffix tree construction and the first linear-time algorithm for pattern matching within sublinear edit distance. After a brief postdoctoral fellowship at Bell Labs, Murray Hill, he has worked as a Computer Science professor, most recently at Indiana University, Bloomington.

Sahinalp’s research has focused on combinatorial algorithms and data structures, primarily for strings/sequences, and their applications to biomolecular sequence analysis, especially in the context of cancer. In the past decade, his lab has developed several algorithmic methods for efficient and effective use of high-throughput sequencing data for better characterization of the structure, evolution, and heterogeneity of cancer genomes.

Scheuermann, Richard - NLM

"Interpretive Analysis of scRNAseq data: Marker Genes, Expressed Genes, and Differential Expression"

Single-cell transcriptomic analysis is revolutionizing our understanding of the cellular complexity of human tissues. Once a cell-by-gene expression matrix is constructed from the next-generation sequencing data, a variety of cellular characteristics can be derived using different computational methods. Although differential expression analysis is a common approach to identify distinct characteristics of the cell types and cell states observed, alternative methods, such as NS-Forest, based on random forest machine learning, show superior performance characteristics for cell type-specific marker gene identification. In addition, methods that can identify all of the genes expressed in a given cell type are more useful for obtaining a complete picture of cellular phenotypes than differential expression analysis. An in-depth description of these techniques and their advantages and disadvantages will be presented.

Bio:

Richard H. Scheuermann, PhD, is the Scientific Director of the National Library of Medicine (NLM), where he leads a team of data scientists engaged in computational health and biomedical informatics research. Dr. Scheuermann received his Bachelor of Science in Life Sciences from the Massachusetts Institute of Technology and his Ph.D. in Molecular Biology from the University of California, Berkeley. Before joining NLM in 2023, Dr. Scheuermann was the Director of Informatics and La Jolla Campus Director at the J. Craig Venter Institute (JCVI) and served as an Adjunct Professor of Pathology at the University of California, San Diego and an Investigator in the Center for Infectious Disease and Vaccine Research at the La Jolla Institute for Immunology. His research interests are focused on the development of computational data mining algorithms and knowledge representation methods, with the goal of scalable translation of research data into computable biomedical knowledge. In his career, he has published over 200 peer-reviewed scientific papers. He currently serves as the chair of the NIH Intramural Research Program Artificial Intelligence (AI) Task Force, a member of the NIH Coordinating Committee for Autoimmune Disease Research as part of the Office of Research on Women’s Health, and a member of the AI Task Force: Research & Discovery Working Group for the Department of Health and Human Services.

Schwartz, Russell - Carnegie Mellon University

"Mathematical Programming for Multimodal Data Integration in Cancer Clonal Evolution"

This lecture will explore the use of mathematical programming, a broadly useful way of posing computational optimization problems, for various applications in studying clonal evolution in cancers. We will explore some basic concepts in modeling optimization problems for cancer genomic data with a particular focus on the increasing role of multimodal methods in modern cancer genomics. We will cover some of the main algorithmic principles behind solving these problems efficiently, by seeing how this general framework applies to diverse problems in cancer clonal lineage analysis. We will illustrate the power and versatility of these tools through applications ranging from simple variants of genomic deconvolution and phylogeny inference to hard real-world problems in reconstructing somatic evolution, resolving temporal dynamics of clonal populations, and planning multimodal studies.

Bio:

Dr. Russell Schwartz received his B.S., M.Eng., and Ph.D. degrees from the Department of Electrical Engineering and Computer Science (EECS) at the Massachusetts Institute of Technology, the last in 2000. This was followed by postdoctoral work in the MIT Biology Department studying protein biophysics and research as an Informatics Research Scientist at Celera Genomics Corporation, where he was involved in some of the first efforts to sequence the human genome and study human genetic variation at a whole-genome scale. Since leaving Celera in 2002, he has been a faculty member of Carnegie Mellon University where he is currently Professor and Head of the Ray and Stephanie Lane Computational Biology Department and Professor of Biological Sciences with additional appointments in the Computer Science Department and Machine Learning Department. His laboratory has worked broadly on algorithms, machine learning, and simulation methods for computational genetics, genomics, and biophysics, with a current focus largely on computational methods development for studying cancer and somatic evolution. He is also active in bioinformatics education, largely through work with the International Society for Computational Biology (ISCB), where he currently serves as a Vice President of the Society and co-chair of its Education Community of Special Interest (COSI).

Singh, Mona - Princeton

Talk Title: coming soon

Abstract: coming soon

Bio: coming soon

Weghorn, Donate - University of Barcelona

"Mutational signature decomposition with deep neural networks"

DNA mutational processes generate patterns of somatic and germline mutations. A multitude of such mutational processes has been identified and linked to biochemical mechanisms of DNA damage and repair. Cancer genomics relies on these so-called mutational signatures to classify tumors into subtypes, navigate treatment, determine exposure to mutagens, and characterize the origin of individual mutations. Yet, state-of-the-art methods to quantify the contributions of different mutational signatures to a tumor sample frequently fail to detect certain mutational signatures, work well only for a relatively high number of mutations, and do not provide comprehensive error estimates of signature contributions. Here, we present a novel approach to signature decomposition using artificial neural networks that addresses these problems. We show that our approach, SigNet, outperforms existing methods by learning signature patterns and their correlations present in real data. Unlike any other method we tested, SigNet achieves high prediction accuracy even with few mutations. We used this to discover novel associations of mutational signatures with tumor hypoxia, including strong positive correlations with the activities of clock-like and defective-DNA-repair mutational processes. We further exposed putative links between homologous recombination deficiency and the ubiquitous yet enigmatic signatures SBS5 and SBS40. These results provide insights into the interplay between tumor biology and mutational processes and demonstrate the utility of our novel approach to mutational signature decomposition, a crucial part of cancer genomics studies.

Bio:

Dr. Weghorn has been a group leader at the Centre for Genomic Regulation (CRG) in Barcelona, Spain, since October 2018. Her research focuses on modeling evolutionary processes with an emphasis on cancer tumors and the human lineage.

A physicist by training, Dr. Weghorn received her PhD working on population genetics problems at the University of Cologne, Germany, in 2012. During her time as a postdoc at Harvard Medical School in Boston, she developed several algorithms for inferring the selection on protein-coding genes and non-coding regions in cancer tumors. Dr. Weghorn also derived quantitative estimates of recent selection acting on genes in the human lineage.

The Weghorn group at the CRG is particularly interested in how the evolution and survival of cancer cell populations depends on mutation influx and how selection can be inferred from observed mutation data. To this end, we develop mathematical and computational approaches to estimate mutation probabilities and selection. Estimates of the strength of selection in cancer allow prioritization of genes and non-coding regions according to their disease relevance, with the ultimate aim of promoting therapeutic advances. We are also interested in mutation rates and selection inference in the context of human genetic variation, including polymorphisms and de novo variants. Here, a particular focus of the group is the description of purifying selection in humans and other species, considering both mutational processes and the effects of genetic drift.

The Weghorn Group is part of the “Computational Biology and Health Genomics” program at the CRG. Further information can be found at https://weghornlab.net/ and at www.crg.eu/en/programmes-groups/weghorn-lab.

Welch, Joshua - University of Michigan

"Inferring ligand-receptor signaling and differentiation from spatial transcriptomic data"

New technologies for high-resolution spatial transcriptomic measurement provide exciting new opportunities to investigate how cell signaling and spatial context contribute to normal and aberrant cell differentiation within complex tissues. In this talk, I will present CytoSignal and TopoVelo, two new computational tools for investigating these questions. CytoSignal infers cell signaling from spatial transcriptomic data by predicting the amount of ligand-receptor protein-protein interaction that occurs at each position within a tissue. This approach can identify spatial gradients of signaling, signaling-associated differentially expressed genes, and temporal dynamics of signaling at each position. To validate CytoSignal predictions, we generated the first dataset featuring PLA and spatial transcriptomics for the same spatial positions. TopoVelo extends the RNA velocity framework to spatial transcriptomic data by jointly modeling the differentiation of cells across a tissue with spatially coupled differential equations. TopoVelo identifies the degree of influence among neighboring cells during differentiation and accurately predicts the directions of cell differentiation and migration across a variety of spatial transcriptomic datasets.

Bio:

Joshua Welch is an Associate Professor of Computational Medicine and Bioinformatics and of Computer Science and Engineering at the University of Michigan. He earned his PhD in Computer Science from the University of North Carolina at Chapel Hill and performed postdoctoral research at the Broad Institute of Harvard and MIT. His lab integrates computational and experimental approaches for single-cell and spatial omic analysis. He also applies these tools to study how cells differentiate to their final fates, particularly in the context of neurodevelopment. His work has been published in top journals, including Nature, Cell, and Nature Biotechnology, and has been funded by the NIH and the Chan Zuckerberg Initiative.

Zook, Justin - NIST

The Genome in a Bottle Consortium (GIAB), hosted by the National Institute of Standards and Technology (NIST), is developing new matched tumor-normal samples, the first explicitly consented for public dissemination of genomic data and cell lines. We recently published a comprehensive genomic dataset from the first individual, HG008, including DNA from an adherent, epithelial-like pancreatic ductal adenocarcinoma (PDAC) tumor cell line and matched normal cells from duodenal and pancreatic tissues. Data for the tumor-normal matched samples come from seventeen distinct state-of-the-art whole genome measurement technologies, including high-depth short- and long-read bulk whole genome sequencing (WGS), single-cell WGS, Hi-C, and karyotyping. GIAB has used these data to develop matched tumor-normal benchmarks for somatic variant detection. We have also generated near-complete assemblies of the normal tissue and dominant tumor clone, enabling phasing of the somatic variants and resolution of complex structural variants in centromeres. We expect these data to facilitate innovation in whole genome measurement technologies, de novo assembly of tumor and normal genomes, and bioinformatic tools to identify small and structural somatic variants.

Bio:

Dr. Justin Zook co-leads the Biomarker and Genomic Sciences Group at the National Institute of Standards and Technology. Since 2013, he has co-led the Genome in a Bottle Consortium’s work developing benchmark human genomes, including a new pancreatic tumor/normal pair.

SSACB Organizing Committee:

Mohammed El Kebir (University of Illinois Urbana-Champaign)

Vishaka Gopalan (NCI)

Mikhail Kolmogorov (NCI)

Salem Malikic (NCI)

Teresa Przytycka (NLM)

Ben Raphael (Princeton)

Cenk Sahinalp (NCI)

Mona Singh (Princeton)