Russell Schwartz, Carnegie Mellon University
"Constrained Optimization Problems in Cancer Clonal Evolution from Multimodal Data Integration"
This lecture will examine how constrained optimization, a broadly useful framework for describing computational inferences, applies in a variety of contexts in studying clonal evolution in cancers. We will first review some of the basic biology and biotechnology used to study cancer clonal evolution today. We will then explore basic concepts in defining constrained optimization problems and tools for solving them. Finally, we will examine how this framework applies to diverse problems in studying cancer evolution, including genomic deconvolution, tumor phylogenetics, integration of multimodal data sources, and optimization of complex multimodal study designs.
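As a concrete illustration of a constrained optimization problem of the kind described above, the sketch below casts one simple form of genomic deconvolution as constrained least squares: given hypothetical clone profiles `C` (features × clones) and a bulk measurement `m`, it estimates clone fractions `f` that are non-negative and sum to one by minimizing ½‖m − Cf‖². The formulation, the function names, and the choice of projected gradient descent as the solver are assumptions for illustration, not the lecture's own models.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    # Largest index rho with u[rho] * (rho + 1) > css[rho] - 1
    rho = np.nonzero(u * idx > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def deconvolve(m, C, n_iter=5000):
    """Minimize 0.5 * ||m - C f||^2 subject to f >= 0 and sum(f) = 1."""
    k = C.shape[1]
    f = np.full(k, 1.0 / k)  # feasible starting point: equal mixture
    # Step size 1/L, where L = sigma_max(C)^2 is the Lipschitz constant of the gradient
    step = 1.0 / np.linalg.norm(C, 2) ** 2
    for _ in range(n_iter):
        grad = C.T @ (C @ f - m)
        f = project_to_simplex(f - step * grad)
    return f


# Toy example: three features, two clones mixed 30/70
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
m = C @ np.array([0.3, 0.7])
f_hat = deconvolve(m, C)
```

Projected gradient is only one option here; a general-purpose solver that accepts explicit bound and equality constraints would solve the same problem.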
Dario Ghersi, University of Nebraska
“More Than Meets the Slide: Computational Approaches in Digital Pathology for Segmentation and Registration”
Digital pathology is transforming our ability to study tissue architecture of human cancers, yet much of the information encoded in histological slides remains underutilized. In this talk, I will present computational approaches that move beyond single-slide analysis by focusing on semantic segmentation and cross-slide registration. First, I will describe the development of a large-scale, human-in-the-loop segmentation dataset for pancreatic cancer, and how it enabled systematic benchmarking of a range of machine learning models for accurate and biologically meaningful annotation of tumor and microenvironmental structures. I will then discuss methods for registering structures across serial sections to support 3D reconstruction of solid tumors. This problem introduces significant algorithmic challenges, including sparse correspondences, structural heterogeneity, and deformation across slices, but also opens the door to new biological insights into tumor organization and progression. Together, these approaches highlight how integrating segmentation and registration can reveal spatial features of cancer that are not apparent from individual slides alone.
Rachel Karchin, Johns Hopkins University
Tumors are not static targets. They are evolving populations of competing clonal lineages that shift dominance under therapeutic pressure. By integrating serial tissue biopsies and liquid biopsies with computational clonal tracking, we can uncover expanding and contracting lineages in real time. I will present some examples of how computational models can potentially support adaptive, evolution-informed therapy selection, transforming cancer treatment from a fixed strategy into a medicine that learns.
Jingyi Jessica Li, Fred Hutchinson Cancer Center
“FDR Calibration with Synthetic Null Data: Controlling False Discoveries While Maintaining Power in High-Throughput Biology”
False discovery rate (FDR) control is essential for reliable inference in high-throughput biology, yet it is increasingly compromised in modern analyses due to data reuse, selection bias, and model misspecification. Common remedies such as data splitting or knockoff constructions often achieve FDR control at the cost of power loss and changes to existing workflows. In this talk, I present a unified framework for calibrated inference via synthetic null data, which achieves FDR control while preserving power and leaving original data and analysis pipelines intact. The central idea is to generate data-driven synthetic null data as in silico negative controls, apply the same estimation or testing procedure to both observed and synthetic data, and use their parallel contrast to calibrate significance thresholds. This framework was motivated by a common “double-dipping” issue in single-cell RNA-seq analysis, where the same data are used both to identify cell clusters and to test for cluster-specific marker genes, leading to clustering-induced bias. This challenge led to ClusterDE, which mitigates post-clustering bias in marker discovery across single-cell, spatial, bulk, and microbiome data. Building on this idea, we developed Nullstrap, a general framework for FDR-controlled variable selection in high-dimensional models without data splitting or knockoffs. I will then present Nullstrap-DE, an application of this framework to RNA-seq differential expression (DE) analysis, which calibrates popular tools such as DESeq2 and edgeR to improve FDR control under mild model violations while retaining high power. Together, these methods illustrate how synthetic null data provide a flexible and principled route to FDR calibration in high-throughput biological data analysis.
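The "parallel contrast" idea can be sketched with a generic target-decoy style estimate (the same counting idea used by knockoff filters): score every feature in the observed data and in the synthetic null data with the same procedure, then pick the smallest cutoff at which the null-to-observed count ratio stays below the target level. This assumes the synthetic null scores mimic those of true null features; ClusterDE and Nullstrap use their own contrast statistics and threshold rules, which are not reproduced here, and the function name is hypothetical.

```python
import math


def calibrated_threshold(obs_stats, null_stats, q=0.05):
    """Smallest cutoff t with estimated FDR(t) = #{null >= t} / max(1, #{obs >= t}) <= q.

    obs_stats: per-feature scores from the real data (larger = more significant).
    null_stats: scores from the same procedure applied to synthetic null data.
    Returns math.inf if no cutoff achieves the target level.
    """
    for t in sorted(set(obs_stats)):  # ascending, so the first hit keeps the most discoveries
        n_null = sum(s >= t for s in null_stats)
        n_obs = sum(s >= t for s in obs_stats)
        if n_null / max(1, n_obs) <= q:
            return t
    return math.inf


obs = [10.0, 9.0, 8.0, 1.0, 0.5]
null = [1.2, 0.8, 0.3, 0.2, 0.1]
t = calibrated_threshold(obs, null, q=0.1)
discoveries = [i for i, s in enumerate(obs) if s >= t]
```

Here the cutoffs 0.5 and 1.0 admit too many synthetic null scores, so the procedure settles on 8.0 and reports the first three features.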
Victoria Popic, Broad Institute
Details of the talk will be posted soon
Kai Tan, Children’s Hospital of Philadelphia
“Algorithms for modeling tumor microenvironment from spatial omics maps”
Understanding the tumor microenvironment (TME) is critical for advancing cancer diagnostics, prognostics, and therapeutic strategies. Spatial omics technologies have emerged as powerful tools for characterizing the complex cellular and molecular landscape of tumors. This seminar will discuss advancements and challenges in this rapidly evolving field. Specifically, I will introduce three algorithms designed for 1) automated segmentation and cell annotation of imaging-based spatial omics maps; 2) identification and comparison of tissue cellular neighborhoods; and 3) de novo construction of cell-specific signaling pathways. I will present case studies from recent research to illustrate how these algorithms can reveal how dynamic cell-cell interactions in the TME contribute to tumor heterogeneity, therapy response, and patient outcomes.
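For readers new to the notion of a cellular neighborhood, one common recipe (not necessarily the algorithm presented in this seminar) describes each cell by the cell-type composition of its k nearest spatial neighbors and then clusters those composition vectors. The sketch below assumes cells already have coordinates and integer type labels, uses brute-force distances, and swaps in plain k-means as the clustering step; all names and parameters are illustrative.

```python
import numpy as np


def neighborhood_profiles(coords, cell_types, n_types, k=5):
    """Fraction of each cell type among the k nearest neighbors of every cell (self excluded)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    profiles = np.zeros((len(coords), n_types))
    for i, nbrs in enumerate(nearest):
        profiles[i] = np.bincount(cell_types[nbrs], minlength=n_types) / k
    return profiles


def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    return labels


# Toy tissue: ten type-0 cells and, far away, ten type-1 cells
coords = np.array([[x, 0.0] for x in list(range(10)) + list(range(100, 110))], dtype=float)
cell_types = np.array([0] * 10 + [1] * 10)
profiles = neighborhood_profiles(coords, cell_types, n_types=2, k=3)
labels = kmeans(profiles, n_clusters=2)
```

Brute-force distances are quadratic in the number of cells; a spatial index would be needed at the scale of real imaging-based maps.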
Atul Deshpande, Johns Hopkins University
Details of the talk will be posted soon
Wenyi Wang, MD Anderson
“Transfer Learning for Survival-based Clustering of Predictors with an Application to TP53 Mutation Annotation”
TP53 is the most frequently mutated gene in human cancers, and germline mutations in TP53 cause Li-Fraumeni syndrome (LFS), a hereditary predisposition to diverse cancers. Accurate annotation of TP53 mutations based on their survival effects is critical for informed LFS patient management. Motivated by this need, we develop a new approach for Survival-based Clustering of Predictors (SCP) by identifying homogeneous coefficients in Cox regression. We formulate this task as a fusion penalized Cox regression problem and provide an efficient computational algorithm. A nonconvex distance-to-set penalty is adopted to facilitate parameter tuning and improve estimation accuracy. To overcome data limitations, we further develop TL-SCP, a transfer learning extension that borrows coefficient ranking information from a source dataset under the assumption of similar ranking patterns between source and target. TL-SCP integrates ranking information through weighted rank averaging, allowing flexibility in accommodating cohort heterogeneity while maintaining model simplicity. Simulation studies demonstrate TL-SCP’s superior performance over SCP in clustering recovery and coefficient estimation. In the application of TP53 mutation annotation where we utilize non-LFS germline TP53 mutation carriers as a source cohort for the target LFS cohort, TL-SCP identifies biologically meaningful TP53 mutation clusters and offers improved clinical interpretability compared to experiment-based annotations.
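The abstract does not spell out how TL-SCP combines rankings beyond "weighted rank averaging", so the sketch below only illustrates that averaging step under an assumed form: convert source and target coefficient estimates to ranks and blend them with a weight `w` in [0, 1]. The function names, the weight, and the coefficient values are hypothetical, and how the blended ranking then enters the fusion-penalized Cox model is not shown.

```python
def ranks(values):
    """1-based ranks, 1 = smallest value; ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def weighted_rank_average(source_coefs, target_coefs, w):
    """Blend rankings: w = 0 uses only the target cohort, w = 1 only the source."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    rs, rt = ranks(source_coefs), ranks(target_coefs)
    return [w * s + (1 - w) * t for s, t in zip(rs, rt)]


# Hypothetical Cox coefficients for three mutation groups
source = [0.1, 0.5, 0.9]
target = [0.6, 0.2, 0.8]
blended = weighted_rank_average(source, target, w=0.5)
```

Working with ranks rather than raw coefficients is what lets the source cohort inform the target only through ordering, matching the abstract's "similar ranking patterns" assumption.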
Roded Sharan, Tel Aviv University
“Multimodal Data Integration for Gene Representation Learning”
The data deluge in biology calls for computational approaches that can integrate multiple datasets of different types to build a holistic view of biological processes or structures of interest. An emerging paradigm in this domain is the unsupervised learning of data embeddings that can be used for downstream clustering and classification tasks. In this talk I will describe recent work in my group to integrate diverse data types for gene representation learning with applications to gene module detection and gene function prediction.
Peter Van Loo, MD Anderson
“Molecular Archeology of Cancer”
The cancer genome carries an archeological record of the tumor’s past. Over the past several years, we have developed several approaches to mine that archeological record, which we collectively call 'molecular archeology of cancer'. Using these approaches, we are able to infer the subclonal architecture and evolutionary history of tumors. We applied these approaches in a large-scale pan-cancer setting, showing that intra-tumor heterogeneity is pervasive across cancers and that the timelines of tumor evolution span multiple years to decades. Key driver events in tumor evolution typically occur early, and copy number gains often accumulate as punctuated bursts, commonly after genome doubling. Late genome doubling is frequent in cancer evolution and is typically followed by an increase in chromosomal instability. Our approaches increase the evolutionary information that can be obtained from tumor genome sequences and, therefore, improve our understanding of the developmental history of cancer.
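One building block behind this kind of analysis is the standard relation between a mutation's variant allele fraction (VAF) and its cancer cell fraction (CCF), given tumor purity, local tumor copy number, and the number of mutated copies (multiplicity). The sketch below assumes a diploid normal-cell admixture and a noise-free VAF; it is a textbook identity written as a hypothetical helper, not the group's inference software.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Invert VAF = purity * multiplicity * CCF / (purity * tumor_cn + (1 - purity) * normal_cn)."""
    # Average number of copies of the locus per cell in the sequenced mixture
    mixture_cn = purity * tumor_cn + (1 - purity) * normal_cn
    return vaf * mixture_cn / (purity * multiplicity)


clonal = cancer_cell_fraction(0.4, purity=0.8, tumor_cn=2)      # about 1.0
subclonal = cancer_cell_fraction(0.2, purity=0.8, tumor_cn=2)   # about 0.5
early = cancer_cell_fraction(2 / 3, purity=1.0, tumor_cn=3, multiplicity=2)
```

A mutation present on two of three copies of a gained segment (the last call) is clonal only if it arose before the gain, which is the kind of ordering information that molecular timing exploits.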
Yuichi Shiraishi, National Cancer Center Japan
"Centromere Variation and Its Role in Cancer Chromosomal Rearrangements"
Large-scale pangenome efforts, including the Human Pangenome Reference Consortium (HPRC) and the Telomere-to-Telomere (T2T) Consortium, are producing an expanding collection of near-complete genome assemblies. For the first time, these resources have resolved highly repetitive and structurally complex regions that were largely inaccessible to short-read sequencing, including centromeres, segmental duplications, and satellite arrays. This progress is now providing new opportunities in cancer genomics, where structural rearrangements involving these difficult-to-analyze regions are frequently observed yet remain poorly characterized.
As an example of the application of pangenome resources, we introduce a k-mer-based computational framework, ascairn, to infer centromere haplotypes from short-read whole-genome sequencing (WGS) data. We applied ascairn to investigate the genomic structure underlying the 1p/19q co-deletion, a highly recurrent centromere-involving translocation in oligodendrogliomas. Analyzing short-read WGS data from 142 cases with 1p/19q co-deletion using rare k-mers, we showed that the breakpoints of the 1p/19q co-deletion map to αHOR (alpha-satellite higher-order repeat) arrays on chromosome 1 (D1Z7) and chromosome 19 (D19Z3), with a clear positional relationship to kinetochore attachment sites; this finding was validated by long-read sequencing of two 1p/19q co-deletion–positive cases. We also demonstrate that ascairn can be effectively used for interrogating the diversity of centromere structures and their geographic distributions across populations.
As pangenome resources expand, they will enable systematic analysis of previously inaccessible repetitive regions in large cancer cohorts, revealing mechanisms of chromosomal rearrangements.
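The abstract does not detail ascairn's model. As a rough illustration of the general idea of rare-k-mer haplotype inference, the sketch below identifies k-mers that occur in few reference haplotypes and scores each haplotype by the fraction of its rare k-mers seen in the reads. The scoring rule, `max_haplotypes`, and all names are hypothetical; a real tool must also handle reverse complements, sequencing errors, read depth, and diploid haplotype combinations.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rare_kmers(haplotypes, k, max_haplotypes=1):
    """For each reference haplotype, its k-mers found in at most max_haplotypes references."""
    per_hap = {name: kmers(seq, k) for name, seq in haplotypes.items()}
    counts = Counter()
    for ks in per_hap.values():
        counts.update(ks)
    return {name: {km for km in ks if counts[km] <= max_haplotypes}
            for name, ks in per_hap.items()}


def score_haplotypes(reads, haplotypes, k):
    """Fraction of each haplotype's rare k-mers that appear in the reads."""
    observed = set()
    for read in reads:
        observed |= kmers(read, k)
    rare = rare_kmers(haplotypes, k)
    return {name: len(ks & observed) / len(ks) if ks else 0.0
            for name, ks in rare.items()}


haplotypes = {"hapA": "ACGTACGTTT", "hapB": "ACGTACGGGG"}
reads = ["ACGTACG", "TACGTTT"]
scores = score_haplotypes(reads, haplotypes, k=4)
```

Shared k-mers such as `ACGT` carry no information about which haplotype is present, which is why only the rare ones are scored.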
Uthsav Chitra, Johns Hopkins University
“Mapping the geometry of spatial gene expression”
Recent spatial transcriptomics (ST) technologies make high-throughput measurements of gene expression at thousands of locations in a 2-D tissue slice. However, due to cost and technological limitations, these measurements are highly sparse—thus complicating the identification and analysis of spatial gene expression patterns, particularly in spatially heterogeneous tissues such as tumors. In this talk, I will present machine learning approaches that overcome these limitations by modeling the underlying geometry of a 2-D tissue slice. First, I will present GASTON and GASTON-Mix, unsupervised and interpretable deep learning algorithms which learn "topographic maps" of a 2-D tissue slice. Then I will present SLOPER, an algorithm which leverages point processes and score matching to learn spatial gradient vector fields that characterize spatial variation in the expression of individual genes. I will show how our algorithms uncover subtle spatial gene expression patterns across tumors and other biological systems.
Ben Raphael, Princeton University
Details of the talk will be posted soon
Mona Singh, Princeton University
Details of the talk will be posted soon
Mohammed El Kebir, University of Illinois at Urbana-Champaign
“GReinSS: Generative Modeling via Reinforcement Learning for Latent Structured States”
Many scientific problems require inferring unobserved mechanistic latent states from indirect observations. Classical approaches, including expectation-maximization, do not scale to combinatorially large spaces, while deep learning approaches such as variational autoencoders typically learn artificial latent states rather than reconstructing the mechanistic ground-truth states. Here, we introduce GReinSS, a policy learning framework that uses dynamically rescaled rewards to learn latent state distributions that maximize the observed data likelihood. We show that GReinSS accurately reconstructs simulated latent sets and latent graphs, outperforming alternative policy learning and generative modeling baselines. Additionally, GReinSS reconstructs isoforms from real short-read RNA sequencing data that match isoforms detected by orthogonal long-read sequencing better than the standard RSEM algorithm does. Overall, GReinSS is a principled and practically effective approach for generative modeling and inference of combinatorial latent states from indirect observations.
Gregoire Altan-Bonnet, NCI
"Decoding immune response using the latent geometry of T cells’ activation trajectories: from bulk data to single-cell resolution"
Immune responses arise from the coordinated behavior of diverse leukocytes responding to complex tissue environments. How heterogeneous and dynamic activation states are integrated into coherent functional immune programs remains poorly understood. In this lecture, I will discuss how we built and leveraged the IMMUNOtron (a high-throughput lab automation platform for scalable immune profiling, combined with machine learning) to show that stochastic T cell activation states contain sufficient structure to predict antigen identity, functional outputs, and antagonistic immune responses across immunological environments. Critically, our framework can be generalized: it is applicable across cell types, antigenic contexts, and experimental systems, establishing a broadly deployable strategy for decoding immune information at single-cell resolution. Our analysis reveals a time-dependent combinatorial code in which a subset of markers encodes a highly resolved continuous pattern of activation, organized along a 1D interpretable manifold that captures the hierarchical nature of antigen discrimination by T cells. This latent geometry is robust across computational settings and experimental conditions, enabling alignment of datasets and direct comparison of equivalent activation states under ligand mixtures, including antagonistic combinations. Modes inferred from single-cell embeddings quantitatively match independent collective cytokine measurements and further identify single-cell signatures of immune antagonism (collaboration with Paul François’s group, Université de Montréal & MILA).
Together, these results show how an apparently digital self/non-self decision can coexist with graded, high-dimensional ligand discrimination within a low-dimensional manifold, and demonstrate how machine learning approaches (deployed at scale through laboratory automation) can decode the structure of immune information processing from single-cell data to bulk responses, across diverse immunological settings.
Justin Zook, NIST
Details of the talk will be posted soon
Misha Kolmogorov, CDSL, NCI
The power of somatic and germline variant phasing to improve reconstruction of cancer genome architecture
A common signature of cancer genomes is a complex, rearranged karyotype, characterized by acquired gains or losses of chromosomal material, referred to as somatic copy number alterations (CNAs). Identification of haplotype-specific CNAs from bulk sequencing data is a key step in many short-read cancer genomic workflows; however, short reads have a limited phasing range. In contrast, long reads can directly phase genomic variants into contiguous haplotypes.
In this tutorial, I will present several principles for how cancer genomic analysis can be enhanced using various types of phasing, from population panels to direct phasing of both germline and somatic variants with long reads. I will illustrate the applications in haplotype-specific CNA profiling, mutational timing, clonal deconvolution, and reconstruction of complex structural variant (SV) events.
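To see why phasing matters for haplotype-specific CNA calling, consider pooling allele counts across heterozygous SNPs in one segment: without phase, reads supporting the gained haplotype are split between reference and alternate alleles and the imbalance averages away; with phase, counts can be summed per haplotype and inverted to copy numbers. The sketch below assumes a diploid heterozygous normal admixture, known purity and total tumor copy number, and no allelic mapping bias; the function names and input layout are hypothetical and not taken from any particular tool.

```python
def haplotype_a_fraction(snps):
    """Fraction of reads from haplotype A, pooled over phased heterozygous SNPs.

    Each SNP is (ref_reads, alt_reads, phase), where phase 0 means the ref allele
    lies on haplotype A and phase 1 means the alt allele does.
    """
    a = b = 0
    for ref, alt, phase in snps:
        if phase == 0:
            a, b = a + ref, b + alt
        else:
            a, b = a + alt, b + ref
    return a / (a + b)


def haplotype_copy_numbers(frac_a, purity, total_cn):
    """Invert frac_a = (p*nA + (1 - p)) / (p*N + 2*(1 - p)) for integer nA, with N = nA + nB.

    Assumes a diploid, heterozygous normal admixture and known purity p and total CN N.
    """
    denom = purity * total_cn + 2 * (1 - purity)
    n_a = round((frac_a * denom - (1 - purity)) / purity)
    return n_a, total_cn - n_a


# Two SNPs in a copy-neutral LOH segment, with alleles in opposite phase orientation
snps = [(25, 75, 1), (75, 25, 0)]
frac = haplotype_a_fraction(snps)          # 150 / 200 = 0.75
cn = haplotype_copy_numbers(frac, purity=0.5, total_cn=2)
```

Pooling the same two SNPs by alternate allele alone would give 100 / 200 = 0.5, making the segment look allelically balanced; the phased pooling recovers the (2, 0) state.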
Cenk Sahinalp, CDSL, NCI
Details of the talk will be posted soon
Salem Malikic, CDSL, NCI
Details of the talk will be posted soon
Teresa Przytycka, NCBI, NLM
Details of the talk will be posted soon
Vishaka Gopalan, CDSL, NCI
Details of the talk will be posted soon
Eytan Ruppin, Cedars-Sinai Medical Center
"Inferring Tissue Omics Without Sequencing at both bulk and spatial resolution"
Our understanding of cancer and other human diseases has advanced significantly in recent years, mainly through costly and time- and labor-intensive DNA and RNA sequencing. I will start by briefly describing published approaches for fast, low-cost inference of bulk tissue omics data from ubiquitous tumor pathology H&E slides, with translational applications in precision oncology. These include (1) DeepPT-Enlight/Path2Omics: inferring bulk tumor transcriptomics and patient response [Nat Cancer 2024, Cancer Research 2025] and (2) TIME-ACT: inferring the immune activation of the tumor microenvironment (TME ‘hotness’) and predicting checkpoint immunotherapy response [bioRxiv 2025]. I will devote the main part of my talk to describing more recent approaches for inferring omics data at a spatial resolution, including (3) Path2Space: inferring spatial transcriptomics to identify spatially grounded biomarkers of treatment response [Cell, 2026], and finally, (4) Path2Marker and Path2Cell: inferring protein biomarkers and annotating cells at single-cell resolution [ongoing work]. Taken together, these approaches lay the basis for democratizing and accelerating medical and translational research in the next few years.
Sridhar Hannenhalli, CDSL, NCI
Context Specificity of Biological Functions
The effect of a genetic mutation can be highly tissue-specific; think of BRCA mutations. At some level, all biological phenotypes are emergent properties of interacting parts. This issue of context specificity undergirds all biological investigations. Drawing on our own and others’ work over the years, we will discuss this important, annoying, and under-appreciated issue.
Ben Greenbaum, Memorial Sloan Kettering Cancer Center
Details of the talk will be posted soon
Peng Jiang, CDSL, NCI
"Data-Driven Discovery of Secreted Proteins as Cancer Immunotherapies"
My research focuses on developing data-integration and artificial intelligence frameworks to study intercellular signaling mediated by secreted proteins in antitumor immunity. Data-driven analyses estimate that about two thousand human genes encode secreted proteins. Yet, our literature mining revealed that 61% of these genes lack known roles in cancer. To address this gap, we develop computational methods and apply diverse immunological models to dissect cytokine networks, secreted proteins, and ligand–receptor interactions in cancer. Ultimately, our goal is to uncover new mechanisms of immune regulation and identify therapeutic opportunities that harness intercellular communication against tumors.
Erin Molloy, University of Maryland
Establishing Statistical Guarantees for Cell Lineage Tree Reconstruction via Triplets and Quartets
Cell lineage tree reconstruction is often formulated as a combinatorial optimization problem. But under what conditions can we recover the true lineage tree with high probability, and how much data is required? This lecture presents tools for addressing these fundamental questions focusing on two common models for cell lineage tracing. We show how to derive statistical consistency and sample complexity guarantees for methods based on triplets (rooted three-leaf induced subtrees) and quartets (unrooted four-leaf induced subtrees). Along the way, we highlight recent statistical advances for cell lineage tracing as well as their implications for algorithm design and outstanding open questions from both the theoretical and practical perspectives.
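To make the triplet notion concrete, the sketch below lists every resolved rooted triplet ab|c induced by a rooted tree: for each set of three leaves, the pair whose lowest common ancestor is strictly deepest forms the cherry. Representing the tree as a child-to-parent dictionary is an assumption for illustration. Triplet-based reconstruction methods compare sets like this one, estimated from lineage tracing data, against candidate trees; the lecture's consistency and sample complexity guarantees are not reproduced here, and quartets (the unrooted four-leaf analogue) are not covered.

```python
from itertools import combinations


def ancestors(node, parent):
    """Path from node up to the root (inclusive); parent maps child -> parent."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca_depth(u, v, parent):
    """Depth (root = 0) of the lowest common ancestor of u and v."""
    v_anc = set(ancestors(v, parent))
    lca = next(x for x in ancestors(u, parent) if x in v_anc)
    return len(ancestors(lca, parent)) - 1


def rooted_triplets(parent, leaves):
    """All resolved triplets ab|c induced by a rooted tree, as (frozenset({a, b}), c)."""
    triplets = set()
    for a, b, c in combinations(sorted(leaves), 3):
        options = [((a, b), c), ((a, c), b), ((b, c), a)]
        depths = [lca_depth(x, y, parent) for (x, y), _ in options]
        deepest = max(depths)
        if depths.count(deepest) == 1:  # skip unresolved (star) triples
            (x, y), out = options[depths.index(deepest)]
            triplets.add((frozenset((x, y)), out))
    return triplets


# Tree (((A,B),C),D) rooted at r
parent = {"A": "x", "B": "x", "x": "y", "C": "y", "y": "r", "D": "r"}
found = rooted_triplets(parent, ["A", "B", "C", "D"])
```

A tree on n leaves induces at most n choose 3 triplets, and a fully resolved tree is determined by its triplet set, which is what makes triplets a natural unit for the statistical analysis described above.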
Lichun Ma, CDSL, NCI
Resolving the Spatial Organization of Cellular Communities in Liver Cancer
Liver tumors comprise a complex ecosystem of malignant, stromal, and immune cells whose spatial organization and interactions shape tumor progression, heterogeneity, and therapeutic response. Recent advances in spatial technologies enable high-resolution characterization of cellular states and neighborhoods within intact tissues. Here, we apply these approaches to define the spatial landscape of liver cancer, with the aim of uncovering the cellular and molecular mechanisms driving tumor heterogeneity and evolution, ultimately informing the development of more effective precision therapies.
SSACB Organizing Committee:
Mohammed El Kebir (University of Illinois Urbana-Champaign)
Vishaka Gopalan (NCI)
Mikhail Kolmogorov (NCI)
Salem Malikic (NCI)
Teresa Przytycka (NLM)
Ben Raphael (Princeton)
Cenk Sahinalp (NCI)
Mona Singh (Princeton)