Confirmed Faculty

Elham Azizi, Columbia

"Machine Learning Cellular Dynamics in the Tumor Microenvironment"

Despite the clinical successes of cancer immunotherapies such as immune checkpoint blockade (ICB) and adoptive cellular therapies, we lack a clear understanding of the specific roles and characteristics of tumor-infiltrating immune cells in the local microenvironment. Studying how intra-tumoral immune populations coordinate to generate anti-tumor responses can guide precise treatment prioritization. Recent genomic technologies that measure cell features at single-cell resolution or in a spatially resolved manner present exciting opportunities to study the heterogeneity of cells and characterize complex interactions in the tumor microenvironment (TME). However, analyzing and integrating these data types, particularly in complex patient specimens, involves significant statistical and computational challenges. I will present a set of statistical machine learning methods developed to infer temporal and spatial dynamics of cells in the TME. I will show their application to characterizing spatial dynamics in aggressive metaplastic breast cancer, revealing metabolic reprogramming that shapes immunosuppressive niches. Additionally, I will present a systematic dissection of coordinated immune cell networks in an established adoptive cellular therapy, donor lymphocyte infusion (DLI), in relapsed leukemia.

Bio:

Elham joined Columbia University in 2020 as the Herbert and Florence Irving Assistant Professor of Cancer Data Research (in the Irving Institute for Cancer Dynamics) and Assistant Professor of Biomedical Engineering. She is also affiliated with the Department of Computer Science, Data Science Institute, and the Herbert Irving Comprehensive Cancer Center. Elham holds a BSc in Electrical Engineering from Sharif University of Technology, an MSc in Electrical Engineering and a PhD in Bioinformatics from Boston University. She was a postdoctoral fellow in the Dana Pe'er Lab at Columbia University and Memorial Sloan Kettering Cancer Center. Her multidisciplinary research utilizes novel machine learning techniques and single-cell genomic and imaging technologies to study the dynamics and circuitry of interacting cells in the tumor microenvironment. She is a recipient of the NYAS/Takeda Early-Career Innovator in Science Award, CZI Science Diversity Leadership Award, NSF CAREER Award, Tri-Institutional Breakout Prize for Junior Investigators, NIH NCI Pathway to Independence Award, American Cancer Society Postdoctoral Fellowship, and IBM Best Paper Award at the New England Statistics Symposium.

Niko Beerenwinkel, ETH Zurich

"Computational analysis of tumor single-cell sequencing data"

Cancer progression is an evolutionary process characterized by the accumulation of genetic alterations and responsible for tumor growth, clinical progression, and the development of drug resistance. We discuss how to reconstruct the evolutionary history of a tumor from single-cell sequencing data and present probabilistic models and efficient inference algorithms for mutation calling and for learning tumor phylogenies from mutation and copy number data. We present methods for integrating single-cell DNA and RNA data obtained from tumor biopsies, for detecting signatures of selection pressure in single-cell samples, and for finding common patterns of tumor evolution among patients, including recurrent evolutionary trajectories and clonally exclusive mutations.

Bio:

Niko Beerenwinkel is a full professor of computational biology at the Department of Biosystems Science and Engineering of ETH Zurich in Basel. His research is at the interface of mathematics, statistics, and computer science with biology and medicine. It includes the development of statistical models for high-throughput molecular profiling data, network-based analysis of genome-wide perturbation screens, evolutionary modeling, and clinical applications in oncology and infectious diseases. He has developed computational pipelines for molecular tumor boards, including methods for the analysis of single-cell sequencing data, and algorithms and tools for mining viral genomes and improving clinical diagnostics of viruses.

Mohammed El-Kebir, UIUC

“Advanced Integer Linear Programming in Cancer Phylogenetics”

Many optimization problems in computational biology are NP-hard. This holds true for cancer phylogenetics as well. Due to advances in (commercial) integer linear programming (ILP) solvers, one is often able to solve practical problem instances to optimality. This tutorial will discuss advanced techniques for modeling optimization problems using ILP, including cutting planes, column generation and piecewise linear approximations of convex functions.
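As a minimal illustration of the last technique (a sketch of our own, not material from the tutorial): when a convex function f appears in a minimization objective, one can replace f(x) by an auxiliary variable t and add the linear tangent constraints t ≥ f(x_k) + f'(x_k)(x - x_k) at a few chosen breakpoints x_k. Because f is convex, every tangent lies below f, so the smallest feasible t is a piecewise linear under-approximation that a solver can handle without extra integer variables. The example function f(x) = x² and the breakpoints are arbitrary assumptions, and no ILP solver is called here.

```python
def tangent_cuts(f, df, points):
    """Return (slope, intercept) pairs for the linear constraints
    t >= slope * x + intercept, one tangent line of f per breakpoint."""
    return [(df(xk), f(xk) - df(xk) * xk) for xk in points]


def lower_envelope(cuts, x):
    """Smallest t satisfying every cut at x: the piecewise linear under-approximation."""
    return max(a * x + b for a, b in cuts)


def f(x):
    return x * x  # example convex function (an assumption for this sketch)


def df(x):
    return 2 * x  # its derivative


cuts = tangent_cuts(f, df, [-2, -1, 0, 1, 2])
for x in [-1.5, 0.0, 0.5, 1.5]:
    print(f"x={x:+.1f}  f(x)={f(x):.2f}  approximation={lower_envelope(cuts, x):.2f}")
```

The approximation is exact at the breakpoints and never exceeds f between them; adding breakpoints tightens it at the cost of more constraints.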

Bio:

Dr. Mohammed El-Kebir is an Assistant Professor of Computer Science at the University of Illinois Urbana-Champaign. Prior to joining Illinois, he was a postdoctoral research associate at Princeton University and Brown University. He received his PhD in 2015 from Centrum Wiskunde & Informatica (CWI) and VU University Amsterdam in the Netherlands. His research is in combinatorial optimization algorithms for problems in computational biology, with a particular focus on cancer evolution.

Funda Ergun, Indiana U

"An Algorithmic Approach to Understanding Tumor Evolution from Single Cell Sequencing Data"

Cancer is an ever-evolving disease, as demonstrated by the high variety of mutations exhibited by different cells within a tumor. Advances in sequencing technologies have improved our understanding of this heterogeneity within tumors, yet our ability to trace the structure of the evolution of mutations in cells is limited by the sheer size and the inaccuracy of our data. The goal of our research is to employ mathematical techniques to overcome these challenges.

In particular, in this talk, we will discuss techniques for discovering the tumor evolution tree underlying large amounts of noisy mutation information in a tumor. Our initial efforts are hampered by our inability to access the ground truth accurately, as all observations are subject to probabilistic noise. To compensate, we treat the ground truth as a probability distribution over tumor evolution trees, restricted by certain properties reflecting the possible real-life structure of the data. Unfortunately, the subspace of such trees is so small that we cannot expect to observe them by sampling the cell/mutation space directly. Instead, we show how to devise smart strategies for generating intermediate distributions to efficiently sample and analyze this subspace. Ultimately, we obtain provable inferences about the underlying tree structure of tumor evolution, which can be used to draw conclusions about disease traits.

Bio:

Funda Ergun is a professor of Computer Science at Indiana University. Her interests lie in randomized algorithms, sublinear algorithms, and big data algorithmics, with applications to fields such as bioinformatics, computer networks/data centers, and cloud computing. She received her PhD from Cornell University and has previously been employed by Bell Laboratories, Case Western Reserve University, and Simon Fraser University.

Elana Fertig, JHU

Coming soon

Vishaka Gopalan, CDSL, NCI

"Exploring non-genetic cellular memory through single-cell RNA-sequencing"

Our collective, decades-long study of the genetic basis of cancer initiation and progression has not only deepened our understanding of cancer evolution but also yielded a wealth of clinically actionable insights. Over the same period, progress in cell and developmental biology has revealed forms of cellular memory and heredity that are not mediated by DNA and its modifications. These non-genetic mechanisms help explain curious and clinically relevant phenomena in cancer progression that a purely genetic view cannot readily account for.

I will summarize key concepts in cellular states and memory and then review some landmark studies that use single-cell RNA sequencing to explore non-genetic memory in tumors and other tissues.

Bio:

Vishaka Gopalan is a postdoctoral fellow with Dr. Sridhar Hannenhalli at the Cancer Data Science Laboratory. His research interests are in developing methods for single-cell RNA-seq data to uncover cell-intrinsic and cell-extrinsic regulators of cellular state transitions in cancer and development. He completed his doctoral training in biology at the National Center for Biological Sciences, India, in 2019 and obtained his Bachelor's and Master's degrees in Statistics and Informatics from the Indian Institute of Technology, Kharagpur, in 2012.

Sridhar Hannenhalli, CDSL, NCI

"Identifying functional non-coding variants at multiple biological scales"

While most (>95%) of the variants and somatic mutations in the human genome reside in its non-coding portion, assessing their functional significance is challenging because of our incomplete mechanistic understanding of their impact. We will discuss our recent success in applying deep learning to identify mutations in the human lineage that created novel brain developmental enhancers, potentially underlying the evolution of cognition and cognitive disorders. We will present our current efforts to apply this modeling approach to identify non-coding polymorphisms that may explain the differential incidence of prostate cancer in African populations, as well as to investigate somatic mutations in esophageal cancer. Time permitting, and on a different topic, we will discuss evolutionarily conserved cellular responses to drugs that may underlie long-term drug resistance in cancer.

Bio:

Dr. Hannenhalli obtained a B.Tech. from the Indian Institute of Technology (1990) and his Ph.D. in Computer Science from the Pennsylvania State University (1995). After a postdoctoral fellowship at the University of Southern California (1996-1997), he worked as a Senior Scientist at GlaxoSmithKline (1997-2000) and then at Celera Genomics (2000-2003), where he was involved in the work reporting the first human genome sequence. He was a faculty member in the Department of Genetics at the University of Pennsylvania (2003-2010) and then at the University of Maryland (2010-2019). Dr. Hannenhalli served as Interim Director of the Center for Bioinformatics and Computational Biology and as director of the Computational Biology Ph.D. program at UMD. He was a Fulbright Scholar (2017-2018) and a visiting faculty member at the Indian Institute of Science and the National Center for Biological Sciences. Since 2019, he has been head of the Gene Regulation Section in the Cancer Data Science Lab at the NCI. The Hannenhalli lab is broadly interested in developing computational approaches that harness multi-omics data to understand the regulatory underpinnings of cancer.

Peng Jiang, CDSL, NCI

"Estimation of cell lineages in tumors from spatial transcriptomics data"

Spatial transcriptomics (ST) technology based on in situ capture has enabled topographical gene expression profiling of tumor tissues. However, each capture spot may contain diverse immune and malignant cells, with cell densities that vary across tissue regions. Cell type deconvolution in tumor ST data remains challenging for existing methods designed to decompose general ST or bulk tumor data. We developed the Spatial Cellular Estimator for Tumors (SpaCET) to infer cell identities from tumor ST data. SpaCET first estimates cancer cell abundance by integrating a gene pattern dictionary of copy number alterations and expression changes in common malignancies. A constrained regression model then calibrates local cell densities and determines immune and stromal cell lineage fractions. On both simulated data and real ST data with matched double-blind histopathology annotations as ground truth, SpaCET achieves higher accuracy than existing methods. Further, by coupling cell fractions with ligand-receptor coexpression analysis, SpaCET reveals how intercellular interactions at the tumor-immune interface promote cancer progression.
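The two-step idea described above (estimate the malignant fraction first, then fit the remaining lineages under constraints) can be sketched as follows. This is not SpaCET's actual model: the reference profile matrix, the constraints (fractions non-negative and summing to 1 - malignant fraction), the projected-gradient solver, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def project_to_simplex(v, total):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = total} (sort-based)."""
    if total <= 0:
        return np.zeros_like(v, dtype=float)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    # Largest k whose k-th largest entry stays positive after the common shift.
    rho = ks[u - (css - total) / ks > 0][-1]
    theta = (css[rho - 1] - total) / rho
    return np.maximum(v - theta, 0.0)


def constrained_fractions(sig, y, malignant_profile, malignant_frac, n_iter=5000):
    """Hypothetical second step: given spot expression y (genes,), reference
    profiles sig (genes x lineages) and a malignant fraction estimated in a
    first step, fit lineage fractions f >= 0 with sum(f) = 1 - malignant_frac
    by projected gradient descent on ||sig @ f - residual||^2."""
    residual = y - malignant_frac * malignant_profile
    total = 1.0 - malignant_frac
    k = sig.shape[1]
    f = np.full(k, total / k)
    # Step size 1/L, where L = 2 * ||sig||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(sig, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * sig.T @ (sig @ f - residual)
        f = project_to_simplex(f - step * grad, total)
    return f


# Synthetic, noise-free spot: 40% malignant cells plus three other lineages.
rng = np.random.default_rng(0)
sig = rng.uniform(0.0, 1.0, size=(50, 3))
malignant_profile = rng.uniform(0.0, 1.0, size=50)
f_true = np.array([0.3, 0.2, 0.1])
y = 0.4 * malignant_profile + sig @ f_true
print(np.round(constrained_fractions(sig, y, malignant_profile, 0.4), 3))
```

With noise-free synthetic data the recovered fractions should match `f_true` closely; real spots would add noise, unknown cell densities, and profile mismatch, which is where a dedicated method has to do more work.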

Bio:

Dr. Peng Jiang started his research program at the National Cancer Institute (NCI) in July 2019. His lab focuses on developing big-data and artificial intelligence frameworks to identify biomarkers and new therapeutic approaches for cancer immunotherapies in solid tumors. Before joining NCI, he completed his postdoctoral training at the Dana-Farber Cancer Institute and Harvard University. During his postdoctoral research, Peng developed computational frameworks that repurposed public domain data to identify biomarkers and regulators of cancer immunotherapy resistance. Notably, his computational model TIDE revealed that cancer cells could utilize the self-protection strategy of cytotoxic lymphocytes to resist lymphocyte killing under immune checkpoint blockade. Dr. Jiang received his Ph.D. from the Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics at Princeton University, and completed his undergraduate studies with the highest national honors in the Department of Computer Science at Tsinghua University (ranked first in his year by GPA). He is a recipient of the NCI K99 Pathway to Independence Award, the Scholar-in-Training Award of the American Association for Cancer Research, and the Technology Innovation Award of the Cancer Research Institute.

Rachel Karchin, JHU

"Benchmarking the Future: Evaluating Classifiers and Predictors in Computational Oncology"

Users employing computational approaches for oncology prediction tasks urgently need a solid framework to assess and compare different methods before applying them to research and clinical questions. Unlike the machine learning domain, where established benchmark datasets and consistent progress-tracking through well-specified metrics are the norm, the field of computational oncology lacks standardized benchmarks. Yet manuscript publication requires demonstration of a method's wide applicability and a clear advantage over existing methods. Consequently, authors often devise their own benchmarks and select a set of methods for comparison that inherently favors their own approach. This scenario complicates quantitative measurement and dissemination of advancements within the field. I will explore the achievements and setbacks of open community challenges intended to navigate these obstacles. Additionally, I will highlight how the absence of standardization affects the development of methods for predicting cancer driver mutations and genes. To conclude, I will offer recommendations on how to address these issues moving forward, aiming to foster more transparent and meaningful progress in computational oncology.

Bio:

Rachel Karchin, Ph.D. is Professor in the Department of Biomedical Engineering at Johns Hopkins University. She received a Ph.D. in Computer Science from the University of California, Santa Cruz in 2003, spent three years as a postdoctoral fellow in the Department of Biopharmaceutical Sciences at University of California, San Francisco, and joined the Hopkins faculty in 2006. Dr. Karchin has a joint appointment in the Department of Oncology, a secondary appointment in the Department of Computer Science, and is a core member of the Institute for Computational Medicine at the Whiting School of Engineering. Her lab develops algorithms and tools to interpret and model molecular sequence data, with a focus on how tumors evolve and how they interact with the immune system. In 2017, she was inducted into the College of Fellows of the American Institute for Medical and Biological Engineering for her contributions to translational computational biology.

Misha Kolmogorov, CDSL, NCI

"Accurate detection and characterization of somatic structural variation in tumor genomes using long reads"

Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers better mappability and long-range phasing, which substantially improves germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy. Here, we present Severus, a method for the accurate detection of different types of somatic SVs using a phased breakpoint graph approach. To benchmark various short- and long-read SV detection methods, we sequenced five tumor/normal cell line pairs with Illumina, Nanopore, and PacBio sequencing platforms; on this benchmark, Severus achieved the highest F1 scores (the harmonic mean of precision and recall) among both long-read and short-read methods. We then applied Severus to three clinical cases of pediatric cancer, demonstrating concordance with known genetic findings and revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
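For readers less familiar with the benchmark metric, the F1 score combines precision P (the fraction of reported SVs that are true) and recall R (the fraction of true SVs that are reported) as their harmonic mean, 2PR/(P + R). The counts below are hypothetical and are not Severus benchmark results; how individual calls are matched to a truth set is left out of this sketch.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from true-positive, false-positive and
    false-negative counts, e.g. SV calls matched against a truth set."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 calls match the truth set, 10 are false, 30 true SVs are missed.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")  # precision=0.900 recall=0.750 F1=0.818
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are high, so a caller cannot score well by emitting many calls or very few.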

Bio:

Before joining the Cancer Data Science Laboratory in January 2022, Mikhail was a postdoctoral fellow at the University of California (UC) Santa Cruz, supervised by Dr. Benedict Paten. Prior to that, he was a postdoctoral fellow at UC San Diego, co-supervised by Dr. Rob Knight and Dr. Pavel Pevzner. Mikhail received his Ph.D. in Computer Science from UC San Diego in September 2019, under the mentorship of Dr. Pavel Pevzner.

Jens Lagergren, Stockholm

"Variational Inference of Somatic Evolution in Cancer"

In this talk, we will consider the evolving landscape of variational inference (VI)-based methods for phylogenetic analysis, a domain gaining prominence in machine learning and computational biology, e.g., in studying cancer development from single-cell data. Probabilistic phylogenetic inference is a machine-learning problem, so the key to unlocking this crucial application can be expected to lie in machine-learning methodology. In particular, VI is a strong candidate, given its general capacity to deliver impressive performance gains for Bayesian inference compared with, e.g., MCMC.

First, we introduce VaiPhy, a novel VI-based algorithm designed for rapid and efficient approximate posterior inference in augmented tree space. Moreover, we will briefly discuss the emerging field of VI-based phylogenetic inference. Next, we focus on VICTree, another novel variational inference-based algorithm. VICTree addresses the complexities in analyzing tumor heterogeneity and cancer evolution in the context of single-cell sequencing. It incorporates a Tree-structured Mixture Hidden Markov Model (TSMHMM) for modeling copy number evolution, effectively handling the inherent noise and dependencies in copy number profiles. By examining VICTree's application to multiple myeloma and breast cancer samples, we will highlight its capabilities in reliable clustering, clonal tree reconstruction, and copy number evolution analysis, along with its advantages in quality and speed of inference compared to other methods.

Bio:

Professor Jens Lagergren belongs to the EECS school at KTH, where he teaches courses in Machine Learning and AI. He is physically located at SciLifeLab’s Stockholm site (www.scilifelab.se) and, moreover, affiliated with the Digital Futures at KTH (www.digitalfutures.kth.se) and the national WASP program (wasp-sweden.org). Jens Lagergren’s main interest is in evolutionary models that include events affecting entire genes in phylogenetic inference, such as gene duplication, gene loss, and lateral gene transfer. He has recently focused on somatic evolution in cancer and methodological questions in artificial intelligence (AI) and machine learning.

Lichun Ma, CDSL, NCI

"Spatial single-cell dissection of cellular neighborhoods in liver cancer"

The development of liver cancer involves an intricate interplay among various cell types within the liver. Unraveling the orchestration of these cells may hold the key to deciphering the underlying mechanisms of this complex disease. The advancement of single-cell and spatial technologies has revolutionized our ability to determine cellular neighborhoods and understand their crucial roles in disease pathogenesis. Here, we apply these approaches to determine the landscape of cellular neighborhoods in liver cancer, which may offer insights into the molecular mechanisms underlying tumor heterogeneity and tumor evolution and pave the way for effective therapeutic interventions.

Bio:

Dr. Ma received her Ph.D. degree in Electronic Engineering at the City University of Hong Kong in 2016. After a one-year postdoctoral fellowship at Nanyang Technological University, Singapore, she joined NCI in 2017 as a postdoctoral fellow where she studied cancer biology using single-cell techniques. She initiated her independent research program at NCI as a Stadtman Investigator in 2022. Dr. Ma has a strong background in mathematics, information theory and machine learning. She received many awards during her training, including the NCI CCR Excellence in Postdoctoral Research Transition award. Her recent work on tumor cell biodiversity and microenvironmental reprogramming in liver cancer was showcased in the 2019-2020 NCI Center for Cancer Research Milestones publication.

Jian Ma, CMU

"Single-cell 3D genome organization"

The organization of the 3D genome is deeply interconnected with critical genomic functions, such as gene transcription and DNA replication. Unraveling the structure and function of the 3D genome at single-cell resolution presents a significant challenge due to the complexity of identifying 3D genome features and their variability. In this talk, I will discuss our latest work on developing representation learning approaches for single-cell 3D epigenomics, which have the potential to provide new understanding of fundamental genome structures and cellular functions in various biological contexts.

Bio:

Jian Ma is the Ray and Stephanie Lane Professor of Computational Biology at the School of Computer Science at Carnegie Mellon University. His work focuses on developing computational methods to study the structure and function of the human genome and cellular organization and their implications for health and disease.

Layla Oesper, Carleton College

"Tumor Evolution: Methods for Comparing and Summarizing Clonal Trees"

Tumors evolve through a process in which distinct sets of genomic mutations accumulate in different cell lineages descending from an original founder cell. A better understanding of how such tumor lineages evolve over time, which mutations occur together or separately, and in what order these mutations were gained may yield important insight into cancer and how to treat it. Thus, in recent years there has been increased interest in computationally inferring the evolutionary history of a tumor, that is, a rooted tree whose vertices represent populations of cells with a unique complement of somatic mutations and whose edges represent ancestral relationships between these populations. However, accurately inferring these trees is often challenging. In this research talk, I will discuss several methods designed in my lab that address issues related to the inference of tumor evolution. These include methods to compare trees that take into account the unique structure of tumor evolution, and methods that create a consensus tree from a set of conflicting tumor evolutionary histories.
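As one simple illustration of what comparing clonal trees can mean (a generic measure chosen for this sketch, not necessarily one of the methods from the talk), two trees over the same mutations can be compared through the ordered mutation pairs (a, b) in which a sits on a strict ancestor of the vertex carrying b; the distance counts the pairs present in one tree but not the other. The dictionary-based tree encoding and the function names are assumptions.

```python
from itertools import product


def ancestor_descendant_pairs(parent, muts):
    """Return ordered pairs (a, b) such that mutation a sits on a strict
    ancestor of the vertex carrying mutation b.

    parent: vertex -> parent vertex (the root is simply absent)
    muts:   vertex -> set of mutations first gained at that vertex
    """
    pairs = set()
    for v, gained in muts.items():
        u = parent.get(v)
        while u is not None:  # walk up to the root
            pairs.update(product(muts[u], gained))
            u = parent.get(u)
    return pairs


def ad_distance(tree1, tree2):
    """Number of ancestor-descendant pairs found in exactly one of the two trees."""
    return len(ancestor_descendant_pairs(*tree1) ^ ancestor_descendant_pairs(*tree2))


# Linear history A -> B -> C versus a branching history where B and C are siblings.
linear = ({"v1": "v0", "v2": "v1"}, {"v0": {"A"}, "v1": {"B"}, "v2": {"C"}})
branching = ({"v1": "v0", "v2": "v0"}, {"v0": {"A"}, "v1": {"B"}, "v2": {"C"}})
print(ad_distance(linear, branching))  # 1: only the pair (B, C) differs
```

A pair-based measure like this ignores vertex labels and only looks at mutation orderings, which is one reason tumor-specific comparison methods may weigh other structure (e.g., mutations sharing a clone) as well.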

Bio:

Layla Oesper is an Associate Professor of Computer Science at Carleton College in Northfield, MN. Dr. Oesper received her B.A. in mathematics from Pomona College and her Sc.M. and Ph.D. in Computer Science from Brown University. Dr. Oesper is also a recipient of NSF CRII and CAREER Awards. Her lab focuses on the design of computational methods related to inference and analysis of cancer evolution.

Erez Persi, NCBI, NLM

"Tumor evolution under therapy and harsh variable conditions"

Tumor evolution is shaped by selective pressures imposed by the microenvironment as the tumor progresses and colonizes local and distant tissues, as well as by therapy. However, the difference between these two types of pressure and how they shape the course of tumor evolution remain elusive, mainly due to large intra-tumor heterogeneity and limited data availability. Here, to address this question and disentangle the effects of these pressures, we analyze diverse patient datasets of both treated and untreated cancers, with at least two samples per patient. We find that the selection strength on tumor genomes has a wide distribution across patients, yet the selection value in an individual is highly stable and invariant to tumor progression and therapy. Further, despite this "invariance", we identify a significant bias toward neutral evolution following treatment failure, whereby tumors evolve resistance. We demonstrate that this bias is associated with worse prognosis. The results are validated on both published and original datasets. We suggest that monitoring the selection value during treatment can assist clinical decision-making in a significant fraction of cases. In the last part of the talk, I will show how a normal breast cell line can transform into a neoplastic one under harsh variable conditions, illustrating the dynamics of selection values as the cells also evolve phenotypic traits.

Bio:

Erez holds a PhD in theoretical physics from the University of Paris, Pierre et Marie Curie (Paris 6), and obtained his M.Sc. and B.Sc. in physics from Tel Aviv University. Erez was drawn to computational biology early in his career, working on multidisciplinary studies. His early research focused on developing theoretical models to explain mechanisms of neural information processing in the brain. After his PhD, Erez gradually turned his research focus to bioinformatics and evolutionary genomics, studying various questions in species and cancer evolution using computational techniques such as analytical and numerical modeling, data mining, and machine learning applied to biological data, as well as developing sequence analysis tools. His current research on cancer evolution addresses both fundamental and translational questions from a population genetics perspective.

Victoria Popic, Broad

“Learning the signatures of germline and somatic structural variation in the genome”

Structural variants (SVs) are an exceptionally diverse set of large-scale genome rearrangements, encompassing mutations such as deletions, insertions, inversions, duplications, translocations, and any complex combination thereof. Accounting for more base-pair differences across individuals than all other variant types combined, SVs are the greatest source of genetic diversity in the human genome and a key driver of disease. In particular, a recent analysis of over 2,500 cancer genomes by the Pan-Cancer Analysis of Whole Genomes Consortium found that SVs are the most prevalent class of driver mutations in cancer, playing a key role in tumorigenesis, progression, metastasis, and resistance to therapies. However, general SV discovery remains an open problem. In large part, this is due to the intractability of accurately modeling the full range of SV signatures using the manually engineered heuristic approaches that have predominantly been developed and employed to date. In this talk, we will motivate the need for generalizable data-driven methods for SV discovery and introduce Cue, a deep learning method for calling germline and somatic SVs. Cue casts SV discovery as a multi-class keypoint localization task in custom images derived from read alignments to a reference genome. By design, this formulation allows Cue to rapidly adapt to different sequencing data types and to combine inputs from multiple platforms, assays, and genomes into a shared coordinate system. To that end, we will show examples of how Cue can effectively learn signatures of different SV classes, leverage short and long reads, detect low-frequency subclonal SVs, and call somatic SVs from matched tumor-normal samples.

Bio:

Victoria Popic is a Schmidt Fellow and Principal Investigator at the Broad Institute of MIT and Harvard, where she leads a lab focused on the development of deep learning approaches for the characterization and interpretation of the genome and the mechanisms that drive disease. Popic earned her Ph.D. in computer science from Stanford University. She also holds a B.S. in computer science, a B.S. in mathematics, and an M.Eng. in computer science from MIT. Prior to the Broad, she spent several years working in industry, conducting research on DNA sequencing at Illumina and working on compilers at SambaNova Systems.

Teresa Przytycka, NCBI, NLM

"Delineating relation between mutagenic signatures, cellular processes, and environment through computational approaches"

Cancer genomes accumulate many somatic mutations resulting from carcinogenic exposures, cancer-related aberrations of the DNA maintenance machinery, and normal stochastic events. These processes often lead to distinctive patterns of mutations, called mutational signatures. However, interpreting the mutation patterns represented by such signatures is often challenging. This talk will focus on computational methods, contributed by my group, that elucidate the relations between mutational signatures and cellular and environmental processes.

Bio:

Teresa Przytycka is a Senior Principal Investigator at the National Center for Biotechnology Information in NLM. She received a Ph.D. in Computer Science from the University of British Columbia, Vancouver. Her group develops computational methods advancing the understanding of biomolecular systems, including new computational approaches to study gene regulation, reconstruction of Gene Regulatory Networks, methods for single-cell analysis, and network-based approaches to study mutational processes in cancer and drug response. She serves as an editor of several computational biology journals, including PLOS Computational Biology, Bioinformatics, and Algorithms for Molecular Biology. She is a member of the steering committee of Research in Computational Molecular Biology (RECOMB), a top algorithmic computational biology conference. In 2021, Przytycka was named a Fellow of the International Society for Computational Biology (ISCB).

Ben Raphael, Princeton

Talk, Abstract and Bio coming soon

Eytan Ruppin, CDSL, NCI

"Next generation precision oncology: a tale of three tales"

Precision oncology has made significant advances, mainly by targeting actionable mutations and fusion events involving cancer driver genes. Aiming to expand on that, I will describe three new approaches, recently developed in my lab, for predicting patients' responses to cancer treatments. I will start by predicting response to checkpoint immunotherapy from simple routine lab tests and the tumor mutational burden. Second, I will describe an approach that considers tumor heterogeneity in predicting response, based on single-cell RNA sequencing of patients' tumors. Third, I will describe deep learning approaches for predicting patients' responses to cancer therapies and classifying tumors directly from tumor histopathological images without requiring any sequencing, offering exciting new precision oncology opportunities in underserved areas and the developing world. Finally, I will discuss the challenges lying ahead.

Bio:

Eytan Ruppin received his M.D. and Ph.D. from Tel Aviv University, where he has served as a professor of Computer Science & Medicine since 1995, conducting computational multidisciplinary research spanning computational neuroscience, natural language processing, machine learning, and systems biology. In 2014, he joined the University of Maryland as director of its Center for Bioinformatics and Computational Biology (CBCB). In 2018, he moved to the NCI, where he founded and is chief of its Cancer Data Science Lab/Branch (CDSL). His research is focused on developing new computational approaches for advancing precision oncology, leading to a few ongoing innovative clinical trials. Eytan is a member of the editorial board of Molecular Systems Biology, a fellow of the International Society for Computational Biology (ISCB), a recipient of the NCI Director award for his work on precision oncology, and a recipient of the DeLano award for computational biosciences from the American Society for Biochemistry and Molecular Biology for his work on synthetic lethality. He is a member of the SAB of GSK Oncology and a co-founder of a few precision medicine startup companies.

Yana Safonova, PSU

"Detection and Comparative Analysis of Immunoglobulin Genes in Vertebrate Genomes"

A central challenge faced by all organisms is defending themselves against pathogens, including those that often evolve rapidly. Early in the lineage leading to jawed vertebrates, evolution devised an ingenious solution in the adaptive immune system – in which sets of germline immunoglobulin (IG) genes, collectively called the IG loci, undergo a process called V(D)J recombination that generates an immensely diverse collection of antibodies (antibody repertoire) with the potential to recognize a huge variety of pathogens. We have a remarkably limited understanding of what the IG loci and the resulting antibodies look like for essentially all (non-model) species; these loci are among the parts of the genome left on the cutting-room floor when reference genomes are released. This is because, until very recently, the IG loci had been nearly impossible to assemble, as the structural complexity of these regions thwarted standard assemblers designed for short reads. Only with the advent of long-read sequencing platforms and specialized assembly algorithms over the last several years have researchers been able to reliably conduct population-level sampling and variant curation. Recent studies pioneered techniques for estimating IG gene content from existing genome assemblies and produced the first estimates of the number of V, D, and J genes in a phylogenetically diverse set of mammals. While these methods are a substantial advance, many challenges related to the detection of highly diverged IG genes, IG gene verification, IG gene naming, and comparative analysis remain largely unaddressed. In this talk, we will present state-of-the-art solutions for the annotation and comparative analysis of highly diverged IG loci and discuss open questions in immunogenomics.

Bio:

Yana Safonova is an Assistant Professor in the Computer Science and Engineering Department at Penn State University. Her research interests cover open problems in computational immunogenomics, including understanding the successes and failures of adaptive immune responses, finding novel antibody drugs, and studying the evolution of adaptive immunity.

Yana Safonova received the B.S. and M.S. degrees in computer science from Nizhny Novgorod State University in 2012 and the Ph.D. degree in bioinformatics from Saint Petersburg State University in 2017. In 2017, she joined the Computer Science and Engineering Department at the University of California, San Diego (UCSD) as a Postdoctoral Scholar. She was also affiliated with the Department of Biochemistry and Molecular Genetics at the University of Louisville School of Medicine as a Visiting Postdoctoral Fellow (2019-2021) and the Department of Computer Science at Johns Hopkins University as an Assistant Professor (2021-2023).

Dr. Safonova was awarded the Data Science Postdoctoral Fellowship (2017) by UCSD and the Intersect Fellowship for Computational Scientists and Immunologists (2019) by the American Association of Immunologists, and was selected as a participant in the Interstellar Initiative (2022). She is also a member of the IUIS Nomenclature-Reports Review Committee.

Cenk Sahinalp, CDSL, NCI

"Placement of Copy Number Aberration Impacted Single Nucleotide Variants in a Tumor Phylogeny"

Intra-tumor heterogeneity is a consequence of distinct subclones that emerge during cancer progression. These subclones are characterized by various types of somatic genomic aberrations, with Single Nucleotide Variants (SNVs) and Copy Number Aberrations (CNAs) being the most prominent. While single-cell sequencing provides a powerful means for studying tumor progression, most tumor sequencing data are obtained through conventional bulk sequencing. Many existing methods for studying tumor progression from multi-region/multi-sample bulk sequencing data either rely on SNVs from genomic loci not affected by CNAs or are designed to handle a small number of SNVs by enumerating their possible copy number trees. The limited use of CNA-impacted SNVs in the past was, at least in part, due to the lack of tools for characterizing the set of all copy number states present at genomic loci of interest. To address this need, we have developed DETOPT, a combinatorial optimization approach for accurate tumor progression tree inference, which places SNVs impacted by CNAs on trees of tumor progression with minimal distortion of their variant allele frequencies observed across the available samples of a tumor. DETOPT provides more accurate tree placement of SNVs impacted by CNAs than the available alternatives, reports biologically plausible progression histories, and helps identify gains/losses impacting clinically significant genes.

Bio:

S. Cenk Sahinalp has been a Senior Investigator at the Cancer Data Science Laboratory in CCR, NCI, NIH since 2019. Sahinalp completed his B.Sc. in Electrical Engineering at Bilkent University, Ankara, Turkey, and his Ph.D. in Computer Science at the University of Maryland, College Park. His 1997 Ph.D. thesis introduced the first work-optimal parallel algorithm for suffix tree construction and the first linear-time algorithm for pattern matching within sublinear edit distance. After a brief postdoctoral fellowship at Bell Labs, Murray Hill, he worked as a Computer Science professor, most recently at Indiana University, Bloomington.

Sahinalp’s research has focused on combinatorial algorithms and data structures, primarily for strings/sequences, and their applications to biomolecular sequence analysis, especially in the context of cancer. In the past decade, his lab has developed several algorithmic methods for efficient and effective use of high-throughput sequencing data for better characterization of the structure, evolution, and heterogeneity of cancer genomes.

Russell Schwartz, CMU

"Optimization and Mathematical Programming for Clonal Evolution in Cancer"

This lecture will explore applications of, and tools for, optimization in studies of somatic mutability and clonal lineages in cancers, while teaching the mathematical programming techniques widely used to solve such problems. We will first explore basic concepts in modeling that allow us to express practical questions about cancer genomic data sets as computational optimization problems. We will then cover the algorithmic principles behind solving these problems through mathematical programming, particularly integer linear programming, by seeing how this basic class of methods can be applied to diverse problems in cancer genomic data analysis. We will illustrate the power and versatility of these tools through applications ranging from toy problems in somatic variation analysis to complex real-world inference problems in current studies of somatic evolution in cancers, including managing heterogeneous variant types, assorted forms of multiomic data integration, and working with longitudinal (“liquid biopsy”) data.

Bio:

Dr. Russell Schwartz received his B.S., M.Eng., and Ph.D. degrees from the Department of Electrical Engineering and Computer Science (EECS) at the Massachusetts Institute of Technology, the last in 2000. This was followed by postdoctoral work in the MIT Biology Department studying protein biophysics and research as an Informatics Research Scientist at Celera Genomics Corporation, where he was involved in some of the first efforts to sequence the human genome and study human genetic variation at a whole-genome scale. Since leaving Celera in 2002, he has been a faculty member at Carnegie Mellon University, where he is currently Professor and Head of the Ray and Stephanie Lane Computational Biology Department and Professor of Biological Sciences, with additional appointments in the Computer Science Department and Machine Learning Department. His laboratory has worked broadly on algorithms, machine learning, and simulation methods for computational genetics, genomics, and biophysics, with a current focus largely on computational cancer biology and somatic evolution. He is also active in bioinformatics education, largely through work with the International Society for Computational Biology (ISCB), where he currently serves as a Vice President of the Society and co-chair of its Education Community of Special Interest (COSI).

Roded Sharan, Tel Aviv U

"Multi-modal data integration for gene representation learning"

The data deluge in biology calls for computational approaches that can integrate multiple datasets of different types to build a holistic view of biological processes or structures of interest. An emerging paradigm in this domain is the unsupervised learning of data embeddings that can be used for downstream clustering and classification tasks. In this talk I will describe recent work in my group to integrate diverse data types for gene representation learning with applications to gene module detection and gene function prediction.

Bio:

Roded Sharan is a Professor in the School of Computer Science at Tel Aviv University. Sharan's PhD studies at Tel Aviv University and later his postdoctoral training at UC Berkeley shaped his interests in bioinformatics and systems biology. At the end of his postdoctoral training, he was offered a Senior Lecturer position at Tel Aviv University, to which he returned as an Alon Fellow. Additional awards he has received include the Krill Prize of the Wolf Foundation, the Best Paper Award at the RECOMB'10 conference, Test of Time Awards at RECOMB'16, RECOMB'17 and RECOMB'20, the Thomson Reuters Highly Cited Researcher award, the Kadar Prize for excellence in research, and an ISCB Fellow award. Currently, Sharan heads a research group that specializes in the analysis of biological networks and mutational signatures and their applications to medicine.

Mona Singh, Princeton

"Discovering links between cancer ‘omics data: mutations, transcriptomics, and metabolomics"

Bio:

Mona Singh is the Wang Family Professor in Computer Science at Princeton University. She has been on the faculty at Princeton since 1999, and is jointly appointed in the Computer Science department and the Lewis-Sigler Institute for Integrative Genomics. Mona obtained her AB and SM degrees at Harvard University, and her PhD at MIT, all three in Computer Science. She received the Presidential Early Career Award for Scientists and Engineers (PECASE), and is a Fellow of the International Society for Computational Biology and a Fellow of the Association for Computing Machinery. She is Editor-In-Chief of the Journal of Computational Biology. She has been program committee chair for several major computational biology conferences, including ISMB (2010), WABI (2010), ACM-BCB (2012), and RECOMB (2016). She has been Chair of the NIH Modeling and Analysis of Biological Systems Study Section (2012-2014), is a council member of the Computing Community Consortium, and is on the steering committee for WABI.

Wenyi Wang, MD Anderson

"Cancer risk modeling for deleterious mutations in TP53 using a multi-center consortium"

Sophisticated risk prediction modeling has greatly improved screening and testing for inheritable cancer syndromes, such as those caused by BRCA1/2 mutations in breast cancer. Such a quantitative risk prediction model was urgently needed for the early detection of Li-Fraumeni syndrome (LFS), following the demonstration that surveillance testing reduces mortality for that syndrome. LFS primarily arises from germline mutations in the TP53 tumor suppressor gene and is characterized by cancers that occur relatively early in life, often repeatedly over a lifetime, and that affect multiple sites overlapping with those of other cancer syndromes, in particular hereditary breast and ovarian cancer syndrome. Over the past 12 years, we have developed a series of statistical models and a software tool, LFSPRO, to further the understanding of LFS. We have also disseminated LFSPRO in cancer genetics clinics, such as the MD Anderson Clinical Cancer Genetics Program, to predict who, including index cases and family members, may benefit from LFS cancer screening at multiple organ/tissue sites. This talk will give an overview of our most recently encountered statistical challenges and the corresponding solutions, followed by a multi-center validation and an update on the software tool and its dissemination. Clinical tools based on statistical risk prediction modeling, similar to those used for BRCA1/2 mutations, are needed for LFS. We will discuss our efforts to create a clinical tool that fills this need, as well as the challenges of implementing this tool in clinical practice.

Bio:

Dr. Wenyi Wang is a Professor of Bioinformatics and Computational Biology and Biostatistics at the University of Texas MD Anderson Cancer Center. She received her PhD from Johns Hopkins University and performed postdoctoral training in statistical genomics at UC Berkeley with Terry Speed and in genome technology at Stanford with Ron Davis. Wenyi's research includes significant contributions to statistical bioinformatics in cancer, including MuSE for subclonal mutation calling, DeMixT for transcriptome deconvolution, Famdenovo for de novo mutation identification, and, more recently, pan-cancer biomarker identification through integrative deconvolution of transcriptomic/genomic data. Her group focuses on the development and application of computational methods to study the evolution of the human genome as well as the cancer genome, and on developing risk prediction models that accelerate the translation of biological findings into clinical practice.

SEE PREVIOUS YEAR'S TALKS at https://ncifrederick.cancer.gov/events/conferences/nci-spring-school-algorithmic-cancer-biology/page-a