Vineet Bafna, UCSD
"Extrachromosomal DNA and the breakage fusion bridge: two avenues for focal amplification in cancer"
Oncogene amplification, which is commonly mediated by localized amplification of genomic segments, is a significant cause of cancer pathogenicity. Amplification mediated by Extrachromosomal DNA (ecDNA) drives tumor evolution, treatment resistance, and poor outcomes for patients.
In this talk, we ask if ecDNA is a later manifestation of genomic instability, or whether it can be an early event in the transition from dysplasia to cancer. To answer this question, we analyzed whole-genome sequencing (WGS) data from esophageal biopsies of patients with esophageal adenocarcinoma (EAC) or Barrett's esophagus (BE). The biopsies were collected in a prospective case-control study of patients with BE conducted at the Fred Hutchinson Cancer Center (FHCC), and independently in a cross-sectional cohort of BE patients. Our findings demonstrate that ecDNA can develop early in the transition from high-grade dysplasia to cancer, and that ecDNAs progressively form and evolve under positive selection.
In the second part of the talk, we address focal amplifications that are not mediated by ecDNA. One important and relatively understudied mechanism is the breakage-fusion-bridge (BFB) cycle. Detecting BFB-driven amplifications and the fine structure of the amplicon from short-read sequencing remains challenging. In this study, we used optical mapping (OM) technology, which provides physical maps of DNA molecules (~200 kbp in length) that can be assembled into ultra-long OM assemblies (N50 ~50 Mbp), and detects different types of SVs. We developed a method named OM2BFB, which uses OM data to identify fold-back reads and accurate CN calls, and either reports candidate BFB structures that explain the observed OM data with high likelihood or declares ‘no-BFB’ for the window. Time permitting, we will discuss these data, the insights obtained from BFB-driven focal amplifications, and their connection to cancer pathology.
Doron Betel, Weill-Cornell
"A primer on computational approaches in single cell spatial transcriptomics"
Advancements in single cell technologies have transformed our ability to investigate cellular composition in both healthy and diseased conditions. Multimodal single cell measurements of samples through protein abundance, gene expression, epigenetics, and genotype provide a high-resolution view of the cellular states and enable the identification of new biological processes. More recently, spatial omics technologies have added a new dimension of information illuminating tissue architecture and cellular interactions. In this primer we will introduce current spatial technologies and their applications. We will focus on algorithmic approaches for utilizing single-cell and spatial measurements towards new discoveries in cancer biology including tumor microenvironment and cellular interactions.
Bio: Coming Soon
Hannah Carter, UCSD
"Bioinformatic analysis for precision immunotherapy"
Cancer immunotherapy has surged to prominence with some remarkable successes. However, we are still far from seeing the full potential impact of immune treatments on patient survival and from understanding the determinants of clinical success. Currently immune checkpoint inhibitors (ICPi) are only effective in 30-40% of cases and in selected cancer types, such as cutaneous melanoma, non-small cell lung carcinoma (NSCLC), and cancers with MSI-high status. The growing availability of rich multi-omic datasets from tumors in general and from cohorts treated with immune checkpoint inhibitors has supported the growth of immunoinformatics for cancer. A number of cancer-specific tools and approaches have emerged to support analysis of the tumor neoantigen landscapes and characteristics of the local tumor immune microenvironment toward uncovering the determinants of immunotherapy resistance and response. This lecture will cover bioinformatic tools for identifying putative neoantigens, predicting neoantigen immunogenicity and modeling immunoediting.
Dr. Hannah Carter is an Associate Professor in the UCSD Department of Medicine, Division of Genomics and Precision Medicine. She received her Ph.D. in Biomedical Engineering from Johns Hopkins University and her M.Eng. in Electrical and Computer Engineering from the University of Louisville. She is an Azrieli Global Scholar, a Siebel Scholar and a recipient of a 2013 NIH Director’s Early Independence Award. Her research interest is in the development of computational tools and analyses to advance variant interpretation and precision cancer medicine.
Lenore Cowen, Tufts U
"Pathways for Learning from Structure and Organization of Biological Networks"
Biological networks are powerful resources for the discovery of genes and genetic modules that drive disease, and have been highly successful in uncovering previously unknown genes involved in pathways that drive (or protect against) cancer. In particular, diffusion-based low-dimensional network embedding methods have proved quite powerful for biological networks. These diffusion methods will uncover coherent local neighborhoods that correlate to gene function in many types of biological networks, from PPI networks to co-expression networks, enabling the downstream use of the entire machine learning toolbox to perform multiple inference tasks. We can unlock further performance gains for certain types of biological network data by taking advantage of unique graph-theoretic structure. For example, networks built on gene expression correlation will have many more triangles, as compared to classical PPI networks that place an edge between two genes when there is experimental evidence that their proteins physically bind. And in genetic interaction networks, while the coarse structure still allows diffusion methods to unlock global organization, a much more fine-grained organization is obtained on the signed genetic interaction network by considering generalizations of the between-pathway model first introduced by Kelley and Ideker. If time permits, we will also present some recent work that combines a bottom-up protein structure-based view of protein interaction with a top-down network-based approach for computational prediction of novel protein-protein interactions.
Dr. Lenore J. Cowen is a Professor in the Computer Science Department at Tufts University. She also has a courtesy appointment in the Tufts Mathematics Department and in the Tufts Graduate School for Biomedical Sciences. Her research interests span three areas: Discrete Mathematics (since high school), Algorithms (since 1991 in graduate school) and most recently Computational Molecular Biology, where she focuses on predicting protein function from structural and biological network information. She received a BA in Mathematics from Yale and a Ph.D. in Mathematics from MIT. After finishing her Ph.D. in 1993, she was an NSF Postdoctoral Fellow and then joined the faculty of the Mathematical Sciences Department (now the Applied Mathematics and Statistics department) at Johns Hopkins University where she was promoted to the rank of Associate Professor in 2000. Dr. Cowen was named an ONR Young Investigator and a fellow of the Radcliffe Institute for Advanced Study. Lured by the Boston area, and the prospect of making an impact in a growing young department, she joined Tufts in September, 2001. She led a team that won the DREAM Disease Module Identification challenge in 2016. She is on the Editorial Board of the IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) and an Associate Editor of the journal Bioinformatics (from Oxford University Press). In 2020, she was awarded both the CRA-E Undergraduate Research Faculty Mentoring Award from the Computing Research Association, and the NCWIT Undergraduate Research Mentoring Award from the National Center for Women and Information Technology. She currently leads the NSF-funded Tufts T-Tripods Institute on interdisciplinary approaches to studying the foundations of data science.
Mohammed El-Kebir, Univ of Illinois
"Combinatorial Algorithms for Cancer Phylogenetics"
Cancer is a genetic disease, where cell division, mutation and selection produce a heterogeneous tumor composed of multiple subpopulations of cells with different sets of mutations. During later stages of cancer progression, cancerous cells from the primary tumor migrate and seed metastases at distant anatomical sites. The cell division and mutation history of an individual tumor can be represented by a phylogenetic tree, which helps guide patient-specific treatments. In this talk, I will introduce combinatorial algorithms for reconstructing, comparing and interpreting tumor phylogenies from DNA sequencing data. Specifically, I will discuss the challenges that arise when analyzing bulk and single-cell DNA sequencing samples of tumors. I will also discuss how to summarize the diverse solution space of tumor phylogenies. Finally, I will demonstrate how inferred tumor phylogenies can be used in downstream analyses to obtain additional insights about tumorigenesis.
Dr. Mohammed El-Kebir is an Assistant Professor of Computer Science at the University of Illinois Urbana-Champaign. Prior to joining Illinois, he was a postdoctoral research associate at Princeton University and Brown University. He received his PhD in 2015 at Centrum Wiskunde & Informatica (CWI) and VU University Amsterdam in the Netherlands. His research is in combinatorial optimization algorithms for problems in computational biology, with a particular focus on cancer evolution.
Iman Hajirasouliha, Weill-Cornell
"Weakly-supervised tumor purity prediction from frozen H&E stained slides"
Estimating tumor purity is especially important in the age of precision medicine. Purity estimates have been shown to be critical for the correction of tumor sequencing results, and higher purity samples allow for more accurate interpretations from next-generation sequencing results. Molecular-based purity estimates using computational approaches require sequencing of tumors, which is both time-consuming and expensive. Here we propose an approach, weakly-supervised purity (wsPurity), which can accurately quantify tumor purity within a digitally captured hematoxylin and eosin (H&E) stained histological slide, using several types of cancer from The Cancer Genome Atlas (TCGA) as proof-of-concept.
Dr. Iman Hajirasouliha is an Associate Professor of Computational Genomics at the Institute for Computational Biomedicine at Weill Cornell Medicine of Cornell University and a member of the Englander Institute for Precision Medicine and the Meyer Cancer Center, New York, USA. He completed a Postdoctoral Scholarship at the Computer Science Department, Stanford University, and a Simons Research Fellowship at the University of California, Berkeley. His research focuses on computational genomics, computational digital pathology, large-scale sequence analysis, and characterizing somatic variations and intra-tumor heterogeneity in cancer. Iman received his B.Sc. in Computer Engineering from Sharif University and his M.Sc. in Computing Science from Simon Fraser University (SFU). He obtained his Ph.D. with Exceptional Recognition from SFU and also held a postdoctoral appointment at Brown University.
Iman received a Simons-Berkeley Research Fellowship, an NIGMS Maximizing Investigators' Research Award, and an Irma T. Hirschl Career Scientist Award. He is on the program committee of several bioinformatics conferences, including ISMB and RECOMB. Link: www.imanh.org
Sridhar Hannenhalli, NCI
"Gene regulatory mechanisms in cancer - beyond genetics"
While mutations, specifically those affecting protein-coding genes, have been a major focus of cancer research, they do not explain oncogenesis, metastasis, and therapy response entirely, and epigenetic plasticity is emerging as a potent complementary mechanism. Stochastic gene expression variability is intimately linked to cellular plasticity, which while being an integral part of development and stress response, is also linked to cancer and presents a major challenge for cancer therapy. I will present our recent work showing the existence of a transcriptionally distinct subpopulation of healthy pancreatic acinar cells exhibiting features of ductal-acinar progenitor state pancreatic ductal adenocarcinoma. Parallels between development and cancer have long been noted and recent works have identified activation of developmental programs in cancer. I will briefly summarize our recent works showing (1) a novel developing melanoblast cell state associated with metastasis and therapy response in melanoma and (2) a broad misappropriation of developmental splicing programs by cancer. Time permitting, I will summarize our recent attempts to characterize non-coding mutations during evolution and in cancer.
Dr. Hannenhalli obtained a B. Tech from the Indian Institute of Technology (1990) and his Ph.D. in Computer Science from the Pennsylvania State University (1995). After a postdoctoral fellowship at the University of Southern California (1996-1997), he worked as a Senior Scientist at Glaxo Smith-Kline (1997-2000) and then at Celera Genomics (2000-2003), where he was involved in the work reporting the first human genome sequence. He was a faculty member in the Department of Genetics at the University of Pennsylvania (2003-2010), and then at the University of Maryland (UMD) with joint appointments in the Department of Cell Biology and Molecular Genetics, and the University of Maryland Institute for Advanced Computer Studies (2010-2019). Dr. Hannenhalli served as Interim Director of the Center for Bioinformatics and Computational Biology at UMD (2012-2013) and was a Fulbright Scholar and Visiting Professor at the Indian Institute of Sciences and the National Center for Biological Sciences, Bengaluru (2017-2018). The Hannenhalli lab is broadly interested in developing computational and statistical approaches to harness the huge amount of biological data to ultimately answer specific biological questions pertaining to gene regulation and evolution, both from the basic science as well as translational perspective, with specific applications to development and diseases, with an emphasis on cancer.
Peng Jiang, NCI
"Big Data Approaches to Study Intercellular Signaling in Cancer Immunotherapy Resistance"
Recent years have seen the rapid growth of big data in immunology and immuno-oncology research. However, leveraging the vast amount of public data resources to make new findings is still challenging for most immunologists due to the complexity and heterogeneity of published datasets. I will introduce two data-integrative frameworks developed to help immunologists understand intercellular signaling mechanisms, with applications in studying cancer immunotherapy resistance. The first framework, CytoSig (https://cytosig.ccr.cancer.gov), contains a vast amount of cytokine treatment response data curated from public repositories. CytoSig can reliably predict cytokine signaling cascades in human inflammatory diseases and cancers. The second framework, Tres (https://resilience.ccr.cancer.gov), provides many T-cell genomics datasets and interactive functions for immune oncologists to study molecular markers of T-cell anti-tumor efficacies. The Tres model also identified FIBP knockout as a new approach to potentiate cellular immunotherapies in solid tumors.
Dr. Peng Jiang started his research program at the National Cancer Institute (NCI) in July 2019. His lab focuses on developing big-data and artificial intelligence frameworks to identify biomarkers and new therapeutic approaches for cancer immunotherapies in solid tumors. Before joining NCI, he finished his postdoctoral training at the Dana Farber Cancer Institute and Harvard University. During his postdoctoral research, Peng developed computational frameworks that repurposed public domain data to identify biomarkers and regulators of cancer immunotherapy resistance. Notably, his computational model TIDE revealed that cancer cells could utilize the self-protection strategy of cytotoxic lymphocytes to resist lymphocyte killing under immune checkpoint blockade. Dr. Jiang finished his Ph.D. at the Department of Computer Science & Lewis Sigler Genomics Institute at Princeton University, and his undergraduate study with the highest national honors at the Department of Computer Science at Tsinghua University (GPA ranked 1st in his year). He is a recipient of the NCI K99 Pathway to Independence Award, the Scholar-In-Training Award of the American Association for Cancer Research, and the Technology Innovation Award of the Cancer Research Institute.
Rachel Karchin, JHU
"Predicting immunogenic neoepitopes in cancers"
Neoepitopes are somatically mutated peptides which may elicit an antigen-specific T cell response. The response depends on peptide binding to major histocompatibility complex (MHC) molecules, presentation by MHCs on the cell surface, binding of the pMHC complex to a T cell receptor, and T cell activation. In cancer immunotherapy, neoepitopes are used to develop patient-specific vaccines and may be useful in predicting response to immune checkpoint blockade immunotherapies. Personalized epitope prediction is currently not high-throughput, and computational methods are used to predict if a neoepitope is immunogenic, given an individual’s HLA genetic background. The vast majority of neoepitopes are not immunogenic. It is therefore critical that predictors of immunogenic neoepitopes have high precision among their most highly ranked output, because it is necessary to select a short list of candidate neoepitopes that can be validated per patient in a clinical setting. I will introduce the MHCnuggets and BigMHC neoepitope predictors and discuss their different approaches to prediction of peptide-MHC binding, pMHC presentation and neoepitope immunogenicity.
Dr. Rachel Karchin is a Professor in the Department of Biomedical Engineering at Johns Hopkins University. She received a Ph.D. in Computer Science from the University of California, Santa Cruz in 2003, spent three years as a postdoctoral fellow in the Department of Biopharmaceutical Sciences at University of California, San Francisco, and joined the Hopkins faculty in 2006. Working closely with cancer geneticists, pathologists and oncologists, her lab has developed novel tools to identify pathogenic missense mutations, driver genes, and multivariate biomarkers to inform cancer treatment, to model tumor evolution from next-generation sequencing data, and to predict tumor neoepitopes. She was the leader of the computational efforts to identify driver mutations for pioneering cancer sequencing projects at Johns Hopkins Sidney Kimmel Cancer Center, and co-led the TCGA PanCan Atlas Essential Genes and Drivers Analysis Working Group. In 2017, she was inducted into the College of Fellows of the American Institute for Medical and Biological Engineering for her contributions to translational computational biology.
Aly Khan, U Chicago
"Deciphering anti-tumor immune responses with systems immunology and genomics"
One of the major challenges in developing new cancer immunotherapies and identifying effective treatment biomarkers is the incomplete understanding of the molecular and cellular mechanisms that drive human anti-tumor immune responses. Despite the usefulness of animal models in studying cancer immunology, they fail to perfectly replicate the complexity of human immune responses. By analyzing genetic data from large groups of human patients, we can gain insights that cannot be obtained from animal models or reductionist experiments, particularly in the case of non-small cell lung cancer (NSCLC). The response rates to immune checkpoint blockade (ICB) in NSCLC vary greatly, and the mechanisms behind these responses are not fully understood. Somatic loss of heterozygosity at the HLA-I locus (HLA-LOH) has been identified as a mechanism that enables tumors to evade the immune system, but many patients with HLA-I disruptions in their tumors still have durable responses to ICB. By utilizing genomic sequencing and integrating it with single-cell profiling and ICB treatment outcomes, we can better understand the complex and dynamic systems underlying anti-tumor immune responses in patients with HLA-I disruptions. Our research has found that clonally expanded populations of CD4+ T cells with a cytotoxic phenotype can infiltrate tumors in NSCLC patients and may have an underappreciated role in contributing to anti-tumor immune responses via HLA class II mechanisms. When we integrated these findings with tumor mutational burden, we found a significant association with progression-free survival, including in patients with HLA-LOH. These results demonstrate how a systems immunology approach can leverage genomic profiling to generate insights into human immune responses and potentially inform strategies for exploiting other genetic variations to decipher human immune functions.
Dr. Aly A. Khan is an Assistant Professor in the Departments of Pathology, Family Medicine, and the College at the University of Chicago. His research focuses on developing novel computational methods to understand how immune cells interact with each other, the surrounding tissue and organ systems, and the microbiome. A major goal of his lab is to translate computationally-driven discoveries into clinically relevant applications. Prior to joining the University of Chicago, he was a member of the research faculty at the Toyota Technological Institute at Chicago, where he established a research program in computational immunology. In addition to his academic research, he has advanced translational science and research at Merck, Genentech, Tempus Labs, and currently, 23andMe. He obtained his PhD in Computational Biology jointly from Cornell University and Memorial Sloan Kettering Cancer Center.
Misha Kolmogorov, NCI
"Profiling somatic and germline structural variants in cancer genomes using long reads"
Recent studies highlighted the rich landscape of somatic structural variation (SV) across various tumor types. Most current pan-cancer genomics projects rely on reference mapping of short reads to detect and genotype structural variation. A substantial part of the variation in the human genome is not accessible to short reads due to mapping ambiguities. Long-read sequencing (such as PacBio or Oxford Nanopore) can overcome the limitations of short reads; however, current methods were not designed for the analysis of rearranged cancer genomes with complex copy number profiles. First, we describe scalable and cost-effective methods for profiling of germline structural variations. Our wet lab and computational pipeline can generate high-quality de novo diploid assemblies and small variant calls from a single Oxford Nanopore flow cell. It is now used to sequence 1000s of genomes in collaboration with the Center for Alzheimer's and Related Dementias at the NIH. The project will provide a first public database of long-read assemblies and structural variant calls for 1000s of human individuals, complementing the existing short-read resources. Second, we developed methods to detect rearrangements and build a breakpoint graph representation that characterizes the structure of derived cancer karyotypes. Complex events with multiple breakpoints form connectivity clusters and are classified based on the subgraph properties. We characterized multiple cell lines and identified many somatic rearrangement clusters involving multiple breakpoints. We frequently observed chromosome fusions, breakage-fusion-bridges, chromothripsis, large focal and ecDNA amplifications. We further applied our methods to multiple related tumor samples to profile structural changes over the course of tumorigenesis.
Before joining the Cancer Data Science Laboratory in January 2022, Dr. Mikhail Kolmogorov was a postdoctoral fellow at the University of California (UC) Santa Cruz, supervised by Dr. Benedict Paten. Prior to that, he was a postdoctoral fellow at UC San Diego, co-supervised by Dr. Rob Knight and Dr. Pavel Pevzner. Mikhail completed his Ph.D. in September 2019 in Computer Science from UC San Diego, under the mentorship of Dr. Pavel Pevzner. He received his M.Sc. in bioinformatics from St. Petersburg University of the Russian Academy of Sciences.
Eugene Koonin, NCBI
"Mutation-Selection Balance, Compensatory Mechanisms and Driver Epistasis in Tumor Evolution"
Intratumor heterogeneity and phenotypic plasticity, mediated by a range of somatic aberrations, metabolic and epigenetic adaptations, are the principal mechanisms that enable cancers to evolve resistance to treatment and survive under environmental stress. Nonetheless, a comprehensive picture of the interplay between different somatic aberrations across scales, from point mutations to whole-genome duplications, in tumor initiation and progression is lacking. I discuss the temporal behavior of each type of aberration, how it is affected by selection, and how it affects patients’ clinical outcome. The dependency of tumor fitness on the levels of different aberrations is biphasic such that there is an optimal value for each type of change. These optima are attained in different phases of tumor evolution and appear to play compensatory roles in maintaining tumor fitness. Specifically, repeat instability contributes to cancer initiation, whereas larger aberrations (e.g., aneuploidy) are progressively involved in later stages. Consideration of the impact of environmental factors on the emergence of genetic aberrations further supports this temporal order. A better understanding of the interactions between genetic aberrations, the microenvironment and epigenetic and metabolic cellular states is essential for early detection, prevention, and development of therapeutic strategies. Cancer driver mutations often display mutual exclusion or co-occurrence, underscoring the key role of epistasis in carcinogenesis. Conditional selection affects 25%-50% of driver substitutions in tumors with >2 drivers. Conditionally co-selected genes form modular networks, whose structures challenge the traditional interpretation of within-pathway mutual exclusivity and across-pathway synergy, suggesting a more complex scenario where gene-specific across-pathway epistasis shapes differentiated cancer subtypes.
Dr. Eugene V. Koonin is an NIH Distinguished Investigator and leader of the Evolutionary Genomics Group at the National Center for Biotechnology Information (NIH). He received his Ph.D. in Molecular Biology in 1983 from Moscow State University, joined the NIH in 1991 and became a Senior Investigator in 1996. His research interests focus on evolutionary genomics of prokaryotes, eukaryotes and viruses, host-parasite coevolution, evolution of cancer, and general theory of the evolution of life. Dr. Koonin made extensive contributions to the study of the functions and evolution of CRISPR systems. He is a member of the National Academy of Sciences of the USA, National Academy of Medicine of the USA, American Academy of Arts and Sciences, and European Molecular Biology Organization.
Tere Landi, NCI
"Interaction of APOBEC deaminase and tobacco smoking activities in lung cancer"
APOBEC enzymes are part of innate immunity and are responsible for restricting viruses and retroelements by deaminating cytosine residues. Most solid tumors harbor different levels of somatic mutations attributed to the off-target activities of APOBEC3A (A3A) and/or APOBEC3B (A3B). However, how APOBEC3A/B interact with exogenous mutagenic processes in shaping tumor development is largely unknown. By combining deep whole-genome sequencing with multi-omics profiling of 309 lung cancers from smokers with detailed tobacco smoking information, we identified two subtypes defined by low and high APOBEC mutagenesis. I will present how the interaction between APOBEC and tobacco smoking mutagenesis affects tumor cell state composition, driver genes, neoantigen presentation, and age at onset of lung cancer.
Dr. Tere Landi is an M.D., Ph.D. with training in clinical oncology and molecular epidemiology. She is Senior Advisor for Genomic Epidemiology, Trans-Divisional Research Program, and Senior Investigator, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH. She focuses her research on the genetic and environmental determinants of lung cancer and melanoma, and on the genomic characterization of these tumors. She is the Principal Investigator of both EAGLE and Sherlock-Lung, two landmark studies of lung cancer in smokers and never smokers, respectively, which identified subtypes with distinct genomic features, mutational signatures, and evolutionary trajectories. She is also the leader of the MelaNostrum consortium with the largest family study of melanoma worldwide.
Christina Leslie, MSKCC
"Machine learning models for regulatory and single cell genomics"
The last several years have brought notable successes in the application of machine learning approaches, and especially deep learning models, to problems in regulatory genomics and single cell analyses. Sequence models based on convolutional neural networks are widely used to interpret DNA motif grammars that explain epigenomic signals like transcription factor occupancy and chromatin accessibility. Other models seek to predict gene expression output from genomic sequence context and/or epigenomic data in order to yield insights into gene regulation, and various machine learning approaches have been used for dimensionality reduction and embedding of single cell data sets. We will present novel machine learning methods from our group for learning gene regulatory models and embedding single-cell chromatin accessibility (scATAC-seq) that overcome limitations of current methods. We will first show how to use 3D connectivity via Hi-C/HiChIP data in a graph attention network framework to learn predictive models of gene regulation with a model called GraphReg. We will also present a new sequence-informed embedding algorithm for scATAC-seq called CellSpace that maps DNA k-mers and cells to the same space to learn latent structure while exhibiting strong batch mitigating properties.
Dr. Christina Leslie did her undergraduate degree in Pure and Applied Mathematics at the University of Waterloo in Canada. She was awarded an NSERC 1967 Science and Engineering Fellowship for graduate study and did a PhD in Mathematics at the University of California, Berkeley, where her thesis work dealt with differential geometry and representation theory. She won an NSERC Postdoctoral Fellowship and did her postdoctoral training in the Mathematics Department at Columbia University in 1999-2000. She then joined the faculty of the Computer Science Department and later the Center for Computational Learning Systems at Columbia University, where she began to work in computational biology and machine learning and became the principal investigator leading the Computational Biology Group. In 2007, she moved her lab to Memorial Sloan Kettering Cancer Center, where she is currently a Member of the Computational and Systems Biology Program. Dr. Leslie is well known for developing machine learning approaches for the analysis and interpretation of high-throughput biological data – in particular, bulk and single-cell transcriptomic, epigenomic, and 3D genomic sequencing data sets – with the goal of decoding gene regulation. Biological application domains include basic and cancer immunology, cancer epigenetics, and stem cell biology and cellular differentiation. She is PI, together with Alexander Rudensky, of the NCI U54 Center for Tumor-Immune System for Systems Biology at MSKCC. She also co-leads projects in the NHGRI Impact of Genomic Variation on Function (IGVF) consortium and the NIH Common Fund 4D Nucleome (4DN) consortium.
Jian Ma, CMU
"Machine learning for spatial genomics"
Abstract: Genomes and cells have spatial organizations that have strong implications for human diseases such as cancer. However, the principles underlying this spatial organization and its functional impact are still poorly understood. During this presentation, I will introduce some of our recent work in developing machine learning algorithms to study single-cell 3D genome organization and spatial transcriptomics. We hope that these algorithms will provide new insights into the structure and function of nuclear architectures, as well as the spatial organization of cells in complex tissues.
Dr. Jian Ma is the Ray and Stephanie Lane Professor of Computational Biology in the School of Computer Science at Carnegie Mellon University. His lab has developed algorithms to study the structure and function of the human genome with a focus on genome organization, gene regulation, single-cell epigenomics, spatial genomics, and comparative genomics.
Lichun Ma, NCI
"Single-cell dissection of tumor heterogeneity and tumor evolution in liver cancer"
Tumor heterogeneity is a key factor in therapeutic failures and lethal outcomes of solid malignancies. Intratumor heterogeneity may result from the evolution of tumor cells and their continuous interactions with the tumor microenvironment, which collectively drive tumorigenesis. However, the presence of cellular and molecular heterogeneity makes it challenging to define molecular features linked to tumor malignancy. Thus, understanding tumor heterogeneity in the context of tumor initiation and evolution may shed light on the intrinsic tumor biology and the identification of novel biomarkers. Liver cancer, the second most lethal malignancy in the world, consists mainly of hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (iCCA). Most patients with HCC and iCCA have limited response to molecularly targeted therapies. We apply single-cell and spatial approaches to profile primary tumors from liver cancer patients and develop computational methods to understand tumor heterogeneity with the goal of improving early detection and therapeutics for liver cancer.
Dr. Lichun Ma received her Ph.D. degree in Electronic Engineering at the City University of Hong Kong in 2016. After a one-year postdoctoral fellowship at Nanyang Technological University, Singapore, she joined NCI in 2017 as a postdoctoral fellow where she studied cancer biology using single-cell techniques. She initiated her independent research program at NCI as a Stadtman Investigator in 2022. Dr. Ma has a strong background in mathematics, information theory and machine learning. She received many awards during her training, including the NCI CCR Excellence in Postdoctoral Research Transition award. Her recent work on tumor cell biodiversity and microenvironmental reprogramming in liver cancer was showcased in the 2019-2020 NCI Center for Cancer Research Milestones publication.
Erin Molloy, UMD
"Models and methods for reconstructing population-level evolutionary histories and relationships to tumor phylogenetics"
This lecture will provide an overview of phylogenetic models and methods that have transformed the field of systematics over the last decade. These recent developments have been driven by increasing access to genome-scale data, coupled with the observation that the evolutionary history of (recombination-free) regions of the genome can differ from the evolutionary history of the species or populations. This heterogeneity across the genome can be due to a variety of biological processes, with perhaps the most well-studied source being an outcome of ancestry, called incomplete lineage sorting (ILS). In this lecture, we will describe how ILS is modeled under the Multi-Species Coalescent (MSC) model and then present the dominant methods for reconstructing evolutionary trees under the MSC. In particular, we will highlight theoretical guarantees, the assumptions under which these guarantees hold, and the challenges that arise in practice. Lastly, we will discuss how these methods designed for species (or populations) can be applied and interpreted in the context of tumor phylogenetics. Overall, this presentation will provide insight into the performance of popular phylogeny estimation methods under a variety of contexts and highlight avenues for future research.
Dr. Erin Molloy is an assistant professor in computer science at the University of Maryland, College Park. Her lab works on problems in computational evolutionary genomics, with a focus on algorithms for reconstructing species- and population-level histories. She received her PhD in computer science from the University of Illinois at Urbana-Champaign, where she was advised by Dr. Tandy Warnow and Dr. Bill Gropp. Before joining Maryland, Molloy was a postdoctoral scholar in Dr. Sriram Sankararaman's Machine Learning and Genomics Lab at the University of California, Los Angeles. Her research has been supported by the NSF Graduate Research Fellowship, two allocations on the Blue Waters supercomputer, a residency at the Institute for Pure and Applied Mathematics, and the State of Maryland.
Pawel Przytycki, Boston U
"Deciphering Cell Type and State with Single Cell Chromatin Accessibility"
Cell-type- and state-specific regulation of genes is often dictated by cis-regulatory regions. Single-cell ATAC sequencing (scATAC-seq) has emerged as a technology for profiling chromatin accessibility in individual cells. However, these data are notoriously sparse and noisy, even in comparison to single-cell RNA sequencing. In this tutorial we will first look at standard pipelines for processing these data, including dimensionality reduction and batch correction, in order to identify types of cells. We will then look at downstream analyses such as peak calling, motif enrichment, and footprinting. Finally, we will discuss more advanced topics including multi-modal data integration and mapping bulk-derived data.
I am an Assistant Professor in the Faculty of Computing & Data Sciences at Boston University. My lab develops algorithms for the analysis and interpretation of large-scale genomics data, with a focus on the role of the non-coding genome in development and disease. Before starting my own lab, I was a Bioinformatics Fellow in Dr. Katie Pollard's lab at the Gladstone Institutes at UCSF. The focus of my research was investigating the regulatory effects of noncoding variants. Prior to that, I was a PhD student and NSF Graduate Research Fellow in Dr. Mona Singh's lab in the Department of Computer Science at Princeton University where my research focused on the use of algorithms and networks for cancer genomics.
Ben Raphael, Princeton
"Alignment, Integration, and Modeling of Spatial Transcriptomics Data"
Spatial transcriptomics technologies measure RNA expression at thousands of locations in a tissue sample providing information about the spatial distribution of cell types and the spatial variation in gene expression across a tissue. However, these measurements are typically sparse with high rates of missing data. In this talk, I will describe statistical and machine learning approaches that address data sparsity by modelling spatial correlations between measurements within and across tissue slices. The first approach, PASTE, aligns and integrates spatial transcriptomics data from multiple slices from the same tissue enabling downstream applications such as differential gene expression and 3D reconstruction of tissues. The second approach, Belayer, models variation in gene expression using a model of a layered tissue that consists of stacked layers with distinct cell type composition, such as found in the brain and skin. Belayer automatically identifies layer boundaries by combining segmented linear regression with a conformal mapping procedure that transforms curved layers into a rectilinear coordinate system. I will demonstrate the applications of these approaches to spatial transcriptomics data from both human and cancer tissues.
Ben Raphael is a Professor of Computer Science at Princeton University. His research focuses on the design of combinatorial and statistical algorithms for the interpretation of biological data. Recent areas of emphasis include cancer evolution, network/pathway analysis of germline and somatic mutations, single-cell and spatial DNA/RNA sequencing, and structural variation in human and cancer genomes. His group’s algorithms have been used in multiple projects from The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). He received an S.B. in Mathematics from MIT and a Ph.D. in Mathematics from the University of California, San Diego (UCSD), completed postdoctoral training in Bioinformatics and Computer Science at UCSD, and was on the faculty of Brown University (2006-2016). He is a recipient of the 2021 Innovator Award from the International Society for Computational Biology, the Alfred P. Sloan Research Fellowship, the NSF CAREER award, and a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. He is an elected Fellow of the International Society for Computational Biology (2020).
Eytan Ruppin, NCI
"Next generation transcriptomics-based precision oncology"
Precision oncology has made significant advances, mainly by targeting actionable mutations and fusion events involving cancer driver genes. Aiming to expand treatment opportunities, recent studies have begun to explore the utility of the tumor transcriptome in guiding patients’ treatment. I will describe four new computational approaches that we have developed to this end. First, SELECT and ENLIGHT, which aim to predict patient response from the bulk tumor transcriptome. Second, PERCEPTION, which aims to advance precision cancer therapy from single-cell tumor transcriptomics. Third, DeepPT, a precision oncology expression-based approach that starts from tumor histopathological images. Fourth, the development of Liquid-based transcriptomics (LBT) to learn about the tumor immune microenvironment from the blood. Finally, I will discuss the challenges lying ahead.
Dr. Eytan Ruppin received his M.D. and Ph.D. from Tel-Aviv University where he has served as a professor of Computer Science & Medicine since 1995, conducting computational multi-disciplinary research spanning computational neuroscience, natural language processing, machine learning and systems biology. In 2014, he joined the University of Maryland as director of its center for bioinformatics and computational biology (CBCB). In 2018, he moved to the NCI where he co-founded and is chief of its Cancer Data Science Lab (CDSL). His research is focused on developing new computational approaches for advancing transcriptomics-based precision oncology. Eytan is a member of the editorial board of Molecular Systems Biology, a fellow of the International Society for Computational Biology (ISCB), a recipient of the NCI Director and the Delano awards and is a co-founder of a few precision medicine startup companies.
Alejandro Schaffer, NCI
"Studying precision oncology past by mining a clinical trials database and identifying future opportunities from single-cell analysis"
Journalists are taught to ask questions that start with 'who', 'what', 'when', 'where', and 'why'. In contrast, accountants, discrete mathematicians, epidemiologists, and members of a few other professions tend to ask questions that start with 'how many'. I will present the use of two tools to address research questions in personalized oncology that start with 'how many'.
To understand the past of personalized oncology, it is useful to analyze systematically past clinical trials and ask questions of the form: How many oncology clinical trials have a particular structure or characteristic? Using the commercial database Trialtrove, natural language processing, and expert curation, one can answer such questions precisely. I will summarize two case studies in which we used Trialtrove to answer the questions:
How many oncology clinical trials used germline DNA information?
How many oncology clinical trials leveraged the principle of synthetic lethality?
Looking towards the future, a key trend in personalized oncology is combination therapy in which a patient concurrently receives multiple treatments that target different problematic aspects of the tumor. A natural combinatorial question in the combination therapy setting is: How many different treatments does a cancer patient need to achieve some quantifiable level of clinical benefit, such as a partial response?
I will present the software MadHitter, which combines principles from molecular biology, combinatorial optimization, physical chemistry, and microeconomics to arrive at a precise answer to this question for some classes of cancer treatments. The input to MadHitter includes i) the patient's single-cell transcriptomics from the tumor and tumor microenvironment and ii) a set of genes/proteins that might be targeted by treatments such as CAR-T or drug-antibody conjugates.
Dr. Alejandro Schäffer was born in Montevideo, Uruguay and emigrated with his parents to the United States. He received his B.S. in Applied Mathematics and his M.S. in Mathematics from Carnegie Mellon University in 1983. He received his PhD in Computer Science from Stanford University in 1988, focusing on algorithms and theoretical computer science. In 1992, he switched his research focus to software for genetics. He is best known for leading the development of the genetic linkage analysis package FASTLINK and for implementing the PSI-BLAST module of the sequence analysis package BLAST. The 1997 paper describing PSI-BLAST and other algorithmic improvements to BLAST is one of the 100 most cited scientific papers of all time. He has also carried out genomic data analysis as one member of large teams doing medical genetics studies, especially studies identifying genes that when mutated cause human primary immunodeficiencies. In 1999, Dr. Schäffer co-authored one of the first papers in tumor phylogenetics, now an active area of research in cancer genomics. Dr. Schäffer has been a Computer Scientist at the National Institutes of Health since 1996, first at the National Center for Human Genome Research which became the National Human Genome Research Institute during 1996-1998, second at the National Center for Biotechnology Information 1998-2018, and currently at the Cancer Data Science Laboratory in NCI which he joined on October 28, 2018. In the Cancer Data Science Laboratory, Dr. Schäffer is guided by the Lab Chief, Dr. Eytan Ruppin, to apply his experience in algorithms, biological sequence analysis, genetic data analysis, and immunology to address research questions in cancer genomics.
Russell Schwartz, CMU
"Optimization and Mathematical Programming for Cancer Research"
In this tutorial, we will explore applications and tools for optimization in cancer research, with particular focus on the techniques of mathematical programming and integer linear programming. We will explore some basic concepts in modeling optimization problems for cancer genomic data that allow us to express practical questions we might pose about cancer genomic data sets as computational optimization problems. We will study some basic heuristic methods we can use to find solutions to these problems and then transition to mathematical programming methods, particularly integer linear programming, which provide a rigorous way of solving a broad class of computational problems. We will illustrate use of these tools through a variety of questions we will pose and see how to solve in practice. Instruction will involve a combination of lecture-based material and hands-on workshops in which we work as a group on real data using Jupyter Notebooks and R.
Dr. Russell Schwartz received his B.S., M.Eng., and Ph.D. degrees from the Department of Electrical Engineering and Computer Science (EECS) at the Massachusetts Institute of Technology, the last in 2000. This was followed by postdoctoral work in the MIT Biology Department studying protein biophysics and research as an Informatics Research Scientist at Celera Genomics Corporation, where he was involved in some of the first efforts to sequence the human genome and study human genetic variation at a whole-genome scale. Since leaving Celera in 2002, he has been a faculty member of Carnegie Mellon University where he is currently Professor and Head of the Computational Biology Department and Professor of Biological Sciences with additional appointments in the Computer Science Department and Machine Learning Department. His laboratory has worked broadly on algorithms, machine learning, and simulation methods for computational genetics, genomics, and biophysics, with a current focus largely on computational cancer biology and somatic variation. He is also very active in bioinformatics education, largely through work with the International Society for Computational Biology (ISCB), where he currently serves as a Vice President of the Society and co-chair of its Education Community of Special Interest (COSI).
Ron Shamir, Tel Aviv U
"Integrated analysis of multi-omic data in cancer"
The availability of large multi-modal biological datasets invites researchers to deepen our understanding in basic science and medicine, with the goal of personalized analysis. While inquiry of each data type separately often provides insights, integrative analysis has the potential to reveal more holistic, systems-level findings. We demonstrate the power of integrated analysis in cancer by developing algorithms on several levels, including subtyping based on multiple omics; integrating two different omics from two different cancer cohorts; identifying and ranking driver genes in an individual's tumor based on expression and mutation profiles; and predicting a healthy individual’s future risk of developing cancer based on data from routine periodical checkups. Modularity and network analysis are recurring themes in our studies.
Ron Shamir (PhD, UC Berkeley 1984) is a Sackler professor of Bioinformatics in the Blavatnik School of Computer Science at Tel Aviv University (TAU). His group develops algorithms and software tools in bioinformatics for understanding the genome and human disease.
Shamir is the founder of the Edmond J. Safra Center for Bioinformatics at TAU and headed it between 2005 and 2022. He has published > 300 scientific works, including 17 books and edited volumes, and has supervised > 70 research students. He was among the founders of RECOMB, and of the Israeli Society of Bioinformatics and Computational Biology. He is a recipient of the Landau Prize in Bioinformatics, the Kadar family prize for excellence in research, and a Fellow of the ISCB and the ACM. In 2022 he received the ISCB Senior Scientist Award.
Mona Singh, Princeton
"The case for equitable computational method development for precision oncology"
Precision oncology promises to transform cancer treatment by providing individualized therapeutic approaches that consider the unique biology of patient tumors. However, it is well known that there are existing health disparities in cancer outcomes. Since computational methods are increasingly being developed to facilitate precision oncology---from identifying medically relevant alterations to predicting whether tumors will respond to specific drug treatments---it is important to develop computational oncology approaches that work well across diverse populations. I will discuss these topics broadly but will largely focus on our recent work towards developing equitable machine learning methods for predicting peptide binding by major histocompatibility complex (MHC) proteins, as these MHC-peptide prediction methods play an important role in current approaches to harness adaptive immunity to fight viral pathogens and cancers.
Peter Van Loo, MD Anderson
"Molecular archeology of cancer"
Tumor development is driven by changes to the genome and epigenome leading to fitness advantages underlying successive clonal expansions. As somatic genetic and epigenetic changes occur across most or all cell cycles, the cancer (epi)genome carries an archeological record of its past. Over the past years, we have developed several approaches to mine that archeological record from the cancer genome, which we collectively call 'molecular archeology of cancer'. Using these approaches, we are able to infer the subclonal architecture of tumors, and gain key insights into the order and timing of the genomic changes that occurred over their evolutionary history. We have applied these approaches in a large-scale pan-cancer setting, showing that intra-tumor heterogeneity is pervasive across cancers, and that the timelines of tumor evolution span multiple years to decades, with typically similar key driver events occurring early. We have recently also started to develop methods that leverage the cancer epigenome to reconstruct the subclonal architecture of cancer, which will open up new avenues to study tumor evolution.
Dr. Peter Van Loo is a Professor and CPRIT Scholar in Cancer Research at the University of Texas MD Anderson Cancer Center, Department of Genetics, with a joint appointment at the Department of Genomic Medicine. His research focusses on leveraging massively parallel sequencing efforts to study the evolutionary history of cancers. During his postdoctoral training at the University of Oslo, the University of Leuven, and the Wellcome Trust Sanger Institute, Peter developed computational techniques to study copy-number alterations in cancer genomes, and approaches to study the evolutionary history and subclonal architecture of tumors from whole-genome sequencing data, a field coined “molecular archeology of cancer”. Prior to joining MD Anderson, Peter was a Group Leader at the Francis Crick Institute in London, UK, where he still leads a research group. His work there has sketched the typical evolutionary trajectories of many cancer types, allowing insight into the timelines of cancer development, as well as insight into how tumors metastasize. Peter was the main lead of the Evolution and Heterogeneity working group of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, and is the genomics lead of the Sarcoma arm of the 100,000 Genomes Project. Peter was awarded a Cancer Research UK Future Leaders in Cancer Research Prize in 2015 and a VIB Alumni Award in 2017.
Fabio Vandin, U Padova
"Discovering Significant Evolutionary Trajectories in Cancer Phylogenies"
Tumors are the result of a somatic evolutionary process leading to substantial intra-tumor heterogeneity. Single-cell and multi-region sequencing enable the detailed characterization of the clonal architecture of tumors, and have highlighted its extensive diversity across tumors. While several computational methods have been developed to characterize the clonal composition and the evolutionary history of tumors, the identification of significantly conserved evolutionary trajectories across tumors is still a major challenge. I will describe our work on finding significantly conserved evolutionary trajectories in cancer. I will present MASTRO, our algorithm to discover all conserved trajectories in a collection of phylogenetic trees describing the evolution of a cohort of tumors, allowing the discovery of conserved complex relations between alterations.
Dr. Fabio Vandin is a Professor in the Department of Information Engineering at the University of Padova, Italy. His research interests are in efficient and rigorous algorithms for the extraction of useful information from large amounts of data, and in the application of such methods in computational biology and biomedicine. He received his PhD in Information Engineering from the University of Padova (Italy), and he has held research positions with various titles at the Department of Computer Science at Brown University (USA) and at the Department of Mathematics and Computer Science at the University of Southern Denmark.
Wenyi Wang, MD Anderson
"Benchmarking-related model development for deconvoluting cancer genomes and heterogeneous tissue transcriptomes"
Sophisticated risk prediction modeling has greatly improved screening and testing for inheritable cancer syndromes such as BRCA1/2 mutations in breast cancer. Such a quantitative risk prediction model was urgently needed for the early detection of the Li-Fraumeni syndrome (LFS) following the demonstration of reduced mortality with surveillance testing for that syndrome. LFS primarily arises from germline mutations in the TP53 tumor suppressor gene and is characterized by cancer that occurs relatively early in life, often repeatedly over a lifetime, and which affects multiple sites that overlap with those of other cancer syndromes, in particular the hereditary breast and ovarian cancer syndrome. Over the past 12 years, we have developed a series of statistical models and a software tool, LFSPRO, to further the understanding of LFS. We have also disseminated our software tool in cancer genetic clinics, such as the MD Anderson Clinical Cancer Genetics Program, to predict who, including index case and family members, may benefit from LFS cancer screening for multiple organ/tissue sites. This talk will give an overview of our most recently encountered statistical challenges and the corresponding solutions, followed by a multi-center validation and software tool update and dissemination. Clinical tools based on statistical risk prediction modeling, similar to what is used for BRCA1/2 mutations, are needed for LFS. We will discuss our efforts in creating a clinical tool to fill this need as well as the challenges for implementing this tool into clinical practice.
Dr. Wenyi Wang is a Professor of Bioinformatics and Computational Biology and Biostatistics at the University of Texas MD Anderson Cancer Center. She received her PhD from Johns Hopkins University and performed postdoctoral training in statistical genomics at UC Berkeley with Terry Speed and genome technology at Stanford with Ron Davis. Wenyi's research includes significant contributions to statistical bioinformatics in cancer, including MuSE for subclonal mutation calling, DeMixT for transcriptome deconvolution, Famdenovo for de novo mutation identification, and more recently, a pan-cancer characterization of genetic intra-tumor heterogeneity in subclonal selection, as well as a pan-cancer biomarker identification through integrative deconvolution of transcriptomic/genomic data. Her group is focused on the development and application of computational methods to study the evolution of the human genome as well as the cancer genome, and further develop risk prediction models to accelerate the translation of biological findings to clinical practice.
Damian Wojtowicz, Warsaw U
"Signatures of mutational processes in cancer: methods and mechanisms"
Cancer genomes accumulate a large number of somatic mutations resulting from stochastic errors in DNA processing, naturally occurring DNA damage, replication errors, dysregulation of DNA repair mechanisms, and carcinogenic exposures. These mutagenic processes often produce characteristic mutational patterns called mutational signatures. Identifying and analyzing such patterns can provide essential information on mutational processes underlying the development of cancer and can have potential implications for the understanding of cancer etiology, prevention, and therapy. Mathematical and computational tools are indispensable in extracting patterns buried within cancer genomes. I will present several computational approaches for studying mutagenic processes and their associations with mutagens and cellular processes through the lens of mutational signatures.
Dr. Damian Wójtowicz’s research interests spread across various topics in computational molecular biology and cancer genomics. He earned a Ph.D. in computer science in 2007 from the University of Warsaw (Poland) where he worked on modeling of genome evolution. During his postdoctoral training at the National Center for Biotechnology Information, National Institutes of Health, he mainly worked on computational methods to study topological and structural characteristics of DNA based on novel data derived from next-generation sequencing technologies. After five years, he transitioned to a staff scientist position to work on methods for better understanding the mutational processes causing human cancer through the lens of mutational signatures. In 2022, Damian returned to his alma mater and joined the faculty at the Department of Mathematics, Informatics, and Mechanics to continue his efforts in developing computational methods for cancer research. His current position is supported by the Polish National Agency for Academic Exchange (Polish Returns Programme).
James Zou, Stanford U
"What’s next for computational precision oncology??
Precision medicine aims to identify treatments that work best for each individual based on their genomics and other personal features. While there has been tremendous interest and investment, the translation of precision medicine to patients has been challenging. In oncology, in particular, only a small number of genomic alterations inform treatment selection. In this talk, I will discuss new directions for precision oncology. We will first discuss how to leverage large real-world clinico-genomics data and in silico trials to facilitate the discovery of predictive biomarkers that inform treatment choice. Then we will see how modeling rich spatial omics data with graph neural networks generates actionable biological insights into why some patients respond better than others.
Dr. James Zou is an assistant professor of Biomedical Data Science, CS and EE at Stanford University. He develops machine learning methods for biology and medicine. He works both on improving the foundations of ML, by making models more trustworthy and reliable, and on in-depth scientific and clinical applications. He has received a Sloan Fellowship, an NSF CAREER Award, two Chan-Zuckerberg Investigator Awards, a Top Ten Clinical Achievement Award, several best paper awards, and faculty awards from Google, Amazon, Tencent and Adobe.