Just a few years ago, it was difficult to get any meaningful sequence data from formalin-fixed, paraffin-embedded (FFPE) patient tissue samples.
These biopsy specimens, which are tissues preserved in formaldehyde and embedded in wax for storage, are easy to make and store, even in remote research locations. Many are available, and more are made all the time, for use in experimental research and drug development. Because there are so many samples that have been taken over numerous years, they can also be used to help study the evolution of diseases and viruses.
But deep inside those tissues, the DNA and RNA (the sequences of molecules that comprise genes and program proteins) degrade over time, particularly the RNA. That makes it difficult, if not impossible, even with today’s most sophisticated sequencing technologies, to produce meaningful genomic data from old or poorly stored samples.
Compounding the problem, the process used to make FFPE samples causes molecules, including DNA and RNA, to crosslink, or “stick,” to other molecules. Sequencing requires linear, individual, easily accessible strands of DNA or RNA, so this sticking makes most FFPE samples “un-sequenceable.”
DNA and RNA sequencing can provide genome-variation and gene-expression data that is essential for studying disease mechanisms. Studying RNA in particular can help scientists understand how cells work by allowing them to determine which genes are active in a particular cell type. Understanding how disease cell types work can unlock discoveries leading to therapeutics.
Now, the Sequencing Facility, a service headed by Bao Tran for the National Cancer Institute’s Center for Cancer Research, has developed and shared a methodology, along with quality control (QC) metrics, for getting the most information out of degraded samples.
Work with what you have and adapt as you go
To develop its new methodology and to work with such low-quality tissues, the team had to alter its standard protocols. This process involved some trial and error, beginning with just a few samples from the group’s larger sample set.
“We kind of put things together. We developed our own metrics … at three different starting points: at the initial sample QC level, at the library QC level, and also… some data QC metrics,” said Shetty.
One major change came in the sequencing library preparation step. Instead of the typical method, the team used a whole-transcriptome method that captures both coding and non-coding RNA, which makes it more likely to detect useful information.
“These are gene-expression studies, so all you need is a stretch of sequence that can unambiguously identify the transcript that they’re coming from. As long as you can generate that much sequence, you can identify which gene it came from, and that is good enough because you get the information you are looking for,” said Mehta.
The project required extensive collaboration between many teams in the Sequencing Facility (R&D, Illumina sequencing, and bioinformatics), as every part of the sequencing process was tweaked in order to optimally sequence the degraded RNA samples. Fortunately, the team is used to collaborating internally and with other principal investigators, and even sequencing cores outside of the Frederick National Laboratory for Cancer Research.
“We call it a community effort and share the knowledge we have learned. And we also learn from each other in order to develop best practices in next-generation sequencing analysis,” said Yongmei Zhao, the Sequencing Facility’s bioinformatics manager.
Mining for meaningful data in available samples
Two things came together that allowed the team to make these important developments. One was an opportunity to beta-test a new sequencing library preparation kit. The other was that Danielle Mercatante Carrick, Ph.D., of NCI, approached the team with interest in getting RNA sequence data from 67 FFPE samples (60 different samples, plus seven replicates) of ovarian cancer.
Once the Sequencing Facility found a process that showed encouraging results, the staff processed the rest of the samples and gave Carrick usable information. Later, the team published its methodology in JoVE, along with a video, for the benefit of the sequencing community.
“Five or six years ago, we couldn’t even do anything with [degraded FFPE] samples because the technology and the protocols [were] not evolved enough. But since [then] … [the technology has improved enough so we’ve] been able to work with lower and lower inputs of DNA and RNA,” said Jyoti Shetty, the Sequencing Facility’s Illumina laboratory manager.
Indeed, next-generation sequencing technology and modified protocols have enabled the Sequencing Facility to get meaningful information from samples that, in the past, would not have generated any information.
“Over the last few years, sequencing has really kind of exploded in the scientific community,” said Monika Mehta, Ph.D., the Sequencing Facility’s research and development (R&D) manager. She added that “the big advantage [of using FFPE samples for research] is the number of available samples. … The more patients you look at, the [higher] the probability that you will identify the molecular mechanisms behind different cancers to identify new therapeutic targets or identify markers for quick diagnosis.”
Innovating into the future
The opportunity to beta-test a new kit and try something that hadn’t been done before at this scale has allowed the Sequencing Facility team to remain on the cutting edge of cancer research.
“We thought it was an exciting opportunity … [for] many other investigators to expand their research. … This has opened the door to many more projects for our group,” said Shetty.
Since the group took on this project and published its methodology paper, working with FFPE samples has become more routine. The group estimates that it has processed hundreds of FFPE samples for DNA or RNA across a dozen or more projects so far.
But they haven’t stopped there. The group continues to improve sequencing and analysis methods and test new methods through the R&D team. In the future, Mehta hopes they can offer a method using Cas9, part of the CRISPR gene-editing tool, to target specific sequences.
“We are trying to optimize that method for the platforms that we have available in the Sequencing Facility for long-read sequencing. It’s an ongoing project, but we are very hopeful … especially in cancer, you have a lot of gene rearrangements, and it’s not easy to detect them because it is expensive,” said Mehta. Targeting one specific part of the sequence could save investigators money that could be spent sequencing more samples or used for a different project.
Finding ways to support investigators is part of the group’s mission to enrich cancer research, so using the latest technologies and finding new applications for their capabilities is built into their workflow.
“We never stop. We always … continue to evolve,” said Zhao.