Genomic Data Commons 2.0: A Valuable Tool for Cancer Researchers

Infographic detailing the Genome Characterization Pipeline, starting with the collection of tissue at the Tissue Source Sites and ending with the data being added to the Genomic Data Commons and made available to the research community. Image credit: National Cancer Institute

Imagine, for a moment, that you’re a scientist studying ways to combat a rare form of cancer that overwhelmingly manifests in a specific group of people. You suspect a series of genetic mutations. Testing for the presence of those alterations one at a time would involve a prohibitive amount of time and money. What you need is a database of thousands of cancer cases, characterized for genetic data, to which you could compare your cases.

Enter NCI’s Genomic Data Commons.

Establishing the Genomic Data Commons

The National Cancer Institute’s Genomic Data Commons (GDC) exemplifies how the Frederick National Laboratory for Cancer Research (FNL) and university partners are working in concert to bring cutting-edge research and tools to the greater cancer research community. 

The GDC builds on the work of NCI’s The Cancer Genome Atlas (TCGA), a landmark cancer genomics program launched in 2006 as a collaboration between NCI and the National Human Genome Research Institute. TCGA molecularly characterized more than 20,000 primary cancer and matched normal samples spanning 33 cancer types.

Even as TCGA accomplished its goals and began to wind down, NCI’s Center for Cancer Genomics began work on a new, larger follow-on initiative that would include and expand on TCGA data. NCI tasked FNL with establishing a platform to collect and characterize genetic and genomic tumor data that scientists could use to better understand cancer genomics. A 2016 paper in the New England Journal of Medicine, “Toward a Shared Vision for Cancer Genomic Data,” heralded the new program’s launch.

“An unusual and powerful feature of the GDC,” the paper’s authors said, “is that all researchers will be welcome to submit their cancer genomics data and use the system’s computational pipelines, as long as they agree to share their data broadly.”

In its coordinating role, FNL established a subcontract with the University of Chicago to develop and operate the GDC. That collaboration is still thriving 10 years later, and the GDC is now one of the largest data commons for cancer research at NIH.

The GDC is part of the Cancer Moonshot, established in 2016 under the Obama administration. Then-Vice President Joe Biden attended the GDC public launch in Chicago in 2016. Since then, the GDC has grown to house data on major cancers from nearly 100 projects, spanning 69 primary cancer sites and about 50,000 cases. More than 90,000 researchers around the world now use the GDC each month to molecularly characterize cancer types and match samples to make cancer diagnoses more precise.

GDC 2.0: An Updated, More User-Friendly Tool

Thanks to the June 2024 launch of GDC 2.0, an app-based portal, researchers can comb through and analyze data from anywhere, directly in the GDC portal. They can use the portal without in-depth knowledge of the data or specialized analysis tools. This is an improvement over the original GDC, where users needed extensive background knowledge to find data and download it for offline analysis. GDC 2.0 also has monthly data and software releases that expand and improve the data available.

The NCI video highlights the benefits of using GDC 2.0 to help analyze massive data sets directly online. 

GDC 2.0 has already received ample encouraging feedback from the user community, validating the positive impact this resource has had on cancer research.

One GDC user said it’s “great to have this absolutely crucial measurement information (days from study to test) available now in the public portal!” 

Another user stated, “We have been successfully using TCGA data for both generation and validation of our hypotheses about the molecular mechanisms of AML [acute myeloid leukemia]. We very much appreciate this beautiful resource!” 

The Process of Making Data Available

The GDC is just one part of a larger effort to bring cancer tumor genetic and genomic data to the cancer research community. The data is separated from any identifying information before being made available to the research community (see image). 

The GDC was novel and helpful from the start. With its newest upgrade, it has become an invaluable tool for cancer researchers and has clearly played a role in moving cancer research forward.

Melissa Porter is the Director of Operations for the Office of Cancer Genomics (OCG) within NCI’s Division of Cancer Biology (DCB). Prior to her position with DCB OCG, she was the administrative lead for the Frederick Office of Scientific Operations (FOSO), National Cancer Institute at Frederick, the office which oversees operations of the Frederick National Laboratory for Cancer Research. She formerly served as executive editor of the Poster newsletter.