
Improved detection of low-frequency within-host variants from deep sequencing: A case study with human papillomavirus

  1. Author:
    Mishra, Sambit [ORCID]
    Nelson, Chase W [ORCID]
    Zhu, Bin
    Pinheiro, Maisa
    Lee, Hyo Jung
    Dean, Michael [ORCID]
    Burdett, Laurie
    Yeager, Meredith
    Mirabello, Lisa
  2. Author Address

    Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, 9609 Medical Center Drive, Rockville, MD 20850, USA; Cancer Genomics Research Laboratory, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, P.O. Box B, Bldg. 430, Frederick, MD 21702, USA.
    1. Year: 2024
    2. Epub Date: 2024 02 07
  1. Journal: Virus Evolution
    1. Volume: 10
    2. Issue: 1
    3. Pages: veae013
  2. Type of Article: Article
  3. Article Number: veae013
  1. Abstract:

    High-coverage sequencing allows the study of variants occurring at low frequencies within samples, but is susceptible to false positives caused by sequencing error. Ion Torrent has a very low single nucleotide variant (SNV) error rate and has been employed for the majority of human papillomavirus (HPV) whole genome sequences. However, benchmarking of intrahost SNVs (iSNVs) has been challenging, partly due to limitations imposed by the HPV life cycle. We address this problem by deep sequencing three replicates for each of 31 samples of HPV type 18 (HPV18). Errors, defined as iSNVs observed in only one of three replicates, are dominated by C→T (G→A) changes, independently of trinucleotide context. True iSNVs, defined as those observed in all three replicates, instead show a more diverse SNV type distribution, with particularly elevated C→T rates in CCG context (CCG→CTG; CGG→CAG) and C→A rates in ACG context (ACG→AAG; CGT→CTT). Characterization of true iSNVs allowed us to develop two methods for detecting true variants: (1) VCFgenie, a dynamic binomial filtering tool which uses each variant's allele count and coverage instead of fixed frequency cut-offs; and (2) a machine learning binary classifier which trains eXtreme Gradient Boosting models on variant features such as quality and trinucleotide context. Each approach outperforms fixed-cut-off filtering of iSNVs, and performance is enhanced when both are used together. Our results provide improved methods for identifying true iSNVs in within-host applications across sequencing platforms, specifically using HPV18 as a case study.
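
    The general idea of binomial filtering mentioned in the abstract can be illustrated with a minimal sketch. This is not VCFgenie's actual implementation or interface: the function names, the per-site error rate of 0.001, and the significance threshold of 0.01 are all assumptions chosen for illustration. The sketch asks how likely it is that sequencing error alone would produce at least the observed number of variant reads at the observed coverage, and keeps the variant only if that probability is small.

```python
import math

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)

def passes_binomial_filter(alt_count, coverage, error_rate=0.001, alpha=0.01):
    """Keep a variant if error alone is unlikely to explain its read count.

    error_rate and alpha are hypothetical values for illustration only.
    """
    return binom_sf(alt_count, coverage, error_rate) < alpha

# Two variants at the same 1% frequency but different depths:
shallow = passes_binomial_filter(alt_count=1, coverage=100)
deep = passes_binomial_filter(alt_count=100, coverage=10_000)
```

    Under these assumed parameters, a single variant read out of 100 is plausibly an error and is rejected, while 100 variant reads out of 10,000 are retained, even though both correspond to a 1% frequency. A fixed frequency cut-off would treat the two cases identically.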


External Sources

  1. DOI: 10.1093/ve/veae013
  2. PMID: 38455683
  3. PMCID: PMC10919477
  4. PII: veae013

Library Notes

  1. Fiscal Year: FY2023-2024
NCI at Frederick
