
Learning curves for drug response prediction in cancer cell lines

  1. Authors:
    Partin, Alexander
    Brettin, Thomas
    Evrard, Yvonne
    Zhu, Yitan
    Yoo, Hyunseung
    Xia, Fangfang
    Jiang, Songhao
    Clyde, Austin
    Shukla, Maulik
    Fonstein, Michael
    Doroshow, James H.
    Stevens, Rick L.
  2. Author Address

    Argonne Natl Lab, Div Data Sci & Learning, Lemont, IL 60439 USA.
    Univ Chicago, Consortium Adv Sci & Engn, Chicago, IL 60637 USA.
    Argonne Natl Lab, Comp Environm & Life Sci, Lemont, IL USA.
    Leidos Biomed Res Inc, Frederick Natl Lab Canc Res, Frederick, MD USA.
    Argonne Natl Lab, Biosci Div, Lemont, IL USA.
    NCI, Div Canc Therapeut & Diag, Bethesda, MD 20892 USA.
    Univ Chicago, Dept Comp Sci, Chicago, IL 60637 USA.
    1. Year: 2021
    2. Date: May 17
    3. Epub Date: 2021 05 17
  1. Journal: BMC Bioinformatics
  2. Publisher: BMC
    1. Volume: 22
    2. Issue: 1
  3. Type of Article: Article
  4. Article Number: 252
  5. ISSN: 1471-2105
  1. Abstract:

    Background: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data.

    Methods: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models.

    Results: The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve the prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on their overall data scaling characteristics.

    Conclusions: A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
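    The power-law fitting described in the Methods can be sketched as follows. This is an illustrative, stdlib-only example, not the paper's code: the training sizes and error values below are invented for demonstration, and the fit uses a simple log-log linear least squares rather than whatever optimizer the authors used. A power law score ≈ a·m^b becomes linear after taking logarithms, so the exponent b is recoverable with ordinary least squares.

    ```python
    import math

    def fit_power_law(sizes, scores):
        """Fit score ~ a * m**b by linear least squares in log-log space.

        Taking logs gives log(score) = log(a) + b*log(m), a straight line,
        so b is the OLS slope and a = exp(intercept).
        """
        xs = [math.log(m) for m in sizes]
        ys = [math.log(s) for s in scores]
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
        a = math.exp(my - b * mx)
        return a, b

    # Hypothetical learning-curve data: generalization error shrinking
    # as the training set size m grows (values are made up).
    sizes = [1000, 2000, 4000, 8000, 16000]
    errors = [0.40, 0.33, 0.27, 0.22, 0.18]

    a, b = fit_power_law(sizes, errors)
    # b < 0 means error decays with more data; the fitted curve can then
    # be extrapolated to forecast performance at a larger, unseen size.
    pred_32k = a * 32000 ** b
    ```

    The forward-looking use described in the Conclusions corresponds to the extrapolation step: once (a, b) are fitted on observed training sizes, evaluating a·m^b at a larger m estimates how much additional data would improve the score.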


External Sources

  1. DOI: 10.1186/s12859-021-04163-y
  2. PMID: 34001007
  3. WOS: 000657747000002

Library Notes

  1. Fiscal Year: FY2020-2021