
Comparison of the NCI open database with seven large chemical structural databases

  1. Author:
    Voigt, J. H.
    Bienfait, B.
    Wang, S. M.
    Nicklaus, M. C.
  2. Author Address:

    NCI, Med Chem Lab, Canc Res Ctr, NIH, 376 Boyles St, Frederick, MD 21702 USA. NCI, Med Chem Lab, Canc Res Ctr, NIH, Frederick, MD 21702 USA. Georgetown Univ, Med Ctr, Struct Biol & Canc Drug Discovery Program, Lombardi Canc Ctr, Washington, DC 20007 USA. Georgetown Univ, Med Ctr, Dept Oncol, Washington, DC 20007 USA. Georgetown Univ, Med Ctr, Dept Neurosci, Washington, DC 20007 USA.
  3. Year: 2001
  4. Journal: Journal of Chemical Information and Computer Sciences
  5. Volume: 41
  6. Issue: 3
  7. Pages: 702-712
  8. Type of Article: Article
  9. Abstract:

    Eight large chemical databases have been analyzed and compared to each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D-version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from Cambridge Crystallographic Data Center, 1999 Version 5.18). The database properties analyzed are internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; diversity; and others. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than those databases show with each other. In particular, the collections of commercial compounds and compilations of vendor catalogs have a substantial degree of overlap among each other. Still, no database is completely a subset of any other, and each appears to have its own niche and thus "raison d'etre". The NCI database has by far the highest number of compounds that are unique to it. Approximately 200 000 of the NCI structures were not found in any of the other analyzed databases.
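
The set-based properties named in the abstract (internal duplication rate, compounds unique to each database, pairwise overlap of identical compounds, and cumulative occurrence across databases) can be illustrated with a minimal sketch. This is not the authors' method: it assumes each structure has already been reduced to a canonical identifier string, and the database names, identifiers, and function names below are hypothetical toy values chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def duplication_rate(records):
    """Fraction of records that repeat an identifier already present in the same database."""
    if not records:
        return 0.0
    return 1 - len(set(records)) / len(records)


def occurrence_counts(databases):
    """Map each identifier to the number of databases that contain it."""
    return Counter(key for ids in databases.values() for key in set(ids))


def unique_to_each(databases):
    """Identifiers found in exactly one database, grouped by that database."""
    counts = occurrence_counts(databases)
    return {
        name: {key for key in set(ids) if counts[key] == 1}
        for name, ids in databases.items()
    }


def pairwise_overlap(databases):
    """Number of identical identifiers shared by each pair of databases."""
    return {
        (a, b): len(set(databases[a]) & set(databases[b]))
        for a, b in combinations(sorted(databases), 2)
    }


def occurrence_histogram(databases):
    """How many identifiers occur in exactly 1, 2, 3, ... databases."""
    return Counter(occurrence_counts(databases).values())


# Hypothetical toy data: lists of canonical identifiers (assumed precomputed).
toy = {
    "A": ["c1", "c2", "c2", "c3"],  # "c2" is an internal duplicate
    "B": ["c2", "c4"],
    "C": ["c3", "c4", "c5"],
}

dup_a = duplication_rate(toy["A"])
uniques = unique_to_each(toy)
overlaps = pairwise_overlap(toy)
histogram = occurrence_histogram(toy)
```

Exact-identity overlap of this kind depends entirely on how structures are normalized before comparison. The similarity overlap and diversity measures mentioned in the abstract would need structural fingerprints and are outside this sketch.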
