
Tautomerism in large databases

  1. Author:
    Sitzmann, M.
    Ihlenfeldt, W. D.
    Nicklaus, M. C.
  2. Author Address

    [Sitzmann, Markus; Nicklaus, Marc C.] NCI, Biol Chem Lab, Ctr Canc Res, NIH, DHHS, Frederick, MD 21702 USA. [Ihlenfeldt, Wolf-Dietrich] Xemistry GmbH, D-61462 Konigstein, Germany.; Nicklaus, MC, NCI, Biol Chem Lab, Ctr Canc Res, NIH, DHHS, 376 Boyles St, Frederick, MD 21702 USA.; mn1@helix.nih.gov
    1. Year: 2010
    2. Date: Jun
  1. Journal: Journal of Computer-Aided Molecular Design
    1. Volume: 24
    2. Issue: 6-7
    3. Pages: 521-551
  2. Type of Article: Article
  3. ISSN: 0920-654X
  1. Abstract:

    We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e. not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. CACTVS's tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, was used, which takes a comprehensive stance as to the possible types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e. at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.
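    The overlap rates quoted in the abstract can be illustrated with a minimal sketch. It assumes each record already carries two precomputed keys: one that distinguishes individual tautomeric forms and one that is shared by all tautomers of the same compound. The function name `overlap_rate`, the record layout, and the toy keys (`S1`, `T1`, ...) are hypothetical and are not taken from CACTVS or from the paper; real identifiers would have to be computed by a chemoinformatics toolkit.

```python
from collections import defaultdict


def overlap_rate(records, within_database=True):
    """Fraction of records that have a tautomeric duplicate.

    records: iterable of (database, record_id, structure_key, tautomer_key).
      structure_key -- hypothetical key that differs between tautomers
      tautomer_key  -- hypothetical key shared by all tautomers of a compound
    A record counts as overlapping when its group (same tautomer_key, and
    same database if within_database is True) contains at least two
    distinct structure_key values.
    """
    rows = list(records)
    if not rows:
        return 0.0

    def group_of(db, tautomer_key):
        return (db, tautomer_key) if within_database else tautomer_key

    # Collect the distinct tautomeric forms seen in each group.
    forms = defaultdict(set)
    for db, _rec_id, structure_key, tautomer_key in rows:
        forms[group_of(db, tautomer_key)].add(structure_key)

    # Exact duplicates (same structure_key) do not count as tautomeric overlap.
    overlapping = sum(
        1 for db, _rec_id, _skey, tkey in rows if len(forms[group_of(db, tkey)]) > 1
    )
    return overlapping / len(rows)


# Toy data with made-up keys, for illustration only.
RECORDS = [
    ("dbA", "r1", "S1", "T1"),
    ("dbA", "r2", "S2", "T1"),  # different tautomer of the r1 compound
    ("dbA", "r3", "S3", "T3"),
    ("dbA", "r4", "S3", "T3"),  # exact duplicate of r3, not a tautomer
    ("dbB", "r5", "S1", "T1"),  # same form as r1, but in another database
]

if __name__ == "__main__":
    print(overlap_rate(RECORDS, within_database=True))
    print(overlap_rate(RECORDS, within_database=False))
```

    In this toy set, only r1 and r2 overlap within a single database, while r5 joins them once the grouping spans the whole collection. This mirrors why the abstract reports a higher overlap rate across all constituent databases than within individual ones.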


External Sources

  1. DOI: 10.1007/s10822-010-9346-4
  2. WOS: 000278894800005

Library Notes

  1. Fiscal Year: FY2009-2010
NCI at Frederick
