
A data mining approach to discover unusual folding regions in genome sequences

  1. Authors:
    Le, S. Y.
    Liu, W. M.
    Maizel, J. V.
  2. Author Addresses:
    NCI, Lab Expt & Computat Biol, Div Basic Sci, NIH, Bldg 469, Room 151, Frederick, MD 21702 USA
    NCI, Lab Expt & Computat Biol, Div Basic Sci, NIH, Frederick, MD 21702 USA
    Indiana Univ, Dept Comp & Informat Sci, Indianapolis, IN 46202 USA
    Reprint address (Le, S. Y.): NCI, Lab Expt & Computat Biol, Div Basic Sci, NIH, Bldg 469, Room 151, Frederick, MD 21702 USA
  3. Year: 2002
  4. Journal: Knowledge-Based Systems
    Volume: 15
    Issue: 4, Special Issue
    Pages: 243-250
  5. Type of Article: Article
  6. Abstract:

    Numerous experiments and analyses of RNA structures have revealed that distinct local structures are closely correlated with biological function. In this study, we present a data mining approach to discover such unusual folding regions (UFRs) in genome sequences. Our approach is a three-step procedure. In the first step, the extent to which a local structure in a genomic sequence differs from a random folding is evaluated by two z-scores of the local segment: the significance score (SIGSCR) and the stability score (STBSCR). Both scores are computed by sliding a fixed-size window along the sequence, one base at a time, from the start to the end position. In the second step, based on the theory of the non-central Student's t distribution, we derive a linearly transformed non-central Student's t distribution (LTNSTD) to describe the distribution of the SIGSCR and STBSCR values computed along the sequence. In the third step, we extract from the sequence the significant UFRs, whose SIGSCR and/or STBSCR lie above or below a given threshold calculated from the derived LTNSTD. We successfully applied this data mining approach to the complete genome of Mycoplasma genitalium (M. gen), where it discovers these statistical extremes. Comparison with the two scores computed from randomly shuffled sequences of the entire M. gen genome demonstrates that the UFRs in the M. gen sequence do not arise by chance. These UFRs may imply an important structural role encoded in their sequence information. (C) 2002 Elsevier Science B.V. All rights reserved.
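The three steps described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact SIGSCR and STBSCR formulas, the folding program, the window size, or how the LTNSTD parameters are fitted. The sketch therefore assumes the following. It uses Nussinov base-pair maximization as a crude stand-in for minimum-free-energy folding. It computes only a SIGSCR-like z-score of each window against shuffled copies of that window; STBSCR is not implemented. It takes the LTNSTD parameters (degrees of freedom `df`, non-centrality `nc`, `loc`, `scale`) as given, hypothetical values, models the variable as loc + scale * T with T = (Z + nc) / sqrt(V / df), Z standard normal and V chi-square with `df` degrees of freedom, and estimates its quantile by Monte Carlo. All function names (`max_pairs`, `energy`, `window_zscores`, `ltnstd_quantile`, `extract_regions`) are hypothetical.

```python
import math
import random
import statistics

# Assumption: Watson-Crick plus G-U wobble pairs; DNA input is read as RNA (T -> U).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def max_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov base-pair maximization (stand-in for free-energy folding)."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):  # hairpin loops need >= min_loop bases
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # bifurcation into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


def energy(seg: str) -> float:
    """Hypothetical toy 'energy': more base pairs give a lower value."""
    return -float(max_pairs(seg))


def window_zscores(seq: str, window: int, n_shuffles: int, rng: random.Random) -> list:
    """Step 1 (sketch): z-score of each window vs. shuffled copies of that window."""
    seq = seq.upper().replace("T", "U")
    scores = []
    for start in range(len(seq) - window + 1):  # slide one base at a time
        seg = seq[start:start + window]
        shuffled = []
        for _ in range(n_shuffles):
            chars = list(seg)
            rng.shuffle(chars)  # mononucleotide shuffle keeps base composition
            shuffled.append(energy("".join(chars)))
        sd = statistics.pstdev(shuffled)
        z = (energy(seg) - statistics.fmean(shuffled)) / sd if sd > 0 else 0.0
        scores.append(z)
    return scores


def ltnstd_quantile(q, df, nc, loc, scale, n_samples=100_000, seed=0) -> float:
    """Step 2 (sketch): Monte Carlo q-quantile of loc + scale * T, T ~ nct(df, nc)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        samples.append(loc + scale * (z + nc) / math.sqrt(chi2 / df))
    samples.sort()
    return samples[min(int(q * n_samples), n_samples - 1)]


def extract_regions(scores, threshold, window, lower_tail=True) -> list:
    """Step 3 (sketch): merge consecutive windows beyond the threshold into
    inclusive 0-based (start, end) base spans."""
    regions, run_start = [], None
    for i, s in enumerate(scores):
        hit = s <= threshold if lower_tail else s >= threshold
        if hit and run_start is None:
            run_start = i
        elif not hit and run_start is not None:
            regions.append((run_start, i - 1 + window - 1))
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(scores) - 1 + window - 1))
    return regions


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(60))  # synthetic, not M. gen data
    scores = window_zscores(genome, window=20, n_shuffles=10, rng=rng)
    # Hypothetical LTNSTD parameters; in practice they would be estimated from the scores.
    cutoff = ltnstd_quantile(0.05, df=10, nc=-0.5, loc=0.0, scale=1.0)
    print(extract_regions(scores, cutoff, window=20))
```

In this toy setup a strongly negative z-score means a window pairs better than its own shuffles, so `lower_tail=True` flags candidate stable regions, while `lower_tail=False` selects the opposite extreme. A real analysis would replace `energy` with a thermodynamic folding program and fit the LTNSTD parameters to the observed score distribution, as the abstract describes.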


NCI at Frederick
