Skip NavigationSkip to Content

Finding patterns in three-dimensional graphs: Algorithms and applications to scientific data mining

  1. Author:
    Wang, X.
    Wang, J. T. L.
    Shasha, D.
    Shapiro, B. A.
    Rigoutsos, I.
    Zhang, K. Z.
  2. Author Address

    Calif State Univ Fullerton, Dept Comp Sci, Fullerton, CA 92834 USA Calif State Univ Fullerton, Dept Comp Sci, Fullerton, CA 92834 USA New Jersey Inst Technol, Dept Comp & Informat Sci, Newark, NJ 07102 USA NYU, Courant Inst Math Sci, Dept Comp Sci, New York, NY 10012 USA Natl Canc Inst, Lab Expt & Computat Biol, Div Basic Sci, Frederick, MD 21702 USA IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA Univ Western Ontario, Dept Comp Sci, London, ON N6A 5B7, Canada Wang X Calif State Univ Fullerton, Dept Comp Sci, Fullerton, CA 92834 USA
    1. Year: 2002
  1. Journal: Ieee Transactions on Knowledge and Data Engineering
    1. 14
    2. 4
    3. Pages: 731-749
  2. Type of Article: Article
  1. Abstract:

    This paper presents a method for finding patterns in 3D graphs. Each node in a graph is an undecomposable or atomic unit and has a label. Edges are links between the atomic units. Patterns are rigid substructures that may occur in a graph after allowing for an arbitrary number of whole-structure rotations and translations as well as a small number (specified by the user) of edit operations in the patterns or in the graph. (When a pattern appears in a graph only after the graph has been modified, we call that appearance "approximate occurrence.") The edit operations include relabeling a node, deleting a node and inserting a node. The proposed method is based on the geometric hashing technique, which hashes node-triplets of the graphs into a 3D table and compresses the label-triplets in the table. To demonstrate the utility of our algorithms, we discuss two applications of them in scientific data mining. First, we apply the method to locating frequently occurring motifs in two families of proteins pertaining to RNA-directed DNA Polymerase and Thymidylate Synthase and use the motifs to classify the proteins. Then, we apply the method to clustering chemical compounds pertaining to aromatic, bicyclicalkanes, and photosynthesis. Experimental results indicate the good performance of our algorithms and high recall and precision rates for both classification and clustering.

    See More

External Sources

  1. No sources found.

Library Notes

  1. No notes added.
NCI at Frederick

You are leaving a government website.

This external link provides additional information that is consistent with the intended purpose of this site. The government cannot attest to the accuracy of a non-federal site.

Linking to a non-federal site does not constitute an endorsement by this institution or any of its employees of the sponsors or the information and products presented on the site. You will be subject to the destination site's privacy policy when you follow the link.

ContinueCancel