Efficient discovery of common substructures in macromolecules
S. Parthasarathy, M. Coatney
2002 IEEE International Conference on Data Mining, 2002. Proceedings. Published 2002-12-09.
DOI: https://doi.org/10.1109/ICDM.2002.1183924
Citation count: 39
Abstract
Biological macromolecules play a fundamental role in disease; therefore, they are of great interest to fields such as pharmacology and chemical genomics. Yet because of macromolecules' complexity, effective techniques for elucidating macromolecular structure-function relationships have been little explored. Previous techniques have focused either on sequence analysis, which only approximates structure-function relationships, or on small coordinate datasets, with methods that neither scale to large datasets nor handle noise. We present a novel scalable approach to efficiently discover macromolecule substructures based on three-dimensional coordinate data, without domain-specific knowledge. The approach combines structure-based frequent pattern discovery with search space reduction and coordinate noise handling. We analyze computational performance compared to traditional approaches, validate that our approach can discover meaningful substructures in noisy macromolecule data by automated discovery of primary and secondary protein structures, and show that our technique is superior to sequence-based approaches at determining structural, and thus functional, similarity between proteins.
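To make the three ingredients named in the abstract concrete, the following minimal Python sketch shows one way they could fit together. It is an illustration under stated assumptions, not the authors' algorithm: the pattern representation (a pair of atom labels plus a binned distance), the function names `pair_patterns` and `frequent_patterns`, and the parameters `bin_width`, `max_dist`, and `min_support` are all hypothetical. Distance binning stands in for coordinate noise handling, a distance cutoff stands in for search space reduction, and a support count across molecules stands in for frequent pattern discovery.

```python
from collections import Counter
from itertools import combinations
import math


def pair_patterns(atoms, bin_width=0.5, max_dist=5.0):
    """Return the set of (label_a, label_b, distance_bin) patterns in one molecule.

    `atoms` is a list of (label, (x, y, z)) tuples. All parameters are
    illustrative assumptions, not values taken from the paper.
    """
    patterns = set()
    for (label_a, pos_a), (label_b, pos_b) in combinations(atoms, 2):
        distance = math.dist(pos_a, pos_b)
        if distance > max_dist:
            # Search space reduction (assumed form): skip distant pairs.
            continue
        first, second = sorted((label_a, label_b))
        # Noise handling (assumed form): nearby distances share one bin.
        patterns.add((first, second, int(distance // bin_width)))
    return patterns


def frequent_patterns(molecules, min_support=2, **kwargs):
    """Keep patterns that occur in at least `min_support` molecules."""
    counts = Counter()
    for atoms in molecules:
        # Each molecule contributes a pattern at most once (set semantics).
        counts.update(pair_patterns(atoms, **kwargs))
    return {pattern: n for pattern, n in counts.items() if n >= min_support}
```

Fixed-width binning is the simplest choice but can split two nearly equal distances that fall on either side of a bin boundary; a more careful treatment of noise would need to account for that.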