Identification and Prediction of Intrinsically Disordered Regions in Proteins Using n-grams
Mauricio Oberti, I. Vaisman
Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20
DOI: 10.1145/3107411.3107480
Citations: 2
Abstract
Intrinsically disordered proteins (IDPs) play an important role in many biological processes and are closely related to human diseases. They also have the potential to serve as targets for drug discovery, especially in disordered binding regions. Accurate prediction of IDPs is challenging: most methods rely on sequence profiles to improve accuracy, which makes them computationally expensive. This paper describes a method based on n-gram frequencies over reduced amino acid alphabets, which aims to overcome this challenge by using only sequence information. Our results show that the described IDP prediction approach performs at the same level as some other state-of-the-art ab initio methods. In addition, the simplicity of n-grams allows the construction of decision trees that can provide important insights into common patterns and properties associated with disordered regions.
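The abstract does not state which reduced alphabet or n-gram length the authors used. As a rough illustration only, the Python sketch below maps residues to a hypothetical four-letter alphabet (hydrophobic, polar, positive, negative) and computes normalized n-gram frequencies, giving a fixed-length feature vector of the kind a decision tree could consume. The grouping, the `x` placeholder for nonstandard residues, and the names `reduce_sequence` and `ngram_frequencies` are assumptions, not the paper's method. The sketch works on a whole sequence; region-level prediction would presumably apply the same counting to a sliding window, which is likewise an assumption.

```python
from collections import Counter
from itertools import product

# Hypothetical 4-letter reduced alphabet (an assumption for illustration,
# NOT the grouping used in the paper): hydrophobic, polar, positive, negative.
REDUCED_GROUPS = {
    "h": "AVLIMFWC",
    "p": "STNQGPY",
    "+": "KRH",
    "-": "DE",
}
# Map each of the 20 standard amino acid letters to its reduced symbol.
AA_TO_REDUCED = {aa: sym for sym, aas in REDUCED_GROUPS.items() for aa in aas}


def reduce_sequence(seq: str) -> str:
    """Translate a protein sequence into the reduced alphabet.

    Nonstandard residues (e.g. 'X') become the placeholder 'x' so that
    n-grams spanning them can be skipped instead of joining their neighbours.
    """
    return "".join(AA_TO_REDUCED.get(aa, "x") for aa in seq.upper())


def ngram_frequencies(seq: str, n: int = 2) -> dict[str, float]:
    """Relative frequency of every possible reduced-alphabet n-gram in seq.

    Always returns one entry per n-gram over the reduced alphabet, so
    sequences of different lengths map to vectors of the same length.
    """
    reduced = reduce_sequence(seq)
    # Overlapping windows of length n over the reduced sequence.
    windows = [reduced[i:i + n] for i in range(len(reduced) - n + 1)]
    # Ignore any window that touches a nonstandard residue.
    counts = Counter(w for w in windows if "x" not in w)
    total = sum(counts.values())
    symbols = sorted(REDUCED_GROUPS)
    return {
        "".join(gram): (counts["".join(gram)] / total if total else 0.0)
        for gram in product(symbols, repeat=n)
    }
```

For example, `ngram_frequencies("MKDEEPSS", n=2)` reduces the sequence to `h+---ppp` and returns 16 entries that sum to 1, with `--` and `pp` each at 2/7. Because every sequence yields the same set of keys, the dictionaries can be stacked into a feature matrix for a decision-tree learner, whose splits on individual n-grams are what make such models comparatively easy to inspect.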