Protein function prediction based on physiochemical properties and protein granularity
Wanlu Wang, Xin Zhang, Jun Meng, Yushi Luan
2013 IEEE International Conference on Granular Computing (GrC), December 2013
DOI: 10.1109/GrC.2013.6740433
Assigning biological functions to uncharacterized proteins is a fundamental problem in the post-genomic era. The growing volume of protein sequence data has driven the development of computational methods that predict protein function quickly and accurately. In this work, we extract 353 numerical features from each sequence, based not only on physicochemical properties but also on protein granularity. Principal Component Analysis (PCA), a standard tool in exploratory data analysis, is applied to obtain an optimized feature set by excluding poorly performing or redundant features, leaving 204 features. This optimized 204-feature subset is then used to predict protein function with the k-nearest neighbors (KNN) algorithm. The resulting prediction model achieves an overall prediction accuracy of 84.6%. The experimental results show that our approach is effective for predicting the functional class of uncharacterized proteins.
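The abstract describes a two-stage pipeline: reduce the feature vectors with PCA, then classify with KNN. The pure-Python sketch below illustrates that shape only, under several assumptions the abstract does not settle. It treats PCA as a projection onto the leading principal components (the abstract does not say whether its 204 features were projected or selected), fits the scaling and PCA on training data only, finds components by power iteration with deflation, and uses Euclidean distance with a hypothetical k=3. The 4-feature toy data and the labels `class_A`/`class_B` are made up for illustration and are not the paper's features or functional classes; all helper names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: standardize -> PCA projection -> KNN vote.
# Assumption: PCA is used as a projection onto the top components, not the
# paper's exact feature-exclusion procedure, which the abstract does not specify.


def standardize(rows):
    """Z-score each feature column; return scaled rows plus column means and stds."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0  # guard constant columns
            for j in range(d)]
    return apply_scaling(rows, means, stds), means, stds


def apply_scaling(rows, means, stds):
    """Scale rows with statistics learned on the training set."""
    return [[(r[j] - m) / s for j, (m, s) in enumerate(zip(means, stds))] for r in rows]


def covariance(centered):
    """Covariance matrix X^T X / n of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]


def top_components(cov, k, iters=200, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue estimate for the converged unit vector.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(rows, comps):
    """Coordinates of each row on the retained principal components."""
    return [[sum(c_j * r_j for c_j, r_j in zip(c, r)) for c in comps] for r in rows]


def knn_predict(train_z, train_y, z, k=3):
    """Majority vote among the k training points nearest to z (Euclidean distance)."""
    nearest = sorted(zip((math.dist(t, z) for t in train_z), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_predict(train_x, train_y, test_x, n_components, k=3):
    """Fit scaling and PCA on training data only, then classify test rows with KNN."""
    scaled, means, stds = standardize(train_x)
    comps = top_components(covariance(scaled), n_components)
    train_z = project(scaled, comps)
    test_z = project(apply_scaling(test_x, means, stds), comps)
    return [knn_predict(train_z, train_y, z, k) for z in test_z]


def accuracy(pred, truth):
    """Fraction of correct predictions (an overall prediction accuracy)."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Toy usage with made-up 4-feature rows and hypothetical labels.
train_x = [[1.0, 1.1, 0.9, 5.0], [1.2, 0.9, 1.0, 5.1], [0.8, 1.0, 1.1, 4.9],
           [5.0, 5.1, 4.9, 1.0], [5.2, 4.8, 5.1, 1.1], [4.9, 5.0, 5.0, 0.9]]
train_y = ["class_A"] * 3 + ["class_B"] * 3
test_x = [[1.1, 1.0, 1.0, 5.0], [5.1, 5.0, 4.9, 1.0]]
pred = fit_predict(train_x, train_y, test_x, n_components=1, k=3)
print(pred, accuracy(pred, ["class_A", "class_B"]))  # ['class_A', 'class_B'] 1.0
```

Under the projection assumption, mirroring the paper's setting would mean passing 353-column feature rows with `n_components=204`. Power iteration keeps the sketch dependency-free; for real data a library eigensolver would be faster and numerically more robust.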