Authors: C. Das, P. Maji
Published in: 2008 International Conference on Information Technology
Publication date: 2008-12-17
DOI: 10.1109/ICIT.2008.11 (https://doi.org/10.1109/ICIT.2008.11)
Prediction of Protein Functional Sites Using Novel String Kernels
Most pattern recognition algorithms cannot take amino acids directly as inputs because they are nonnumerical variables; the sequences must therefore be encoded first. To this end, a novel string kernel is introduced that maps a nonnumerical sequence space to a numerical feature space. The proposed string kernel is derived from the conventional bio-basis function and is termed the novel bio-basis function. It is designed on the principle of asymmetry of biological distance, which is calculated using an amino acid mutation matrix. The concept of the zone of influence of a bio-basis is introduced in the proposed string kernel to normalize the asymmetric distance. An efficient method for selecting bio-bases for the novel string kernel is also described, which integrates the concepts of the Fisher ratio and the degree of resemblance. The effectiveness of the proposed string kernel and bio-basis selection method is demonstrated on different protein data sets, along with a comparison against the existing kernel and related selection methods.
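To make the encoding step concrete, the following Python sketch shows the conventional bio-basis function in the form it is commonly stated in the literature, K(x, b) = exp(γ (h(x, b) − h(b, b)) / h(b, b)), where h sums mutation-matrix scores position by position. It is not the novel kernel proposed in this paper, whose definition the abstract does not give. The residue groups, the `toy_score` values, and all function names are assumptions made for illustration; a real application would substitute an actual amino acid mutation matrix.

```python
import math

# Hypothetical residue groups standing in for a real mutation matrix.
GROUPS = ["AVLIMFWC", "STNQYGP", "KRH", "DE"]


def toy_score(a, b):
    """Illustrative similarity score: 3 if identical, 1 if same group, else -1."""
    if a == b:
        return 3
    for group in GROUPS:
        if a in group and b in group:
            return 1
    return -1


def homology(x, b, score=toy_score):
    """Sum of pairwise scores between two equal-length sequences."""
    if len(x) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(score(p, q) for p, q in zip(x, b))


def bio_basis(x, b, gamma=1.0, score=toy_score):
    """Conventional bio-basis function value of sequence x for bio-basis b.

    Assumes homology(b, b) > 0, which holds for toy_score.
    """
    self_score = homology(b, b, score)
    return math.exp(gamma * (homology(x, b, score) - self_score) / self_score)


def encode(x, bases, gamma=1.0, score=toy_score):
    """Map a sequence to a numerical feature vector, one entry per bio-basis."""
    return [bio_basis(x, b, gamma, score) for b in bases]


if __name__ == "__main__":
    bases = ["KRDE", "AVLI"]  # hypothetical bio-bases
    print(encode("KHDE", bases))
```

Each sequence becomes a vector with one coordinate per bio-basis; a sequence identical to a bio-basis scores exactly 1 on that coordinate, and less similar sequences score closer to 0.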