New Online Hierarchical Feature Extraction Algorithm for Classification of Protein
Authors: M. Kchouk, F. Mhamdi
Venue: 2014 25th International Workshop on Database and Expert Systems Applications
Published: 2014-09-07
DOI: https://doi.org/10.1109/DEXA.2014.20
Feature extraction from biological data is an important task in bioinformatics. The aim of this work is to classify protein sequences automatically. To do so, we use a data mining process: Knowledge Discovery and Data Mining (KDD) applied to biological data. We are concerned with the first phase of KDD, preprocessing, and focus on one of its steps: feature extraction. Feature extraction generates a set of features that is presented to a supervised learning algorithm for classification. The extraction method we adopt is the n-gram method, which extracts features of a fixed length n. In this paper, we propose a hierarchical algorithm for constructing n-grams that yields features of variable length. This extraction algorithm is designed to meet the needs of biologists. Experiments on real protein databanks using a linear SVM classifier show the efficiency of our algorithm, and we compare our results with previous work.
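The difference between fixed-length and variable-length n-gram features can be illustrated with a minimal Python sketch. The abstract does not describe the authors' hierarchical algorithm, so the level-wise growth rule used here is an assumption chosen only for illustration: an n-gram is kept when it occurs in enough sequences and its (n-1)-residue prefix was kept at the previous level. The function names and the parameters `min_n`, `max_n`, and `min_support` are hypothetical, and the toy sequences are not real protein data.

```python
from collections import Counter


def ngrams(sequence, n):
    """Return all overlapping substrings of length n, in order of appearance."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def hierarchical_ngrams(sequences, min_n=2, max_n=4, min_support=2):
    """Illustrative level-wise construction of variable-length n-grams.

    ASSUMPTION: this growth rule is not taken from the paper. An n-gram is
    kept if it occurs in at least `min_support` sequences and, for
    n > min_n, its prefix of length n-1 was kept at the previous level.
    Returns a dict mapping n to the set of kept n-grams.
    """
    kept = {}
    previous = None
    for n in range(min_n, max_n + 1):
        support = Counter()
        for seq in sequences:
            # Use a set so each sequence contributes at most once per n-gram.
            for gram in set(ngrams(seq, n)):
                if previous is None or gram[:-1] in previous:
                    support[gram] += 1
        previous = {g for g, c in support.items() if c >= min_support}
        if not previous:
            break  # nothing survives at this length, so stop growing
        kept[n] = previous
    return kept


def to_feature_vector(sequence, features):
    """Count how often each feature (of any length) occurs in the sequence."""
    lengths = {len(f) for f in features}
    counts = Counter(g for n in lengths for g in ngrams(sequence, n))
    return [counts[f] for f in features]


if __name__ == "__main__":
    toy = ["MKVLA", "MKVLG", "AKVLA"]  # toy sequences, not real proteins
    levels = hierarchical_ngrams(toy)
    features = sorted(g for grams in levels.values() for g in grams)
    print(features)
    for seq in toy:
        print(seq, to_feature_vector(seq, features))
```

Each resulting vector has one count per retained n-gram, so features of length 2, 3, and 4 sit side by side. Vectors of this kind could then be passed to a linear SVM, which is the classifier the abstract names.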