T. Johnsten, Laura Fain, Leanna Fain, Ryan G. Benton, Ethan Butler, L. Pannell, Ming Tan
DOI: 10.1504/IJDMB.2015.071544 (https://doi.org/10.1504/IJDMB.2015.071544). Journal article, published 2015-08-01.
Exploiting multi-layered vector spaces for signal peptide detection
Analysing and classifying sequences based on their similarities and differences is a mathematical problem of growing importance in many scientific disciplines. One of the primary challenges in applying machine learning algorithms to sequential data, such as biological sequences, is extracting and representing significant features of the data. To address this problem, we recently developed a representation called Multi-Layered Vector Spaces (MLVS), a simple mathematical model that maps sequences into a set of multi-layered vector spaces. We demonstrate the usefulness of the model by applying it to the problem of identifying signal peptides. MLVS feature vectors are generated from a collection of protein sequences, and the resulting vectors are used to train support vector machine classifiers. Experiments show that the MLVS-based classifiers match or outperform several existing methods designed specifically for identifying signal peptides.
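The abstract does not define how MLVS vectors are constructed, so the following is only a rough illustration of the general idea of turning a protein sequence into a fixed-length feature vector that a classifier could consume. The function `layered_composition`, its `window` and `layers` parameters, and the example sequence are all hypothetical stand-ins chosen for this sketch; they are not the MLVS model.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def layered_composition(sequence, window=30, layers=3):
    """Hypothetical stand-in featurizer (NOT the MLVS definition).

    Splits the first `window` residues into `layers` equal-length
    segments and returns the amino-acid frequencies of each segment,
    concatenated into one flat list of layers * 20 values.
    """
    prefix = sequence[:window].upper()
    seg_len = window // layers
    features = []
    for i in range(layers):
        segment = prefix[i * seg_len:(i + 1) * seg_len]
        counts = Counter(segment)
        total = len(segment) or 1  # avoid division by zero on short input
        features.extend(counts[aa] / total for aa in AMINO_ACIDS)
    return features


# Illustrative input only; any protein sequence string would do.
vec = layered_composition("MKKTAIAIAVALAGFATVAQAAPKDNTWYTGAKLGWSQ")
print(len(vec))
```

Vectors of this kind, one per labelled sequence, could then be passed to any support vector machine implementation (for example, `sklearn.svm.SVC` in scikit-learn) to train a signal-peptide classifier. Whether such a simple featurization performs well is not something this sketch establishes.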