Nearest neighbor training of side effect machines for sequence classification
D. Ashlock, Andrew McEachern
2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2010-05-02. DOI: 10.1109/CIBCB.2010.5510426
Side effect machines operate by associating side effects with the states of a finite state machine. Using side effect machines lets the researcher exploit information stored in the state transition structure, so that machines that would be identical as recognizers can behave differently as classifiers. The side effect machines in this study associate a counter with each state, so the number of times each state is visited becomes a numerical feature. The key to using these numerical features effectively is to find side effect machines whose count vectors are good feature sets. In this study, side effect machines are selected with an evolutionary algorithm whose fitness function is the Rand index of a nearest neighbor classification of the count vectors. A parameter study is performed on simple synthetic data, and side effect machines are then trained to classify two sets of biological sequences. The first set comprises two categories of HLA sequences from the human major histocompatibility complex. The second comprises positive and negative examples of human endogenous retroviral sequences taken from the human genome. The retroviral sequences are challenging, but good results are obtained. The HLA data are classified with complete accuracy.
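The abstract names the ingredients (per-state visit counters, nearest neighbor classification, the Rand index as fitness, an evolutionary algorithm) without giving implementation details. The Python sketch below shows one way they could fit together. It is not the authors' code, and every concrete choice in it is an assumption: the DNA alphabet `CGAT`, state 0 as the start state, counting each arrival in a state, squared Euclidean distance, leave-one-out 1-nearest-neighbor prediction compared with the true labels by the Rand index, and a truncation-selection loop that mutates a single transition. The function names (`count_vector`, `nn_labels`, `rand_index`, `fitness`, `mutate`, `evolve`) and the toy sequences are hypothetical.

```python
import random
from itertools import combinations

# Assumed alphabet: DNA bases. Any other character (e.g. "N") raises ValueError.
ALPHABET = "CGAT"


def random_machine(n_states, rng):
    """Hypothetical representation: machine[state][symbol_index] is the next state."""
    return [[rng.randrange(n_states) for _ in ALPHABET] for _ in range(n_states)]


def count_vector(machine, seq):
    """Run the machine from state 0 and count every arrival in each state."""
    counts = [0] * len(machine)
    state = 0  # assumption: state 0 is the start state and is not counted on entry
    for ch in seq:
        state = machine[state][ALPHABET.index(ch)]
        counts[state] += 1
    return counts


def nn_labels(vectors, labels):
    """Leave-one-out 1-NN prediction (squared Euclidean, first index wins ties; needs >= 2 vectors)."""
    predicted = []
    for i, v in enumerate(vectors):
        others = (j for j in range(len(vectors)) if j != i)
        nearest = min(others, key=lambda j: sum((a - b) ** 2 for a, b in zip(v, vectors[j])))
        predicted.append(labels[nearest])
    return predicted


def rand_index(a, b):
    """Fraction of item pairs on which labelings a and b agree (both together or both apart)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def fitness(machine, seqs, labels):
    """Rand index between the true labels and the 1-NN labels predicted from count vectors."""
    vectors = [count_vector(machine, s) for s in seqs]
    return rand_index(labels, nn_labels(vectors, labels))


def mutate(machine, rng):
    """Copy the machine and redirect one randomly chosen transition."""
    child = [row[:] for row in machine]
    state = rng.randrange(len(child))
    symbol = rng.randrange(len(ALPHABET))
    child[state][symbol] = rng.randrange(len(child))
    return child


def evolve(seqs, labels, n_states=6, pop_size=20, generations=50, seed=0):
    """Truncation selection: keep the better half, refill with mutated copies of it."""
    rng = random.Random(seed)
    pop = [random_machine(n_states, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, seqs, labels), reverse=True)
        elite = pop[: pop_size // 2]
        pop = elite + [mutate(rng.choice(elite), rng) for _ in range(pop_size - len(elite))]
    best = max(pop, key=lambda m: fitness(m, seqs, labels))
    return best, fitness(best, seqs, labels)


if __name__ == "__main__":
    # Toy sequences invented for illustration; not data from the study.
    seqs = ["AAAAAAAT", "AAAATAAA", "TAAAAAAA", "CGCGCGCG", "GCGCGCGC", "CCGGCCGG"]
    labels = [0, 0, 0, 1, 1, 1]
    best, score = evolve(seqs, labels, generations=20)
    print("best Rand index:", score)
```

Because the better half of the population is always kept, the best fitness never decreases from one generation to the next. Note also that the Rand index compares partitions rather than label names, so a prediction with the two classes swapped scores 1.0 just as a correct one does.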