Daniil O. Komarovskikh, Vladislav L. Litvinov, I. A. Kiselev, Artur M. Paniukov, N. I. Trofimov
2020 XXIII International Conference on Soft Computing and Measurements (SCM), published 2020-05-01. DOI: 10.1109/SCM50615.2020.9198757 (https://doi.org/10.1109/SCM50615.2020.9198757)
Research of a Recurrent Neural Network for the Vector Representation of Nucleotide Sequences
Embeddings offer a way to work with gene sequences while avoiding the high computational cost typical of multiple sequence alignment. In this paper, we investigate how well an LSTM network can construct embeddings of nucleotide sequences. To do this, we trained a model on a dataset of 1.5 million pairs of gene sequences from the species Escherichia coli.
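
To make the idea of an LSTM-based sequence embedding concrete, the following is a minimal, untrained sketch in plain Python. The abstract does not state the architecture, the loss, or how the sequence pairs were used in training, so everything here is an assumption made for illustration: the one-hot encoding over A, C, G, T, the hidden size of 8, the random weights, the use of the final hidden state as the embedding, and the names `TinyLSTM`, `one_hot`, and `euclidean` are all hypothetical.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def one_hot(base):
    """Encode one nucleotide as a 4-element one-hot vector.

    Any symbol outside A, C, G, T maps to the all-zero vector (an assumption).
    """
    return [1.0 if base == n else 0.0 for n in NUCLEOTIDES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_size=4, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input gate, f = forget gate, g = cell candidate, o = output gate.
        self.W = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def _gate(self, name, xh, activation):
        # Affine map of the concatenated [input, previous hidden] vector.
        return [activation(sum(w * v for w, v in zip(row, xh)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def embed(self, sequence):
        """Run the LSTM over the sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for base in sequence:
            xh = one_hot(base) + h
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            g = self._gate("g", xh, math.tanh)
            o = self._gate("o", xh, sigmoid)
            # Standard LSTM cell and hidden state updates.
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Sequences of different lengths map to vectors of the same size,
# so they can be compared without aligning them first.
model = TinyLSTM()
emb_a = model.embed("ATGGCGTACGTTAG")
emb_b = model.embed("ATGGCGTACG")
distance = euclidean(emb_a, emb_b)
```

Because the weights are random, `distance` carries no biological meaning here. In a trained model, one plausible use of sequence pairs would be to fit the weights so that embedding distances track some pairwise similarity score, but the abstract does not confirm that this is the authors' objective.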