Seq2Image: Sequence Analysis using Visualization and Deep Convolutional Neural Network

Neda Tavakoli
Published in: 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), 2020-07-01
DOI: 10.1109/COMPSAC48688.2020.00-71 (https://doi.org/10.1109/COMPSAC48688.2020.00-71)
Citations: 10

Abstract

Sequence classification is widely used across many application domains. Many classification algorithms exist for feature vectors, but they cannot be applied directly to sequences, mainly because it is difficult to extract feature vectors from sequences. More specifically, because the features in a sequence are sequential in nature, the clustering problem in sequences suffers from the curse of dimensionality, which makes sequence classification more challenging than typical classification on feature vectors. In this paper, we present a novel idea of transforming sequences into images, called Seq2Image, a simple yet effective method for genomic sequence classification using a Convolutional Neural Network (CNN). We first convert a given genomic sequence to a tensor and then transform that tensor into an image. We then apply CNN-based deep learning image processing techniques to classify the resulting images. The results of our preliminary experimental study are promising: 95.78% training accuracy, 95.76% validation accuracy, and 95.83% testing accuracy for the classification of 166 human genome samples across six sequence families.
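
The abstract does not say how the sequence-to-tensor and tensor-to-image steps are carried out. As a rough illustration only, the sketch below assumes a one-hot encoding over the four nucleotides and a fixed-width row-major fold with zero padding. The names `BASES`, `one_hot`, and `to_image`, the channel order, and the `width` parameter are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical channel order; the paper does not state its encoding.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as an L x 4 nested list (one row per base)."""
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in BASES:  # unknown symbols such as 'N' stay all-zero
            row[BASES.index(base)] = 1
        rows.append(row)
    return rows

def to_image(tensor, width):
    """Fold an L x 4 tensor into an H x width x 4 image, zero-padding the tail."""
    height = math.ceil(len(tensor) / width)
    pad = height * width - len(tensor)
    padded = tensor + [[0, 0, 0, 0] for _ in range(pad)]
    return [padded[r * width:(r + 1) * width] for r in range(height)]

# Seven bases folded at width 4 give a 2 x 4 image with 4 channels per pixel.
image = to_image(one_hot("ACGTNAC"), width=4)
print(len(image), len(image[0]), len(image[0][0]))
```

An array of this shape could then be passed to any image classifier, such as a CNN. Whether Seq2Image uses a comparable layout cannot be determined from the abstract alone.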