Assessment and classification of COVID-19 DNA sequence using pairwise features concatenation from multi-transformer and deep features with machine learning models

Impact factor 2.5; Zone 4 (Medicine); JCR Q3 (Biochemical Research Methods)
Abdul Qayyum, Abdesslam Benzinou, Oumaima Saidani, Fatimah Alhayan, Muhammad Attique Khan, Anum Masood, Moona Mazher
SLAS Technology, vol. 29, no. 4, Article 100147. Published 2024-05-23. DOI: 10.1016/j.slast.2024.100147
https://www.sciencedirect.com/science/article/pii/S2472630324000293
Citations: 0

Abstract

The 2019 novel coronavirus (renamed SARS-CoV-2 and generally referred to as the COVID-19 virus) has spread to 184 countries, with over 1.5 million confirmed cases. An outbreak of this scale demands early elucidation of the taxonomic classification and origin of the viral genomic sequence to support strategic planning, containment, and treatment. COVID-19, the infectious disease caused by the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has posed a critical threat to global public health and the economy since it was identified in China in late December 2019. The virus has followed several evolutionary pathways. As the pandemic continues to evolve, researchers worldwide are deploying deep learning and machine learning approaches to mitigate and suppress its spread and to understand it better. DNA sequence classification is a central challenge in computational biomedical data analysis and a key research area in bioinformatics, because it enables genomic analysis and the detection of possible diseases. Several machine learning and deep learning techniques have been applied to this task in recent years with some success. In this paper, three state-of-the-art deep learning-based models are proposed using two DNA sequence conversion methods, k-mer and one-hot encoding. We also propose a novel multi-transformer deep learning model and a pairwise feature fusion technique for DNA sequence classification. Furthermore, deep features are extracted from the last layer of the multi-transformer and used as input to machine learning models for DNA sequence classification. The proposed multi-transformer achieved the highest performance in COVID-19 DNA sequence classification. Automatic identification and classification of viruses is essential for averting outbreaks like COVID-19; it also supports the study of viral effects and drug design.
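
The abstract names k-mer and one-hot encoding as its two sequence conversion methods but gives no implementation details. The following is a minimal sketch of what these two conversions commonly look like, not the authors' code. The function names, the choice of k = 3, the stride of 1, the fixed A/C/G/T vocabulary, and the handling of ambiguous symbols such as N are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumed alphabet; the abstract does not say how ambiguous bases are treated.
NUCLEOTIDES = "ACGT"


def kmer_tokens(sequence, k=3):
    """Split a DNA string into overlapping k-mers with a stride of 1."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_count_vector(sequence, k=3):
    """Return counts over all 4**k possible k-mers in lexicographic order.

    k-mers containing symbols outside A/C/G/T are ignored (an assumption).
    """
    vocabulary = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = Counter(kmer_tokens(sequence, k))
    return [counts[kmer] for kmer in vocabulary]


def one_hot(sequence):
    """Encode a DNA string as a list of 4-element rows, one per base.

    Symbols outside A/C/G/T map to an all-zero row (an assumption).
    """
    lookup = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in lookup:
            row[lookup[base]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(kmer_tokens("ATGCA", k=3))   # ['ATG', 'TGC', 'GCA']
    print(one_hot("AN"))               # [[1, 0, 0, 0], [0, 0, 0, 0]]
```

The k-mer count vector has a fixed length of 4^k regardless of sequence length, which suits classical machine learning models, while the one-hot matrix preserves base order and grows with the sequence, which suits sequence models. Which representation feeds which model in the paper is not stated in the abstract.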

Source journal
SLAS Technology (Computer Science: Computer Science Applications)
CiteScore: 6.30
Self-citation rate: 7.40%
Articles published: 47
Review time: 106 days
Journal description: SLAS Technology emphasizes scientific and technical advances that enable and improve life sciences research and development; drug delivery; diagnostics; biomedical and molecular imaging; and personalized and precision medicine. This includes high-throughput and other laboratory automation technologies; micro/nanotechnologies; analytical, separation, and quantitative techniques; synthetic chemistry and biology; informatics (data analysis, statistics, bio-, genomic, and chemoinformatics); and more.