Analysis of protein determinants of genotype-specific properties of group A rotaviruses using machine learning

IF 7.0 | Zone 2 (Medicine) | JCR Q1 (Biology)
Myeongji Cho, Nara Been, Hyeon S. Son
Computers in Biology and Medicine, vol. 191. Published 2025-04-10. Journal Article.
DOI: 10.1016/j.compbiomed.2025.110143
URL: https://www.sciencedirect.com/science/article/pii/S0010482525004949
Citations: 0

Abstract

Group A rotaviruses (RVAs) are the leading cause of viral diarrhoea across various host species, including mammals and birds. The VP7 and VP4 proteins of these viruses play critical roles in determining genotype specificity, influencing viral infectivity and host adaptation. This study employed machine-learning techniques to classify RVA genotypes based on the molecular and physicochemical properties of these proteins. A dataset of 94 VP7 and 68 VP4 protein sequences was collected from various host species. Seven machine-learning algorithms—Naïve Bayes (NB), logistic regression (LR), decision tree (DT), random forest (RF), k-nearest neighbour (kNN), support vector machine (SVM), and artificial neural network (ANN)—were used for genotype classification. Feature subsets were configured using ranking-based attribute selection, and classification performance was evaluated using accuracy (ACC), precision, recall, Matthews’ correlation coefficient (MCC), and the area under the curve (AUC). kNN demonstrated the highest classification accuracy for both VP7 (ACC = 97.87 %) and VP4 (ACC = 100 %), outperforming NB, LR, DT, RF, SVM, and ANN. For VP7 sequences, key properties influencing genotype classification included hydrophobicity, normalised van der Waals volume, and leucine composition. For VP4, polarity, normalised van der Waals volume, and polarizability were the most significant factors. In summary, the genotype-specific molecular features of VP7 and VP4 proteins served as reliable markers for RVA classification. Our findings highlight the potential of machine-learning approaches to predict RVA genotypes based on the physicochemical properties of amino acids, providing valuable insights into the molecular mechanisms that drive viral evolution, host specificity, and immune evasion.
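The workflow summarised above (sequence-derived physicochemical features fed to a kNN classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the three-class hydrophobicity grouping is a commonly used composition-style grouping, the leave-one-out evaluation is a stand-in because the abstract does not state the validation scheme, and the sequences and genotype labels are made-up toy data.

```python
from collections import Counter
import math

# Assumed three-class hydrophobicity grouping (composition-style descriptor).
# The abstract does not say which grouping the authors used.
HYDROPHOBICITY_GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def features(seq):
    """Return [polar, neutral, hydrophobic, leucine] fractions of a sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)  # assumes a non-empty sequence
    vec = [sum(counts[aa] for aa in members) / n
           for members in HYDROPHOBICITY_GROUPS.values()]
    vec.append(counts["L"] / n)  # leucine composition
    return vec


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=1):
    """Hold out each sample once, predict it from the rest, return accuracy."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return hits / len(xs)


# Toy, made-up peptides and hypothetical labels; not real VP7 or VP4 data.
toy = [
    ("LLLVVIIMFW", "G1"),
    ("LLVVIIMFWC", "G1"),
    ("RKEDQNRKED", "G2"),
    ("KKEEDDQQNN", "G2"),
]
xs = [features(seq) for seq, _ in toy]
ys = [label for _, label in toy]
acc = leave_one_out_accuracy(xs, ys, k=1)
print(f"leave-one-out accuracy: {acc:.2f}")
```

As a point of arithmetic, if the reported VP7 accuracy were computed over all 94 sequences, 97.87 % would correspond to 92 correct assignments (92/94 ≈ 0.9787). A real analysis would use a fuller descriptor set, a feature-ranking step, and the additional metrics named in the abstract (precision, recall, MCC, AUC).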


Source journal
Computers in Biology and Medicine (Engineering and Technology: Biomedical Engineering)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.