Title: Analysis of protein determinants of genotype-specific properties of group A rotaviruses using machine learning
Authors: Myeongji Cho, Nara Been, Hyeon S. Son
Journal: Computers in Biology and Medicine, volume 191
DOI: 10.1016/j.compbiomed.2025.110143
Publication date: 2025-04-10
Publication type: Journal Article
Impact factor: 7.0 (JCR Q1, Biology)
Open access: no
URL: https://www.sciencedirect.com/science/article/pii/S0010482525004949
Citation count: 0
Abstract
Group A rotaviruses (RVAs) are the leading cause of viral diarrhoea across various host species, including mammals and birds. The VP7 and VP4 proteins of these viruses play critical roles in determining genotype specificity, influencing viral infectivity and host adaptation. This study employed machine-learning techniques to classify RVA genotypes based on the molecular and physicochemical properties of these proteins. A dataset of 94 VP7 and 68 VP4 protein sequences was collected from various host species. Seven machine-learning algorithms—Naïve Bayes (NB), logistic regression (LR), decision tree (DT), random forest (RF), k-nearest neighbour (kNN), support vector machine (SVM), and artificial neural network (ANN)—were used for genotype classification. Feature subsets were configured using ranking-based attribute selection, and classification performance was evaluated using accuracy (ACC), precision, recall, Matthews’ correlation coefficient (MCC), and the area under the curve (AUC). kNN demonstrated the highest classification accuracy for both VP7 (ACC = 97.87 %) and VP4 (ACC = 100 %), outperforming NB, LR, DT, RF, SVM, and ANN. For VP7 sequences, key properties influencing genotype classification included hydrophobicity, normalised van der Waals volume, and leucine composition. For VP4, polarity, normalised van der Waals volume, and polarizability were the most significant factors. In summary, the genotype-specific molecular features of VP7 and VP4 proteins served as reliable markers for RVA classification. Our findings highlight the potential of machine-learning approaches to predict RVA genotypes based on the physicochemical properties of amino acids, providing valuable insights into the molecular mechanisms that drive viral evolution, host specificity, and immune evasion.
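The classification idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature set (amino-acid composition plus a single hydrophobic fraction), the hydrophobic residue grouping, the leave-one-out evaluation, the function names, and the toy sequences with their "G1"/"G2" labels are all assumptions made up for illustration, not real VP7 or VP4 data.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed hydrophobic residue group, chosen for illustration only.
HYDROPHOBIC = set("CLVIMFW")


def featurize(seq):
    """Return the 20 amino-acid composition fractions plus the hydrophobic fraction."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    composition = [counts[a] / n for a in AMINO_ACIDS]
    hydrophobic_fraction = sum(counts[a] for a in HYDROPHOBIC) / n
    return composition + [hydrophobic_fraction]


def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=1):
    """Classify each sample using all other samples and return the hit rate."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            hits += 1
    return hits / len(X)


# Hypothetical toy peptides with made-up genotype labels (not real RVA data).
toy = [
    ("LLVVIILLFFAL", "G1"),
    ("LVLIVLLFMLAL", "G1"),
    ("LLIVVLFLLMAG", "G1"),
    ("KKDDEENNRRQS", "G2"),
    ("KDERNKQDERSN", "G2"),
    ("RKEDNQKKEDST", "G2"),
]
X = [featurize(s) for s, _ in toy]
y = [label for _, label in toy]
acc = leave_one_out_accuracy(X, y, k=1)
print(f"leave-one-out accuracy: {acc:.2f}")
```

On real data, the feature vector would presumably be extended with the other properties named in the abstract (normalised van der Waals volume, polarity, polarizability), and the value of k and the validation scheme would need to be tuned rather than fixed as here.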
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.