Amino acid sequence-based IDR classification using ensemble machine learning and quantum neural networks
Seok-Jin Kang, Hongchul Shin
Computational Biology and Chemistry, Vol. 118, Article 108480 (Journal Article, published 2025-04-23)
Journal impact factor: 2.6 (JCR Q2, Biology)
DOI: 10.1016/j.compbiolchem.2025.108480
https://www.sciencedirect.com/science/article/pii/S1476927125001409
Citations: 0
Abstract
Traditional biophysical methods such as the Uversky plot, which rely on hydrophobicity and net charge, have inherent limitations in accurately distinguishing intrinsically disordered regions (IDRs) from ordered protein regions. To overcome these constraints, we propose a novel ensemble framework that integrates Machine Learning (ML), Deep Neural Networks (DNN), and Quantum Neural Networks (QNN) to improve IDR classification accuracy. Notably, this study is the first to employ QNNs for IDR classification, leveraging quantum entanglement to model intricate feature interactions. Biophysical features, including charge distribution, hydrophobicity, and structural properties, were extracted from amino acid sequences and served as inputs to the predictive models. ML models were used for independent feature learning, DNNs for hierarchical interaction modeling, and QNNs for capturing high-order dependencies. Our meta-model achieved an accuracy of 0.85, surpassing the individual classifiers, and the analysis highlighted the importance of buried amino acids and of interactions between scaled hydrophobicity and large, buried, and charged residues. This study advances computational protein science by demonstrating the applicability of QNNs in bioinformatics and establishing a robust framework for IDR classification.
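To make the baseline the abstract criticizes concrete, the sketch below implements a simplified Uversky charge-hydropathy classifier. It is not the authors' method and uses none of their features beyond the two the abstract names. Assumptions: hydropathy uses the Kyte-Doolittle scale rescaled to [0, 1] and averaged over the whole sequence (not a sliding window); net charge counts only K/R as +1 and D/E as -1, ignoring histidine; the boundary is the commonly cited line <R> = 2.785<H> - 1.151. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _validate(seq: str) -> str:
    """Uppercase the sequence and reject empty input or non-standard residues."""
    seq = seq.upper()
    if not seq or any(aa not in KD for aa in seq):
        raise ValueError("sequence must contain only the 20 standard amino acids")
    return seq


def mean_hydropathy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    seq = _validate(seq)
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """Absolute net charge per residue (K/R = +1, D/E = -1; His ignored, an assumption)."""
    seq = _validate(seq)
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def uversky_is_disordered(seq: str) -> bool:
    """True if (<H>, <R>) lies on the disordered side of <R> = 2.785<H> - 1.151."""
    h = mean_hydropathy(seq)
    r = mean_net_charge(seq)
    # Solve the boundary line for <H>: sequences less hydrophobic than this
    # threshold (at the same net charge) are called disordered.
    boundary_h = (r + 1.151) / 2.785
    return h < boundary_h


if __name__ == "__main__":
    for s in ("EEEEKKKKSSSSPPPPGGGG", "LIVFLIVFLIVFAAAA"):
        print(s, round(mean_hydropathy(s), 3), mean_net_charge(s), uversky_is_disordered(s))
```

Because this rule reduces each sequence to just two numbers and a single linear boundary, it cannot express the interactions (for example, between scaled hydrophobicity and buried or charged residues) that the ensemble is reported to exploit.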
Journal description:
Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High-quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high-quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered.
Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, complementary techniques such as molecular dynamics simulations combined with detailed free energy calculations should be used to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered.
Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one- to three-page) synopsis, which will be evaluated by the editors.