mCNN-GenEfflux: enhanced predicting Efflux protein and their super families by using generative proteins combined with multiple windows convolution neural networks
Authors: Muhammad Hussain, Yu-Yen Ou, Quang Thai Ho
Journal: Computational Biology and Chemistry, volume 119, Article 108595
Published: 2025-07-17
DOI: 10.1016/j.compbiolchem.2025.108595
URL: https://www.sciencedirect.com/science/article/pii/S1476927125002567
Abstract
Efflux transporters play a critical role in bacterial antibiotic resistance by facilitating the removal of harmful substances. These transporters are classified into five distinct families: ABC, MFS, MATE, RND, and SMR. The significant sequence variability among these families, coupled with insufficient functional annotation, presents considerable challenges for traditional categorization methods. We hypothesize that integrating ProtGPT2-generated efflux protein sequences with a multi-window convolutional neural network (mCNN) enhances classification accuracy by effectively capturing local motifs and broader evolutionary patterns often overlooked by previous methods. Generative models like ProtGPT2 are effective for this purpose, as they generate a variety of sequence variants that reflect patterns observed in natural efflux families, thereby mitigating issues related to data scarcity. The proposed GenEfflux framework, unlike alignment-based methods such as HHblits and single-feature CNNs, combines generative sequence expansion with multi-scale evolutionary feature extraction via Position-Specific Scoring Matrices (PSSMs), thereby enhancing the understanding of sequence-function relationships. In comparative evaluations, GenEfflux consistently outperformed the baseline deepEfflux model across all efflux transporter classes. In Class B, sensitivity increased from 0.5385 to 0.9999, and the Matthews correlation coefficient (MCC) rose from 0.4397 to 0.9327. In Class C, accuracy improved from 0.8977 to 0.9668, alongside an increase in MCC from 0.7668 to 0.9331. The findings demonstrate that sequences generated by ProtGPT2 possess functional relevance and improve classification effectiveness. GenEfflux offers a comprehensive framework for enhancing efflux protein analysis and advancing research on antibiotic resistance.
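The abstract reports sensitivity, accuracy, and MCC per transporter class. As a reading aid, the following minimal sketch shows how these three metrics are conventionally computed from one-vs-rest confusion counts. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper, which does not state its confusion matrices here.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, accuracy, and MCC for one-vs-rest confusion counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion counts must not all be zero")
    # Sensitivity (recall): fraction of true class members that were recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: fraction of all predictions that were correct.
    accuracy = (tp + tn) / total
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for illustration only (not data from the paper).
example = binary_metrics(tp=9, fp=1, tn=18, fn=2)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which helps explain why it can remain low for a class with poor sensitivity even when accuracy looks acceptable.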
About the journal:
Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered.
Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, complementary techniques such as molecular dynamics simulations with detailed free energy calculations should be used to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered.
Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.