Pradeep Kumar Yadalam , Raghavendra Vamsi Anegundi , Prabhu Manickam Natarajan , Carlos M. Ardila
International Dental Journal, vol. 75, no. 5, Article 100890. Published 2025-07-05. DOI: 10.1016/j.identj.2025.100890. Impact factor 3.2; JCR Q1 (Dentistry, Oral Surgery & Medicine). https://www.sciencedirect.com/science/article/pii/S0020653925001790
Neural Networks for Predicting and Classifying Antimicrobial Resistance Sequences in Porphyromonas gingivalis
Introduction and objective
Porphyromonas gingivalis is a key pathogen associated with periodontal disease linked to various systemic conditions. Accurate identification of P. gingivalis proteins is essential for understanding its pathogenicity and developing targeted interventions. Recent advances in whole-genome sequencing of P. gingivalis have enhanced the detection and classification of antimicrobial resistance (AMR) determinants, aiding in the early identification of resistance trends and improving patient care. In this study, we developed a deep learning approach using convolutional neural networks (CNNs) to classify P. gingivalis proteins based on their amino acid sequences.
Methods
A dataset of 685 protein sequences, including 150 P. gingivalis proteins and 535 nonresistant variants, was compiled and split into training (60%), validation (20%), and test (20%) sets. The sequences were preprocessed by padding to 750 amino acids and one-hot encoded into a feature matrix. A CNN model, consisting of two convolutional layers, max pooling, dropout, and fully connected layers for binary classification, was designed and implemented in PyTorch with 6,192,258 parameters. The model was trained using the Adam optimizer for 30 epochs with early stopping based on validation accuracy.
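The preprocessing and data split described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the 20-letter amino acid alphabet, the truncation of sequences longer than 750 residues, the all-zero rows for padding and non-standard residues, the unstratified random shuffle, and the helper names one_hot_encode and split_indices are all assumptions, since the abstract does not give these details.

```python
import random

# Assumption: the 20 standard amino acids; the abstract does not give the alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
MAX_LEN = 750  # padded sequence length stated in the Methods


def one_hot_encode(seq, max_len=MAX_LEN):
    """Return a max_len x 20 matrix (list of lists) for one protein sequence.

    Assumptions: longer sequences are truncated; padding positions and
    non-standard residues are encoded as all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(seq[:max_len].upper()):
        idx = AA_INDEX.get(residue)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


def split_indices(n, train_pct=60, val_pct=20, seed=0):
    """Shuffle range(n) and split it into train/validation/test index lists.

    Integer percentages avoid floating-point rounding in the split sizes;
    the test set receives whatever remains after train and validation.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


# Hypothetical five-residue sequence, used only to show the output shape.
encoded = one_hot_encode("MKTAY")
train_idx, val_idx, test_idx = split_indices(685)
print(len(encoded), len(encoded[0]))
print(len(train_idx), len(val_idx), len(test_idx))
```

With 685 sequences, a 60/20/20 split of this kind gives 411 training, 137 validation, and 137 test sequences. Because the classes are imbalanced (150 versus 535), a stratified split would be a reasonable alternative; the abstract does not say which was used.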
Results
The CNN model outperforms traditional methods such as BLAST, HMM profiles, and DeepSig in predicting and classifying AMR in P. gingivalis. The hypothetical ProtBERT model shows slightly better performance, with an accuracy of 97%. Accuracy, precision, recall, F1 score, and the area under the curve were assessed. CNN and ProtBERT have high recall rates (0.93 and 0.95, respectively), indicating their effectiveness in predicting AMR classifications.
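The metrics named above have standard definitions, sketched below in plain Python. The labels and scores are made-up toy values chosen only to exercise the functions; they are not data from the study, and the function names and the 0.5 decision threshold are assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = resistant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not results from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 4))
```

On an imbalanced dataset such as this one, accuracy alone can look strong even when few resistant sequences are recovered, which is why recall and the area under the curve are reported alongside it.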
Conclusion
Our CNN model outperforms state-of-the-art methods in classifying P. gingivalis-resistant protein sequences, achieving 96.35% accuracy and an area under the curve of 0.98.
Clinical relevance
The model enables precise and rapid prediction of AMR based solely on protein sequences, potentially leading to earlier identification of resistance trends and improved antibiotic stewardship in periodontal treatment.
About the journal:
The International Dental Journal features peer-reviewed, scientific articles relevant to international oral health issues, as well as practical, informative articles aimed at clinicians.