Fangqiao Wei, Zailong Wu, Guanghui Li, Xiangyu Sun, Xiangru Shi, Lei Tan, Tianxiang Ai, Long Qu, Shuguo Zheng
BMC Oral Health, vol. 25, no. 1, article 1188. Published 2025-07-17. DOI: https://doi.org/10.1186/s12903-025-06590-2
Ensemble learning for microbiome-based caries diagnosis: multi-group modeling and biological interpretation from salivary and plaque metagenomic data.
Background: Oral microbiota is a major etiological factor in the development of dental caries. Next-generation sequencing techniques have been widely used, generating vast amounts of data that remain underexplored. The advancement of artificial intelligence (AI) technologies has made it possible to mine information from these large datasets. This study aimed to develop AI-driven diagnostic models and identify key microbial features for caries.
Methods: We collected raw metagenomic and full-length 16S rRNA gene sequencing data from previous studies on saliva and plaque to construct a caries AI training dataset comprising nearly 600 samples. Samples were grouped by age, sequencing method, and sampling method. Through systematic comparison of seven machine learning architectures (Logistic Regression, Random Forest, Support Vector Machines, Gradient Boosting, Convolutional Neural Networks, Feedforward Neural Networks, and Transformer models), we developed subgroup-specific caries diagnostic models, then integrated them through ensemble learning to enhance generalizability.
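The abstract does not say how the subgroup models were combined. As a minimal sketch, assuming soft voting (averaging each model's predicted caries probability), the integration step could look like the following. The model names are taken from the list above, but the probabilities, the `soft_vote` and `classify` helpers, and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
from statistics import mean


def soft_vote(prob_by_model):
    """Average per-sample caries probabilities across models (soft voting)."""
    n_samples = len(next(iter(prob_by_model.values())))
    for name, probs in prob_by_model.items():
        if len(probs) != n_samples:
            raise ValueError(
                f"{name} has {len(probs)} predictions, expected {n_samples}"
            )
    return [
        mean(probs[i] for probs in prob_by_model.values())
        for i in range(n_samples)
    ]


def classify(probs, threshold=0.5):
    """Label a sample as caries (1) when its averaged probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical predicted caries probabilities for four samples (not study data).
subgroup_probs = {
    "logistic_regression": [0.9, 0.2, 0.6, 0.4],
    "random_forest": [0.8, 0.1, 0.4, 0.3],
    "gradient_boosting": [0.7, 0.3, 0.8, 0.2],
}

ensemble_probs = soft_vote(subgroup_probs)
ensemble_labels = classify(ensemble_probs)
```

In this toy input the third sample is called healthy by one model (0.4) but caries by the ensemble (average 0.6), which is the kind of smoothing across models that an ensemble is meant to provide.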
Results: The caries diagnostic model achieved a maximum AUC value of 1 (accuracy of 100%) for children under 6 years old in both saliva and plaque groups. The consistency of top features (species and metabolic pathways) contributing to the models was demonstrated through intra- and inter-group analyses. Key caries-associated species included Streptococcus salivarius, Streptococcus parasanguinis and Veillonella dispar. Veillonella parvula was more abundant in caries plaque samples but was elevated in healthy saliva samples. Metabolic pathways such as geranylgeranyl diphosphate and fructan biosynthesis were enriched in caries, whereas Bifidobacterium shunt and peptidoglycan biosynthesis were depleted.
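To make the reported metrics concrete: an AUC of 1 means every caries sample received a higher score than every healthy sample, so some threshold yields 100% accuracy. The sketch below computes AUC as the fraction of correctly ranked (caries, healthy) pairs. The labels and scores are hypothetical toy values, not data from the study, and `roc_auc` and `accuracy` are illustrative helpers rather than the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (caries, healthy) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical labels (1 = caries, 0 = healthy) and model scores.
labels = [1, 1, 0, 0]
separated = [0.9, 0.6, 0.4, 0.1]    # every caries score exceeds every healthy score
overlapping = [0.9, 0.3, 0.4, 0.1]  # one caries sample scores below a healthy one

auc_separated = roc_auc(labels, separated)
auc_overlapping = roc_auc(labels, overlapping)
```

With the separated scores both AUC and accuracy are 1.0; with the overlapping scores one of the four pairs is misranked and both fall to 0.75.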
Conclusion: The current work provided reliable diagnostic models for early childhood caries, and established a robust computational framework for AI-driven microbiome analysis. This study, by focusing on the characteristics of the oral microbiome, offers novel perspectives for data mining and validation of existing data through the application of AI modelling.
About the journal:
BMC Oral Health is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of disorders of the mouth, teeth and gums, as well as related molecular genetics, pathophysiology, and epidemiology.