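Ensemble learning for microbiome-based caries diagnosis: multi-group modeling and biological interpretation from salivary and plaque metagenomic data.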

IF 3.1 | Zone 2 (Medicine) | JCR Q1 | DENTISTRY, ORAL SURGERY & MEDICINE
Fangqiao Wei, Zailong Wu, Guanghui Li, Xiangyu Sun, Xiangru Shi, Lei Tan, Tianxiang Ai, Long Qu, Shuguo Zheng
BMC Oral Health 25(1):1188. Published 2025-07-17. Journal Article.
DOI: https://doi.org/10.1186/s12903-025-06590-2
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12272970/pdf/
Ensemble learning for microbiome-based caries diagnosis: multi-group modeling and biological interpretation from salivary and plaque metagenomic data.

Background: Oral microbiota is a major etiological factor in the development of dental caries. Next-generation sequencing techniques have been widely used, generating vast amounts of data that remain underexplored. The advancement of artificial intelligence (AI) technologies has made it possible to mine information from these large datasets. This study aimed to develop AI-driven diagnostic models and identify key microbial features for caries.

Methods: We collected raw metagenomic and full-length 16S rRNA gene sequencing data from previous studies on saliva and plaque to construct a caries AI training dataset comprising nearly 600 samples. Samples were grouped by age, sequencing method, and sampling method. Through systematic comparison of seven machine learning architectures (Logistic Regression, Random Forest, Support Vector Machines, Gradient Boosting, Convolutional Neural Networks, Feedforward Neural Networks, and Transformer models), we developed subgroup-specific caries diagnostic models, which were then integrated by ensemble learning to enhance generalizability.
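
As a rough illustration of how per-model predictions can be combined and scored, the sketch below averages caries probabilities across models (soft voting) and computes the AUC from pairwise comparisons. The abstract does not state which ensemble rule was used, so soft voting is an assumption here, and the function names `soft_vote` and `auc`, the labels, and the probabilities are all hypothetical toy values, not data from the study.

```python
from statistics import mean


def soft_vote(per_model_probs):
    """Average the predicted caries probability across models, sample by sample."""
    return [mean(sample_probs) for sample_probs in zip(*per_model_probs)]


def auc(labels, scores):
    """AUC as the chance that a caries sample outscores a healthy one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = caries, 0 = healthy (not taken from the paper).
labels = [1, 1, 1, 0, 0, 0]
model_probs = {
    "logistic_regression": [0.9, 0.4, 0.7, 0.5, 0.2, 0.1],
    "random_forest": [0.8, 0.7, 0.3, 0.4, 0.3, 0.2],
    "svm": [0.6, 0.8, 0.6, 0.2, 0.7, 0.3],
}

# Combine the three models, then score each model and the ensemble.
ensemble_probs = soft_vote(list(model_probs.values()))
for name, probs in model_probs.items():
    print(name, round(auc(labels, probs), 3))
print("ensemble", round(auc(labels, ensemble_probs), 3))
```

On this toy data each individual model misranks at least one caries/healthy pair, while the averaged scores separate the two classes completely. With so few samples such a result says little about generalization, which is why evaluation on held-out data matters.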

Results: The caries diagnostic model achieved a maximum AUC value of 1 (accuracy of 100%) for children under 6 years old in both saliva and plaque groups. The consistency of top features (species and metabolic pathways) contributing to the models was demonstrated through intra- and inter-group analyses. Key caries-associated species included Streptococcus salivarius, Streptococcus parasanguinis and Veillonella dispar. Veillonella parvula was more abundant in caries samples in plaque, but more abundant in healthy samples in saliva. Metabolic pathways such as geranylgeranyl diphosphate and fructan biosynthesis were enriched in caries, whereas Bifidobacterium shunt and peptidoglycan biosynthesis were depleted.

Conclusion: The current work provided reliable diagnostic models for early childhood caries, and established a robust computational framework for AI-driven microbiome analysis. This study, by focusing on the characteristics of the oral microbiome, offers novel perspectives for data mining and validation of existing data through the application of AI modelling.

Source journal: BMC Oral Health (DENTISTRY, ORAL SURGERY & MEDICINE)
CiteScore: 3.90
Self-citation rate: 6.90%
Articles published: 481
Review time: 6-12 weeks
About the journal: BMC Oral Health is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of disorders of the mouth, teeth and gums, as well as related molecular genetics, pathophysiology, and epidemiology.