Using gut microbiome metagenomic hypervariable features for diabetes screening and typing through supervised machine learning.

IF 4.0, Zone 2 (Biology), JCR Q1 (Genetics & Heredity)
Xavier Chavarria, Hyun Seo Park, Singeun Oh, Dongjun Kang, Jun Ho Choi, Myungjun Kim, Yoon Hee Cho, Myung-Hee Yi, Ju Yeong Kim
Journal: Microbial Genomics, volume 11, issue 3. Published: 2025-03-01. Publication type: Journal Article.
DOI: 10.1099/mgen.0.001365 (https://doi.org/10.1099/mgen.0.001365)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11893737/pdf/

Abstract

Diabetes mellitus is a complex metabolic disorder and one of the fastest-growing global public health concerns. The gut microbiota is implicated in the pathophysiology of various diseases, including diabetes. This study used 16S rRNA metagenomic data from a volunteer citizen science initiative to investigate microbial markers associated with diabetes status (positive or negative) and type (type 1 or type 2 diabetes mellitus) using supervised machine learning (ML) models. Microbiome diversity varied with diabetes status and type. Differential microbial signatures between the diabetes types and the negative group revealed an increased presence of Brucellaceae, Ruminococcaceae, Clostridiaceae, Micrococcaceae, Barnesiellaceae and Fusobacteriaceae in subjects with type 1 diabetes, and of Veillonellaceae, Streptococcaceae and the class Gammaproteobacteria in subjects with type 2 diabetes. Four ML algorithms (decision tree, elastic net, random forest (RF) and support vector machine with radial kernel) were trained to screen for and type diabetes based on the microbial profiles of 76 subjects with type 1 diabetes, 366 subjects with type 2 diabetes and 250 subjects without diabetes. Using the 1000 most variable features, tree-based models were the highest-performing algorithms. The RF screening models achieved the best performance, with an average area under the receiver operating characteristic curve (AUC) of 0.76, although all models lacked sensitivity. Reducing the dataset to 500 features produced an AUC of 0.77 and raised sensitivity by 74%, from 0.46 to 0.80. Model performance improved for the classification of negative status and type 2 diabetes. Diabetes type models performed best with 500 features, but performance remained poor across all model iterations. ML has the potential to facilitate early diagnosis of diabetes based on microbial profiles of the gut microbiome.
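The quantities reported in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline, whose implementation is not described here. It assumes that "most variable features" means ranking features by variance across samples, and it uses the standard definitions of sensitivity and AUC. All function names and the toy abundance values are hypothetical.

```python
from statistics import pvariance


def top_k_variable_features(rows, k):
    """Return the column indices of the k highest-variance features.

    rows: list of samples, each a list of feature abundances.
    Assumption: "most variable" is taken to mean highest variance.
    """
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def sensitivity(y_true, y_pred):
    """True-positive rate, TP / (TP + FN), with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_change(old, new):
    """Fractional change from old to new."""
    return (new - old) / old


# Toy abundance table (hypothetical values): 4 samples x 3 features.
toy = [
    [0.10, 5.0, 1.0],
    [0.11, 1.0, 1.0],
    [0.09, 9.0, 1.1],
    [0.10, 3.0, 0.9],
]
selected = top_k_variable_features(toy, k=2)
print(selected)  # indices of the two highest-variance columns

# Sensitivity figures from the abstract: 0.46 with 1000 features, 0.80 with 500.
print(round(relative_change(0.46, 0.80) * 100))  # 74
```

The last line reproduces the reported 74% figure: it is a relative increase, (0.80 - 0.46) / 0.46, not a difference of 74 percentage points.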

Source journal
Microbial Genomics (subject category: Medicine-Epidemiology)
CiteScore: 6.60
Self-citation rate: 2.60%
Articles published: 153
Review time: 12 weeks
Journal description: Microbial Genomics (MGen) is a fully open access, mandatory open data and peer-reviewed journal publishing high-profile original research on archaea, bacteria, microbial eukaryotes and viruses.