A comparative study of supervised and unsupervised machine learning algorithms applied to human microbiome.

JCR: Q2 (Medicine)

Clinica Terapeutica, vol. 175, no. 3, pp. 98-116. Pub Date: 2024-05-01. DOI: 10.7417/CT.2024.5051

E Kalluçi, B Preni, X Dhamo, E Noka, S Bardhi, A Macchia, G Bonetti, K Dhuli, K Donato, M Bertelli, L J M Zambrano, S Janaqi

Citations: 0

Abstract

Background: The human microbiome, consisting of diverse bacterial, fungal, protozoan and viral species, exerts a profound influence on various physiological processes and on disease susceptibility. However, the complexity of microbiome data poses significant challenges for analyzing and interpreting these intricate datasets, which has led to the development of specialized software that applies machine learning algorithms to these tasks.

Methods: In this paper, we analyze raw 16S rRNA gene sequencing data from three studies, comprising stool samples from healthy controls, patients with adenoma, and patients with colorectal cancer. First, we use network-based methods to reduce the dimensionality of the dataset and retain only the most important features. We then employ supervised machine learning algorithms to make predictions.
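The abstract does not say how the taxon network was built or which centrality measures were used. As a minimal sketch, assuming a Spearman co-abundance network with an arbitrary correlation threshold, the code below links correlated taxa, ranks them by degree centrality (one illustrative measure), keeps the top k, and scores a node partition with Newman's modularity, the quantity the Results report as 0.73. All function names and the threshold are hypothetical, not the authors' pipeline.

```python
import numpy as np


def correlation_network(abundance: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Build an unweighted taxon-by-taxon adjacency matrix.

    abundance: samples x taxa matrix (e.g. relative abundances).
    Two taxa are linked when |Spearman rho| > threshold. The default
    threshold is an illustrative assumption, not the paper's setting.
    """
    # Spearman rho is the Pearson correlation of column-wise ranks (ties are not averaged here)
    ranks = np.argsort(np.argsort(abundance, axis=0), axis=0).astype(float)
    corr = np.corrcoef(ranks, rowvar=False)
    adj = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(adj, 0)  # drop self-loops
    return adj


def degree_centrality(adj: np.ndarray) -> np.ndarray:
    """Fraction of the other nodes each node is connected to."""
    n = adj.shape[0]
    return adj.sum(axis=1) / (n - 1)


def top_k_features(adj: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k most central taxa (ties broken by lower index)."""
    order = np.argsort(-degree_centrality(adj), kind="stable")
    return order[:k]


def modularity(adj: np.ndarray, communities: np.ndarray) -> float:
    """Newman modularity Q of a node partition; communities[i] labels node i."""
    two_m = adj.sum()  # each undirected edge is counted twice
    if two_m == 0:
        return 0.0
    k = adj.sum(axis=1)
    same = np.equal.outer(communities, communities)
    return float(((adj - np.outer(k, k) / two_m) * same).sum() / two_m)
```

For example, `keep = top_k_features(correlation_network(abundance), 78)` followed by `abundance[:, keep]` would mirror the 78-feature subset in spirit. NetworkX offers `degree_centrality`, `betweenness_centrality` and `community.modularity` for the same quantities on larger graphs.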

Results: Graph-based techniques, based on different centrality measures, reduce the dimensionality from 255 to 78 features, with a modularity score of 0.73. Projection methods (non-negative matrix factorization and principal component analysis), on the other hand, reduce the dimensionality to 7 features. We then apply supervised machine learning algorithms to the most important features obtained from the centrality measures and to those obtained from the projection methods, finding that the evaluation metrics are approximately the same whether the algorithms are applied to the entire dataset, to the 78 features, or to the 7 features.
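The abstract names principal component analysis and non-negative matrix factorization as the projection methods but gives no implementation details. A minimal numpy sketch of both, assuming a samples x taxa abundance matrix: PCA is computed from the SVD of the centered data, and NMF uses Lee and Seung's multiplicative updates, which may differ from the solver the authors actually used. The function names and defaults are hypothetical.

```python
import numpy as np


def pca_project(X: np.ndarray, k: int) -> np.ndarray:
    """Scores of the samples on the first k principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal axes, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def nmf(X: np.ndarray, k: int, n_iter: int = 500, seed: int = 0, eps: float = 1e-10):
    """Approximate a non-negative X (samples x taxa) as W @ H with W, H >= 0.

    Lee-Seung multiplicative updates for the squared Frobenius loss.
    W (samples x k) is the reduced representation; H (k x taxa) holds the loadings.
    """
    if np.any(X < 0):
        raise ValueError("NMF requires a non-negative matrix")
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, p)) + eps
    for _ in range(n_iter):
        # eps in the denominators avoids division by zero
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

With k = 7, `pca_project(abundance, 7)` or the `W` returned by `nmf(abundance, 7)` gives a 7-dimensional representation comparable in size to the one reported. NMF suits count-derived abundances because its factors stay non-negative and read as additive taxon groups; scikit-learn's `sklearn.decomposition.PCA` and `sklearn.decomposition.NMF` are the usual production implementations.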

Conclusions: This study demonstrates the efficacy of graph-based and projection methods for interpreting 16S rRNA gene sequencing data. Supervised machine learning on the refined features from both approaches yields comparable predictive performance and highlights specific microbial features (Bacteroides, Prevotella, Fusobacterium, Lysinibacillus, Blautia, Sphingomonas, and Faecalibacterium) as key to predicting patient conditions from raw data.
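The conclusion rests on comparing predictive performance on the full feature set against the reduced ones. The abstract does not name the classifiers or the validation scheme, so the following is a hypothetical harness: a nearest-centroid classifier scored by unstratified k-fold cross-validated accuracy over any named subsets of columns. For projected features (PCA or NMF scores), `cv_accuracy` can be called directly on the projected matrix.

```python
import numpy as np


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test sample to the class whose training centroid is closest."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test sample to every centroid
    d = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


def cv_accuracy(X, y, n_folds=5, seed=0):
    """Mean accuracy over random (unstratified) k-fold splits."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = nearest_centroid_predict(X[train], y[train], X[test])
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))


def compare_feature_sets(X, y, feature_sets, **kw):
    """Cross-validated accuracy for each named subset of columns."""
    return {name: cv_accuracy(X[:, cols], y, **kw) for name, cols in feature_sets.items()}
```

This sketch assumes the columns were chosen beforehand. Selecting features on all samples before cross-validation can inflate the scores, so a stricter comparison would repeat the selection (or the projection) inside each training fold.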

Source journal: Clinica Terapeutica (PHARMACOLOGY & PHARMACY)

CiteScore: 2.50
Self-citation rate: 0.00%
Articles published: 124
Review time: 6-12 weeks

Journal description: La Clinica Terapeutica is a journal of clinical practice and therapy in medicine and surgery, founded in 1951 by Prof. Mariano Messini (1901-1980), Director of the Institute of Medical Hydrology at the University of Rome "La Sapienza". The journal is published bimonthly by Società Editrice Universo, a publishing house founded in 1945 by Comm. Luigi Pellino. La Clinica Terapeutica is indexed in MEDLINE, Index Medicus, and EMBASE/Excerpta Medica.