E Kalluçi, B Preni, X Dhamo, E Noka, S Bardhi, A Macchia, G Bonetti, K Dhuli, K Donato, M Bertelli, L J M Zambrano, S Janaqi
Clinica Terapeutica, vol. 175, no. 3, pp. 98-116. Published 2024-05-01. DOI: 10.7417/CT.2024.5051 (https://doi.org/10.7417/CT.2024.5051)
A comparative study of supervised and unsupervised machine learning algorithms applied to human microbiome.
Background: The human microbiome, consisting of diverse bacterial, fungal, protozoan and viral species, exerts a profound influence on various physiological processes and disease susceptibility. However, the complexity of microbiome data makes these datasets difficult to analyze and interpret, which has led to the development of specialized software that applies machine learning algorithms to these tasks.
Methods: In this paper, we analyze raw 16S rRNA gene sequencing data from three studies, comprising stool samples from healthy controls, patients with adenoma, and patients with colorectal cancer. First, we use network-based methods to reduce the dimensionality of the dataset and retain only the most important features. We then employ supervised machine learning algorithms to make predictions.
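The network-based reduction step can be illustrated with a minimal sketch. The abstract does not say how the network is built or which centrality measures are used, so everything here is an assumption: features are linked when the absolute Pearson correlation of their abundances reaches a hypothetical `threshold`, and features are ranked by degree centrality. The function names and the toy abundance table are invented for illustration and are not the authors' pipeline or data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_by_degree_centrality(columns, threshold=0.5, top_k=3):
    """Rank features by degree centrality in a correlation network.

    columns: dict mapping feature name -> list of abundances (one per sample).
    Two features are connected when |Pearson r| >= threshold (an assumption,
    not a rule taken from the paper). Returns (top_k feature names, centralities).
    """
    names = list(columns)
    degree = {name: 0 for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                degree[a] += 1
                degree[b] += 1
    # Degree centrality: fraction of the other nodes a node is connected to.
    denom = max(len(names) - 1, 1)
    centrality = {name: degree[name] / denom for name in names}
    # Sort by centrality (descending), breaking ties alphabetically.
    ranked = sorted(names, key=lambda name: (-centrality[name], name))
    return ranked[:top_k], centrality


# Made-up toy abundances for four hypothetical taxa across five samples.
toy_abundances = {
    "taxon_a": [1, 2, 3, 4, 5],
    "taxon_b": [2, 4, 6, 8, 10],
    "taxon_c": [5, 4, 3, 2, 1],
    "taxon_d": [3, 1, 2, 1, 3],
}

selected, centrality = select_by_degree_centrality(toy_abundances)
print(selected)
print(centrality)
```

In this toy table the first three taxa are mutually correlated and the fourth is uncorrelated with all of them, so it is the one dropped. Other centrality measures (betweenness, eigenvector, and so on) would slot into the same ranking step.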
Results: Graph-based techniques reduce the dimension from 255 to 78 features, with a modularity score of 0.73, based on different centrality measures. Projection methods (non-negative matrix factorization and principal component analysis), by contrast, reduce the dimension to 7 features. Furthermore, we apply supervised machine learning algorithms to the most important features obtained from centrality measures and to those obtained from projection methods, finding that the evaluation metrics are approximately the same whether the algorithms are applied to the entire dataset, to the 78 features, or to the 7 features.
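For readers unfamiliar with the modularity score mentioned above, the following sketch computes the standard Newman modularity of a partition of an undirected, unweighted graph, Q = Σ_c [L_c/m − (d_c/(2m))²], where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of degrees of its nodes. The abstract does not state which modularity definition or community-detection method was used, so this is an assumption; the function name, toy graph, and community labels are made up, and the example does not reproduce the reported 0.73.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping node -> community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal_edges = {}  # community -> number of edges with both ends inside
    degree_sum = {}      # community -> sum of degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_communities = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

print(modularity(toy_edges, toy_communities))  # 5/14, about 0.357
```

Values near 0 indicate a partition no better than random edge placement, while higher values indicate denser connections within communities than between them.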
Conclusions: This study demonstrates the efficacy of graph-based and projection methods in the interpretation of 16S rRNA gene sequencing data. Supervised machine learning on refined features from both approaches yields comparable predictive performance, emphasizing specific microbial features (Bacteroides, Prevotella, Fusobacterium, Lysinibacillus, Blautia, Sphingomonas, and Faecalibacterium) as key in predicting patient conditions from raw data.
About the journal:
La Clinica Terapeutica is a journal of clinical practice and therapy in medicine and surgery, founded in 1951 by Prof. Mariano Messini (1901-1980), Director of the Institute of Medical Hydrology at the University of Rome "La Sapienza". The journal is published bimonthly by Società Editrice Universo, a publishing house founded in 1945 by Comm. Luigi Pellino. La Clinica Terapeutica is indexed in MEDLINE, INDEX MEDICUS, and EMBASE/Excerpta Medica.