机器学习技术在基于代谢组学的早产预测中的比较分析。

IF 4.1 2区生物学 Q2 BIOCHEMISTRY & MOLECULAR BIOLOGY

Computational and structural biotechnology journal Pub Date : 2025-07-13 eCollection Date: 2025-01-01 DOI:10.1016/j.csbj.2025.07.010

Ying-Chieh Han, Jane Shearer, Chunlong Mu, Donna M Slater, Suzanne C Tough, Gavin E Duggan

{"title":"机器学习技术在基于代谢组学的早产预测中的比较分析。","authors":"Ying-Chieh Han, Jane Shearer, Chunlong Mu, Donna M Slater, Suzanne C Tough, Gavin E Duggan","doi":"10.1016/j.csbj.2025.07.010","DOIUrl":null,"url":null,"abstract":"Background: Machine learning (ML), with advancements in algorithms and computations, is seeing an increased presence in life science research. This study investigated several ML models' efficacy in predicting preterm birth using untargeted metabolomics from serum collected during the third trimester of gestation.Methods: Samples from 48 preterm and 102 term delivery mothers from the All Our Families Cohort (Calgary, AB) were examined. Four ML algorithms: Partial Least Squares Discriminant Analysis (PLS-DA), linear logistic regression, artificial neural networks (ANN), Extreme Gradient Boosting (XGBoost) - with and without bootstrap resampling were used to examine the small-scale clinical dataset for both model performance and metabolite interpretation.Results: Model performance was evaluated based on confusion matrices, area under the receiver operating characteristic (AUROC) curve analysis, and feature importance rankings. Linear models such as PLS-DA and logistic regression demonstrated moderate classification performance (AUROC ≈ 0.60), whereas non-linear approaches, including ANN and XGBoost, exhibited marginal improvements. Among all models, XGBoost combined with bootstrap resampling achieved the highest performance, yielding an AUROC of 0.85 (95 % CI: 0.57-0.99, p < 0.001), indicating a significant improvement in classification accuracy. Metabolite importance, derived from Shapley Additive Explanations (SHAP), consistently identified acylcarnitines and amino acid derivatives as principal discriminative features. Pathway analysis revealed disruptions to tyrosine metabolism as well as phenylalanine, tyrosine and tryptophan biosynthesis to be associated with preterm delivery.Conclusions: Our results highlight the complexity of metabolomics-based modelling for preterm birth and support an iterative, model-driven approach for optimizing predictive accuracy in small-scale clinical datasets.","PeriodicalId":10715,"journal":{"name":"Computational and structural biotechnology journal","volume":"27 ","pages":"3240-3250"},"PeriodicalIF":4.1000,"publicationDate":"2025-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312043/pdf/","citationCount":"0","resultStr":"{\"title\":\"Comparative analysis of machine learning techniques in metabolomic-based preterm birth prediction.\",\"authors\":\"Ying-Chieh Han, Jane Shearer, Chunlong Mu, Donna M Slater, Suzanne C Tough, Gavin E Duggan\",\"doi\":\"10.1016/j.csbj.2025.07.010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Machine learning (ML), with advancements in algorithms and computations, is seeing an increased presence in life science research. This study investigated several ML models' efficacy in predicting preterm birth using untargeted metabolomics from serum collected during the third trimester of gestation.Methods: Samples from 48 preterm and 102 term delivery mothers from the All Our Families Cohort (Calgary, AB) were examined. Four ML algorithms: Partial Least Squares Discriminant Analysis (PLS-DA), linear logistic regression, artificial neural networks (ANN), Extreme Gradient Boosting (XGBoost) - with and without bootstrap resampling were used to examine the small-scale clinical dataset for both model performance and metabolite interpretation.Results: Model performance was evaluated based on confusion matrices, area under the receiver operating characteristic (AUROC) curve analysis, and feature importance rankings. Linear models such as PLS-DA and logistic regression demonstrated moderate classification performance (AUROC ≈ 0.60), whereas non-linear approaches, including ANN and XGBoost, exhibited marginal improvements. Among all models, XGBoost combined with bootstrap resampling achieved the highest performance, yielding an AUROC of 0.85 (95 % CI: 0.57-0.99, p < 0.001), indicating a significant improvement in classification accuracy. Metabolite importance, derived from Shapley Additive Explanations (SHAP), consistently identified acylcarnitines and amino acid derivatives as principal discriminative features. Pathway analysis revealed disruptions to tyrosine metabolism as well as phenylalanine, tyrosine and tryptophan biosynthesis to be associated with preterm delivery.Conclusions: Our results highlight the complexity of metabolomics-based modelling for preterm birth and support an iterative, model-driven approach for optimizing predictive accuracy in small-scale clinical datasets.\",\"PeriodicalId\":10715,\"journal\":{\"name\":\"Computational and structural biotechnology journal\",\"volume\":\"27 \",\"pages\":\"3240-3250\"},\"PeriodicalIF\":4.1000,\"publicationDate\":\"2025-07-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312043/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational and structural biotechnology journal\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1016/j.csbj.2025.07.010\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational and structural biotechnology journal","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1016/j.csbj.2025.07.010","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

背景：随着算法和计算的进步，机器学习（ML）在生命科学研究中的应用越来越多。本研究利用妊娠晚期收集的血清非靶向代谢组学研究了几种ML模型在预测早产方面的有效性。方法：对来自All Our Families队列（Calgary， AB）的48名早产儿和102名足月分娩母亲的样本进行了检查。四种机器学习算法：偏最小二乘判别分析（PLS-DA）、线性逻辑回归、人工神经网络（ANN）、极端梯度增强（XGBoost）——带和不带自举重采样，用于检查小规模临床数据集的模型性能和代谢物解释。结果：基于混淆矩阵、接收者工作特征（AUROC）曲线下面积分析和特征重要性排名来评估模型的性能。线性模型如PLS-DA和逻辑回归表现出中等的分类性能（AUROC≈0.60），而非线性方法，包括ANN和XGBoost，表现出边际改进。在所有模型中，XGBoost结合bootstrap重采样获得了最高的性能，AUROC为0.85（95 % CI: 0.57-0.99, p ）。结论：我们的研究结果突出了基于代谢组学的早产建模的复杂性，并支持迭代的、模型驱动的方法来优化小规模临床数据集的预测准确性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Comparative analysis of machine learning techniques in metabolomic-based preterm birth prediction.

查看原文本刊更多论文

Comparative analysis of machine learning techniques in metabolomic-based preterm birth prediction.

Background: Machine learning (ML), with advancements in algorithms and computations, is seeing an increased presence in life science research. This study investigated several ML models' efficacy in predicting preterm birth using untargeted metabolomics from serum collected during the third trimester of gestation.

Methods: Samples from 48 preterm and 102 term delivery mothers from the All Our Families Cohort (Calgary, AB) were examined. Four ML algorithms: Partial Least Squares Discriminant Analysis (PLS-DA), linear logistic regression, artificial neural networks (ANN), Extreme Gradient Boosting (XGBoost) - with and without bootstrap resampling were used to examine the small-scale clinical dataset for both model performance and metabolite interpretation.

Results: Model performance was evaluated based on confusion matrices, area under the receiver operating characteristic (AUROC) curve analysis, and feature importance rankings. Linear models such as PLS-DA and logistic regression demonstrated moderate classification performance (AUROC ≈ 0.60), whereas non-linear approaches, including ANN and XGBoost, exhibited marginal improvements. Among all models, XGBoost combined with bootstrap resampling achieved the highest performance, yielding an AUROC of 0.85 (95 % CI: 0.57-0.99, p < 0.001), indicating a significant improvement in classification accuracy. Metabolite importance, derived from Shapley Additive Explanations (SHAP), consistently identified acylcarnitines and amino acid derivatives as principal discriminative features. Pathway analysis revealed disruptions to tyrosine metabolism as well as phenylalanine, tyrosine and tryptophan biosynthesis to be associated with preterm delivery.

Conclusions: Our results highlight the complexity of metabolomics-based modelling for preterm birth and support an iterative, model-driven approach for optimizing predictive accuracy in small-scale clinical datasets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computational and structural biotechnology journal Biochemistry, Genetics and Molecular Biology-Biophysics

CiteScore

9.30

自引率

3.30%

发文量

540

审稿时长

6 weeks

期刊介绍： Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a pre-requisite for publication in the journal. Specific areas of interest include, but are not limited to: Structure and function of proteins, nucleic acids and other macromolecules Structure and function of multi-component complexes Protein folding, processing and degradation Enzymology Computational and structural studies of plant systems Microbial Informatics Genomics Proteomics Metabolomics Algorithms and Hypothesis in Bioinformatics Mathematical and Theoretical Biology Computational Chemistry and Drug Discovery Microscopy and Molecular Imaging Nanotechnology Systems and Synthetic Biology