Improving the second-tier classification of methylmalonic acidemia patients using a machine learning ensemble method.

IF 6.1, Zone 2 (Medicine), Q1 PEDIATRICS

World Journal of Pediatrics Pub Date : 2024-10-01 Epub Date: 2024-02-24 DOI:10.1007/s12519-023-00788-6

Zhi-Xing Zhu, Georgi Z Genchev, Yan-Min Wang, Wei Ji, Yong-Yong Ren, Guo-Li Tian, Sira Sriswasdi, Hui Lu

Pages: 1090-1101

PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502559/pdf/

Citations: 0

Abstract



Introduction: Methylmalonic acidemia (MMA) is an autosomal recessive disorder with an estimated prevalence of 1:50,000. First-tier clinical diagnostic tests often return many false positives [about five false positives (FP) for every true positive (TP)]. In this work, our goal was to refine a classification model that minimizes the number of false positives, currently an unmet need in the upstream diagnostics of MMA.
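The 5:1 ratio quoted above can be restated as a positive predictive value (PPV). The short sketch below only performs that arithmetic; the function name is illustrative and not taken from the paper.

```python
from fractions import Fraction


def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the share of positive calls that are real cases."""
    return Fraction(true_positives, true_positives + false_positives)


# The abstract quotes roughly five false positives per true positive.
ppv = positive_predictive_value(true_positives=1, false_positives=5)
print(f"First-tier PPV = {ppv} (about {float(ppv):.1%})")
```

In other words, only about one in six first-tier positives is a true MMA case, which is the gap a second-tier model is meant to narrow.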

Methods: We developed multivariable machine learning screening models for MMA, intended as a second-tier tool for reducing false positives. We used mass spectrometry-based features consisting of 11 amino acids and 31 carnitines derived from dried blood samples of neonatal patients, and then constructed additional ratio features. Feature selection strategies (selection by filter, recursive feature elimination, and learned vector quantization) were used to determine the input set for evaluating the performance of 14 classification models and to identify a candidate model set for developing an ensemble model.
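As a rough illustration of the ratio feature construction step, the sketch below extends one sample with every ordered pairwise ratio of its analytes. The abstract does not state which ratios the authors built, so the all-pairs scheme, the analyte names (C3, C2, Met), the concentrations, and the `eps` guard are all assumptions made for this example.

```python
from itertools import permutations


def add_ratio_features(sample, eps=1e-9):
    """Return a copy of `sample` extended with every ordered pairwise ratio.

    `eps` is a hypothetical guard against division by zero.
    """
    features = dict(sample)
    for numerator, denominator in permutations(sample, 2):
        name = f"{numerator}/{denominator}"
        features[name] = sample[numerator] / (sample[denominator] + eps)
    return features


# Made-up concentrations for three analytes, purely for illustration.
sample = {"C3": 6.0, "C2": 20.0, "Met": 15.0}
features = add_ratio_features(sample)
print(len(features), "features, e.g. C3/C2 =", round(features["C3/C2"], 3))
```

With 42 raw analytes, an all-pairs scheme would produce 42 × 41 = 1,722 ratios, which is one reason a feature selection step such as those named above would be needed afterwards.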

Results: Our work identified computational models that use metabolic analytes to reduce the number of false positives without compromising sensitivity. The best results [area under the receiver operating characteristic curve (AUROC) of 97%, sensitivity of 92%, and specificity of 95%] were obtained with a stacked ensemble of random forest, C5.0, sparse linear discriminant analysis, and an autoencoder deep neural network, with stochastic gradient boosting as the supervisor. The model achieved a good performance trade-off for a screening application: a 6% false-positive rate (FPR) at 95% sensitivity, 35% FPR at 99% sensitivity, and 39% FPR at 100% sensitivity.
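Operating points such as "FPR at 95% sensitivity" are typically read off by moving the decision threshold on the model's output score. The sketch below shows one way to do that, assuming a model that emits a score where higher means more likely MMA; the function name and the toy scores are hypothetical and are not the study's data.

```python
import math


def fpr_at_sensitivity(scores, labels, target_sensitivity):
    """Pick the highest threshold whose sensitivity meets the target.

    Returns (threshold, sensitivity, false_positive_rate).
    A sample is called positive when its score is >= threshold.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Number of true cases that must score at or above the threshold.
    needed = max(1, math.ceil(target_sensitivity * len(positives) - 1e-12))
    threshold = positives[needed - 1]
    sensitivity = sum(s >= threshold for s in positives) / len(positives)
    fpr = sum(s >= threshold for s in negatives) / len(negatives)
    return threshold, sensitivity, fpr


# Toy scores: four true cases (label 1) and six non-cases (label 0).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for target in (0.75, 1.0):
    threshold, sens, fpr = fpr_at_sensitivity(scores, labels, target)
    print(f"target={target:.2f} threshold={threshold} sensitivity={sens:.2f} FPR={fpr:.2f}")
```

On this toy data, demanding 100% sensitivity forces the threshold down to the lowest-scoring true case and lets more non-cases through, which mirrors the trade-off reported above.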

Conclusions: The classification results and approach of this research can be used by clinicians globally to improve the overall discovery of MMA in pediatric patients. The improved method, when adjusted to 100% precision, can be used to further inform the diagnostic process for MMA and help reduce the burden on patients and their families.
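To give a sense of scale for the reduced burden, the reported operating points can be combined with the 5:1 first-tier ratio. This is a back-of-the-envelope sketch, not a result from the paper: it assumes the reported sensitivity and FPR apply directly to a first-tier-positive population with five false positives per true positive.

```python
def second_tier_outcome(fp_per_tp, sensitivity, fpr):
    """Expected FP:TP ratio and PPV after re-screening first-tier positives.

    Works per single true case: `sensitivity` of it is kept, and `fpr` of the
    accompanying `fp_per_tp` false positives are kept.
    """
    kept_tp = sensitivity
    kept_fp = fpr * fp_per_tp
    return kept_fp / kept_tp, kept_tp / (kept_tp + kept_fp)


# (sensitivity, FPR) pairs as reported in the Results.
operating_points = [(0.95, 0.06), (0.99, 0.35), (1.00, 0.39)]

for sensitivity, fpr in operating_points:
    ratio, ppv = second_tier_outcome(5, sensitivity, fpr)
    print(f"sensitivity={sensitivity:.0%} FPR={fpr:.0%} "
          f"-> FP per TP={ratio:.2f}, PPV={ppv:.0%}")
```

Under these assumptions, the 95% sensitivity setting would cut the false-positive load from 5 to roughly 0.3 per true case, while the 100% sensitivity setting would still remove most false positives without missing any true case.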


Source journal: World Journal of Pediatrics (Medicine, Pediatrics)

CiteScore: 10.50

Self-citation rate: 1.10%

Articles published: 592

Review time: 2.5 months

Journal description: The World Journal of Pediatrics, a monthly publication, is dedicated to disseminating peer-reviewed original papers, reviews, and special reports focusing on clinical practice and research in pediatrics. We welcome contributions from pediatricians worldwide on new developments across all areas of pediatrics, including pediatric surgery, preventive healthcare, pharmacology, stomatology, and biomedicine. The journal also covers basic sciences and experimental work, serving as a comprehensive academic platform for the international exchange of medical findings.