Improving methylmalonic acidemia (MMA) screening and MMA genotype prediction using random forest classifier in two Chinese populations.

IF 2.8 · Zone 3 (Medicine) · Q2 MEDICINE, RESEARCH & EXPERIMENTAL
Zhe Yin, Chuan Zhang, Rui Dong, Xinyuan Zhang, Yingnan Song, Shengju Hao, Zhongtao Gai, Bingbo Zhou, Ling Hui, Shifan Wang, Huiqin Xue, Zongfu Cao, Yi Liu, Xu Ma
DOI: 10.1186/s40001-024-02115-9
Journal: European Journal of Medical Research
Published: 2024-11-10 (Journal Article)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11552112/pdf/
Citations: 0

Abstract

Background: Methylmalonic acidemia (MMA) is one of the most common inherited disorders of organic acid metabolism and endangers the lives and health of infants and children. Early detection and intervention, before a newborn develops clinical symptoms, can control disease progression and prevent or mitigate its serious consequences.

Methods: A total of 42,004 newborns from two Chinese populations were included in the study. Small-molecule metabolite analytes in dried blood spot (DBS) samples were measured by tandem mass spectrometry (MS/MS). Genetic analysis of 68 Chinese MMA cases was performed by whole-exome sequencing and Sanger sequencing. Random forest classifiers (RFC) were constructed to improve MMA screening performance and genotype prediction in the two Chinese populations. Six other machine learning models were also trained to separate MMA patients from normal newborns. Model performance was assessed using accuracy, sensitivity, specificity, false positive rate (FPR), positive predictive value (PPV), and the area under the receiver operating characteristic curve (AUC).
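The evaluation metrics listed above follow standard definitions over a binary confusion matrix (true/false positives and negatives). The sketch below is a minimal, stdlib-only illustration of those definitions, not the authors' pipeline; the names `screening_metrics`, `auc_rank`, and `ScreeningMetrics` are hypothetical, and AUC is computed with the Mann-Whitney rank formulation rather than by integrating a plotted ROC curve.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


def _ratio(num: float, den: float) -> float:
    """Safe division: return NaN when a metric is undefined (zero denominator)."""
    return num / den if den else math.nan


@dataclass
class ScreeningMetrics:
    accuracy: float
    sensitivity: float  # TP / (TP + FN): true positive rate (recall)
    specificity: float  # TN / (TN + FP): true negative rate
    fpr: float          # FP / (FP + TN) = 1 - specificity
    ppv: float          # TP / (TP + FP): positive predictive value (precision)


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> ScreeningMetrics:
    """Threshold-dependent metrics from a binary confusion matrix."""
    return ScreeningMetrics(
        accuracy=_ratio(tp + tn, tp + fp + tn + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        fpr=_ratio(fp, fp + tn),
        ppv=_ratio(tp, tp + fp),
    )


def auc_rank(scores: list[float], labels: list[int]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs in which the positive case scores higher,
    counting ties as one half. O(P*N), adequate for small illustrations."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _ratio(wins, len(pos) * len(neg))


if __name__ == "__main__":
    # Toy inputs, purely hypothetical (not data from the study).
    print(screening_metrics(tp=8, fp=2, tn=88, fn=2))
    print(auc_rank([0.9, 0.8, 0.4, 0.3, 0.2], [1, 1, 0, 1, 0]))
```

Because FPR and specificity are complementary, reporting both is partly redundant; PPV, unlike the other threshold metrics, also depends on disease prevalence in the screened cohort.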

Results: Among the 42,004 newborn samples, 68 MMA cases were identified by genetic analysis: 42 caused by variants in MMACHC, 24 by variants in MMUT, and two by variants in MMAA. Three novel variants were identified in the MMA patients: c.449T>G (p.I150R) in MMACHC, and c.1151C>T (p.S384F) and c.1091_1108delins (p.Y364Sfs*4) in MMUT. For newborn MMA screening, the RFC performed best among the machine learning classification models compared, with 100% sensitivity, a low FPR, and excellent PPV and AUC. In addition, a subdivision RFC constructed for MMA genotype prediction showed superior performance.
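The pairing of 100% sensitivity with a low FPR matters because MMA is rare in this cohort (68 of 42,004 newborns, about 0.16%). The sketch below assumes only the reported case and cohort counts; the FPR values it loops over are purely hypothetical and are not the study's figures. The helper `expected_ppv` is a hypothetical name that applies the standard expected-count form of PPV.

```python
def expected_ppv(sensitivity: float, fpr: float, n_cases: int, n_total: int) -> float:
    """Expected PPV when a screen with the given sensitivity and FPR is applied
    to a cohort with n_cases affected and n_total - n_cases unaffected newborns."""
    expected_tp = sensitivity * n_cases          # affected newborns flagged
    expected_fp = fpr * (n_total - n_cases)      # unaffected newborns flagged
    return expected_tp / (expected_tp + expected_fp)


if __name__ == "__main__":
    N_CASES, N_TOTAL = 68, 42_004  # counts reported in the Results
    for fpr in (0.01, 0.001, 0.0001):  # hypothetical FPRs, not the study's values
        ppv = expected_ppv(1.0, fpr, N_CASES, N_TOTAL)
        print(f"FPR={fpr:.2%}  expected PPV={ppv:.1%}")
```

With these hypothetical inputs the expected PPV falls from roughly 94% at an FPR of 0.01% to roughly 14% at an FPR of 1%, so once sensitivity is already 100%, the FPR largely determines how many screen-positive newborns are true cases.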

Conclusions: The RFC is highly useful for both detection and genotype prediction in newborn MMA screening. In addition, our findings extend the variant spectrum of MMA-related genes.

Source journal: European Journal of Medical Research (Medicine: Research & Experimental)
CiteScore: 3.20
Self-citation rate: 0.00%
Articles published: 247
Review time: >12 weeks
Journal description: European Journal of Medical Research publishes translational and clinical research of international interest across all medical disciplines, enabling clinicians and other researchers to learn about developments and innovations within these disciplines and across the boundaries between disciplines. The journal publishes high quality research and reviews and aims to ensure that the results of all well-conducted research are published, regardless of their outcome.