Improving methylmalonic acidemia (MMA) screening and MMA genotype prediction using random forest classifier in two Chinese populations

Authors: Zhe Yin, Chuan Zhang, Rui Dong, Xinyuan Zhang, Yingnan Song, Shengju Hao, Zhongtao Gai, Bingbo Zhou, Ling Hui, Shifan Wang, Huiqin Xue, Zongfu Cao, Yi Liu, Xu Ma
Journal: European Journal of Medical Research (impact factor 2.8; JCR Q2, Medicine, Research & Experimental)
Published: 2024-11-10
DOI: https://doi.org/10.1186/s40001-024-02115-9
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11552112/pdf/
Citations: 0
Abstract
Background: Methylmalonic acidemia (MMA) is one of the most common hereditary disorders of organic acid metabolism and endangers the lives and health of infants and children. Early detection and intervention, before clinical symptoms appear in a newborn, can control disease progression and prevent or mitigate its serious consequences.
Methods: A total of 42,004 newborns from two Chinese populations were included in the study. Small-molecule metabolite analytes were measured in dried blood spot (DBS) samples by tandem mass spectrometry (MS/MS). Genetic analysis of 68 Chinese MMA cases was performed by whole-exome sequencing and Sanger sequencing. Random forest classifiers (RFC) were constructed to improve MMA screening performance and genotype prediction in the two populations. Six other machine learning models were also trained to separate MMA patients from normal newborns. Model performance was assessed using accuracy, sensitivity, specificity, false positive rate (FPR), positive predictive value (PPV), and the area under the receiver operating characteristic curve (AUC).
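The evaluation metrics listed in the Methods can be illustrated with a minimal sketch. The function names, the toy labels and scores, and the 0.5 decision threshold below are illustrative assumptions and are not taken from the study; the sketch only shows how each metric follows from a confusion matrix and how AUC can be computed from ranked scores.

```python
from dataclasses import dataclass
from math import nan


@dataclass
class ScreeningMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    fpr: float
    ppv: float


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else nan


def screening_metrics(y_true, y_pred):
    """Compute screening metrics from binary labels (1 = MMA, 0 = unaffected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return ScreeningMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        sensitivity=_ratio(tp, tp + fn),   # true positive rate
        specificity=_ratio(tn, tn + fp),   # true negative rate
        fpr=_ratio(fp, fp + tn),           # equals 1 - specificity
        ppv=_ratio(tp, tp + fp),           # precision
    )


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return nan
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 affected and 8 unaffected newborns.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.1, 0.3, 0.05, 0.4, 0.15, 0.25]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed threshold

toy_metrics = screening_metrics(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)
```

On the toy data, both affected newborns are flagged, so sensitivity is 1.0, while the single false positive out of three flagged samples gives a PPV of 2/3. This shows why PPV and FPR are reported alongside sensitivity when the condition is rare.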
Results: Among the 42,004 newborn samples, 68 MMA cases were identified by genetic analysis: 42 were caused by variants in MMACHC, 24 by variants in MMUT, and two by variants in MMAA. Three novel variants were identified in the MMA patients: c.449T>G (p.I150R) in MMACHC, and c.1151C>T (p.S384F) and c.1091_1108delins (p.Y364Sfs*4) in MMUT. For newborn screening of MMA, the RFC performed best among the machine learning classification models compared, with 100% sensitivity, a low FPR, and excellent PPV and AUC. In addition, a subdivision RFC constructed for MMA genotype prediction showed superior performance.
Conclusions: RFC is highly useful for detection and genotype prediction in newborn MMA screening. In addition, our findings extend the variant spectrum of genes related to MMA.
About the journal:
European Journal of Medical Research publishes translational and clinical research of international interest across all medical disciplines, enabling clinicians and other researchers to learn about developments and innovations within these disciplines and across the boundaries between them. The journal publishes high-quality research and reviews and aims to ensure that the results of all well-conducted research are published, regardless of their outcome.