Data-driven consideration of genetic disorders for global genomic newborn screening programs
Thomas Minten, Sarah Bick, Sophia Adelson, Nils Gehlenborg, Laura M. Amendola, François Boemer, Alison J. Coffey, Nicolas Encina, Alessandra Ferlini, Janbernd Kirschner, Bianca E. Russell, Laurent Servais, Kristen L. Sund, Ryan J. Taft, Petros Tsipouras, Hana Zouk
Genetics in Medicine, 27(7), Article 101443. Published 2025-05-09. DOI: 10.1016/j.gim.2025.101443
https://www.sciencedirect.com/science/article/pii/S1098360025000905
Abstract
Purpose
Over 30 international studies are exploring newborn sequencing (NBSeq) to expand the range of genetic disorders included in newborn screening. Substantial variability in gene selection across programs exists, highlighting the need for a systematic approach to prioritize genes.
Methods
We assembled a data set comprising 25 characteristics about each of the 4390 genes included in 27 NBSeq programs. We used regression analysis to identify several predictors of inclusion and developed a machine learning model to rank genes for public health consideration.
Results
Among 27 NBSeq programs, the number of genes analyzed ranged from 134 to 4299, with only 74 (1.7%) genes included by over 80% of programs. The most significant associations with gene inclusion across programs were presence on the US Recommended Uniform Screening Panel (inclusion increase of 74.7%, CI: 71.0%-78.4%), robust evidence on the natural history (29.5%, CI: 24.6%-34.4%), and treatment efficacy (17.0%, CI: 12.3%-21.7%) of the associated genetic disease. A boosted trees machine learning model using 13 predictors achieved high accuracy in predicting gene inclusion across programs (area under the curve = 0.915, R² = 84%).
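As a rough illustration of how a statistic such as "included by over 80% of programs" can be computed, the sketch below derives a per-gene inclusion share from program gene lists. The program names and gene lists are made-up toy data, and the function names `inclusion_share` and `widely_included` are hypothetical; this is not the authors' pipeline, only a minimal sketch of the underlying arithmetic. The last line checks the reported 74 of 4390 genes against the stated 1.7%.

```python
from collections import Counter


def inclusion_share(programs):
    """Map each gene to the fraction of programs whose gene list contains it."""
    # set(genes) guards against a gene being listed twice within one program
    counts = Counter(gene for genes in programs.values() for gene in set(genes))
    n_programs = len(programs)
    return {gene: count / n_programs for gene, count in counts.items()}


def widely_included(programs, threshold=0.8):
    """Return genes included by strictly more than `threshold` of programs."""
    shares = inclusion_share(programs)
    return sorted(gene for gene, share in shares.items() if share > threshold)


# Toy, made-up gene lists for five hypothetical programs (illustration only)
toy_programs = {
    "program_A": ["PAH", "CFTR", "SMN1"],
    "program_B": ["PAH", "CFTR", "GAA"],
    "program_C": ["PAH", "SMN1"],
    "program_D": ["PAH", "CFTR", "SMN1", "GAA"],
    "program_E": ["PAH", "CFTR"],
}

shares = inclusion_share(toy_programs)
core = widely_included(toy_programs)

# Consistency check on the figures reported in the abstract
reported_pct = round(100 * 74 / 4390, 1)
```

In the toy data, a gene present in exactly four of five programs has a share of 0.8 and is therefore not counted, since the abstract's wording is "over 80%".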
Conclusion
The machine learning model developed here provides a ranked list of genes that can adapt to emerging evidence and regional needs, enabling more consistent and informed gene selection in NBSeq initiatives.
About the journal:
Genetics in Medicine (GIM) is the official journal of the American College of Medical Genetics and Genomics. The journal's mission is to enhance the knowledge, understanding, and practice of medical genetics and genomics through publications in clinical and laboratory genetics and genomics, including ethical, legal, and social issues as well as public health.
GIM encourages research that combats racism, includes diverse populations and is written by authors from diverse and underrepresented backgrounds.