Data-driven consideration of genetic disorders for global genomic newborn screening programs

Impact factor 6.6; Zone 1 (Medicine); JCR Q1 (Genetics & Heredity)
Thomas Minten, Sarah Bick, Sophia Adelson, Nils Gehlenborg, Laura M. Amendola, François Boemer, Alison J. Coffey, Nicolas Encina, Alessandra Ferlini, Janbernd Kirschner, Bianca E. Russell, Laurent Servais, Kristen L. Sund, Ryan J. Taft, Petros Tsipouras, Hana Zouk
Genetics in Medicine, volume 27, issue 7, Article 101443. Published 2025-05-09. DOI: 10.1016/j.gim.2025.101443
https://www.sciencedirect.com/science/article/pii/S1098360025000905

Abstract

Purpose

Over 30 international studies are exploring newborn sequencing (NBSeq) to expand the range of genetic disorders included in newborn screening. Gene selection varies substantially across programs, highlighting the need for a systematic approach to prioritizing genes.

Methods

We assembled a data set comprising 25 characteristics of each of the 4390 genes included in 27 NBSeq programs. We used regression analysis to identify several predictors of inclusion and developed a machine learning model to rank genes for public health consideration.
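
As an illustration of the kind of outcome such an analysis could model, the sketch below computes, for each gene, the fraction of programs that include it and then ranks genes by that fraction. The program names, gene labels, and the function name `inclusion_fraction` are hypothetical toy values chosen for this example; they are not data or code from the study.

```python
from collections import Counter


def inclusion_fraction(program_gene_lists):
    """Return {gene: fraction of programs whose list contains the gene}."""
    n_programs = len(program_gene_lists)
    counts = Counter()
    for genes in program_gene_lists.values():
        # Use a set so a gene listed twice by one program counts once.
        counts.update(set(genes))
    return {gene: count / n_programs for gene, count in counts.items()}


# Hypothetical toy input: three made-up programs and placeholder gene labels.
toy_programs = {
    "program_a": ["GENE_A", "GENE_B", "GENE_D"],
    "program_b": ["GENE_A", "GENE_B"],
    "program_c": ["GENE_A", "GENE_C"],
}

fractions = inclusion_fraction(toy_programs)
# Rank genes from most to least widely included (ties broken by name).
ranking = sorted(fractions, key=lambda g: (-fractions[g], g))
```

In this toy input, `GENE_A` appears in every program and therefore ranks first; a real analysis would use the 27 program gene lists in place of `toy_programs`.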

Results

Among 27 NBSeq programs, the number of genes analyzed ranged from 134 to 4299, with only 74 (1.7%) genes included by over 80% of programs. The most significant associations with gene inclusion across programs were presence on the US Recommended Uniform Screening Panel (inclusion increase of 74.7%, CI: 71.0%-78.4%), robust evidence on the natural history (29.5%, CI: 24.6%-34.4%), and treatment efficacy (17.0%, CI: 12.3%-21.7%) of the associated genetic disease. A boosted trees machine learning model using 13 predictors achieved high accuracy in predicting gene inclusion across programs (area under the curve = 0.915, R² = 84%).
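
The reported share can be checked directly from the counts given here (74 of 4390 genes). The area under the curve can be read as the probability that a randomly chosen included gene receives a higher model score than a randomly chosen excluded one. The sketch below illustrates both under stated assumptions: the labels and scores are hypothetical toy values, and `auc_by_pairs` is an illustrative helper, not the authors' implementation.

```python
def auc_by_pairs(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Counts taken from the text: 74 of 4390 genes.
share_widely_included = 74 / 4390 * 100  # about 1.7 percent

# Hypothetical toy labels (1 = included) and model scores.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = auc_by_pairs(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of genes, which is acceptable for a few thousand genes; a rank-based formulation gives the same value more efficiently.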

Conclusion

The machine learning model developed here provides a ranked list of genes that can adapt to emerging evidence and regional needs, enabling more consistent and informed gene selection in NBSeq initiatives.

Source journal
Genetics in Medicine (Medicine: Genetics)
CiteScore: 15.20
Self-citation rate: 6.80%
Articles published: 857
Review time: 1.3 weeks
Journal description: Genetics in Medicine (GIM) is the official journal of the American College of Medical Genetics and Genomics. The journal's mission is to enhance the knowledge, understanding, and practice of medical genetics and genomics through publications in clinical and laboratory genetics and genomics, including ethical, legal, and social issues as well as public health. GIM encourages research that combats racism, includes diverse populations, and is written by authors from diverse and underrepresented backgrounds.