geneEX: An Integrated Phenotype-Driven Algorithm for Rapid Identification of Causative Variants in Monogenic Disorders.

Impact factor 1.6 | Zone 4 (Medicine) | JCR Q4 (GENETICS & HEREDITY)
Junyu Zhang, Dongyun Liu, Mei Chen, Yunqian Fang, Kun Dai, Xiaoxi Zhu, Qingqing Xu, Meiling Hou, Li Wang, Jianfeng Wang, Jun Zhang, Bo Liang, Xiaoming Teng
Molecular Genetics & Genomic Medicine, 13(9): e70139. Published 2025-09-01. DOI: 10.1002/mgg3.70139
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12451470/pdf/
Citations: 0

Abstract

Background: Identifying pathogenic variants is a crucial step in diagnosing monogenic disorders. The widespread adoption of next-generation sequencing (NGS) has substantially improved diagnostic efficiency. However, as clinical practice demands ever greater diagnostic accuracy for monogenic diseases, accurately and rapidly pinpointing the pathogenic variant among numerous candidate variants remains a major challenge, and the complexity of data analysis and interpretation continues to limit both the efficiency and the accuracy of diagnosis.

Methods: In this study, we developed geneEX, a phenotype-driven algorithm. geneEX uses a large language model (LLM) to extract phenotypes from clinical information and a semantic vector representation model to map them automatically to Human Phenotype Ontology (HPO) terms, from which HPO-associated genes are identified. It also supports semantic matching between a patient's free-text phenotypic descriptions and disease phenotypes, which further improves the identification of pathogenic genes. Finally, the algorithm ranks candidate causative variants, enabling rapid and precise identification of potential pathogenic variants in rare genetic disorders.
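The Methods describe a pipeline of phenotype extraction, HPO mapping and gene ranking. The minimal Python sketch below illustrates only the last two steps and assumes the LLM has already returned phenotype phrases. It is not the authors' implementation: character-trigram count vectors stand in for the semantic vector representation model, the three-term HPO subset is a toy vocabulary, and the gene names and gene-to-HPO annotations (`GENE_A`, `GENE_B`, `GENE_C`) are hypothetical. The 0.3 similarity threshold and the Jaccard-based gene score are likewise illustrative choices.

```python
import math
import re
from collections import Counter

# Toy HPO subset for illustration; a real pipeline would load the full ontology.
HPO_TERMS = {
    "HP:0001250": "seizure",
    "HP:0001263": "global developmental delay",
    "HP:0000252": "microcephaly",
}

# Hypothetical gene-to-HPO annotations (not real gene-phenotype data).
GENE_TO_HPO = {
    "GENE_A": {"HP:0001250", "HP:0001263"},
    "GENE_B": {"HP:0000252"},
    "GENE_C": {"HP:0001250", "HP:0001263", "HP:0000252"},
}


def embed(text):
    """Stand-in for a semantic vector model: counts of padded character trigrams."""
    vec = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        padded = f"#{word}#"
        vec.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[t] for t, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_to_hpo(phrase, threshold=0.3):
    """Return the most similar HPO ID for a phrase, or None if below threshold."""
    vec = embed(phrase)
    best_id, best_score = None, 0.0
    for hpo_id, label in HPO_TERMS.items():
        score = cosine(vec, embed(label))
        if score > best_score:
            best_id, best_score = hpo_id, score
    return best_id if best_score >= threshold else None


def rank_genes(patient_hpo):
    """Rank genes by Jaccard overlap between patient and gene HPO term sets."""
    scores = {}
    for gene, terms in GENE_TO_HPO.items():
        union = patient_hpo | terms
        scores[gene] = len(patient_hpo & terms) / len(union) if union else 0.0
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, the phrases "recurrent seizures" and "developmental delay" map to HP:0001250 and HP:0001263; passing that set to `rank_genes` places GENE_A first (Jaccard 1.0), GENE_C second (2/3) and GENE_B last (0.0). Trigram overlap only catches surface similarity, which is why a learned semantic embedding would be needed for synonyms in practice.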

Results: geneEX performed well in ranking pathogenic variants in both virtual and clinical datasets. Supplementary matching of free-text phenotypes significantly improved the precision of candidate-variant prioritization.
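The supplementary free-text matching can be pictured as a second signal blended into the variant ranking. The Python sketch below is an illustration under stated assumptions, not geneEX's actual scoring scheme: it combines a precomputed HPO-based score with the cosine similarity between the patient's free-text description and each disease's phenotype text, using a hypothetical `weight` parameter, and it uses character-trigram vectors in place of a semantic embedding model. The variant IDs, genes and disease descriptions are made up.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    variant: str             # hypothetical variant identifier
    gene: str                # hypothetical gene symbol
    hpo_score: float         # precomputed HPO-based score, assumed in [0, 1]
    disease_phenotypes: str  # free-text phenotype description of the gene's disease


def trigram_vector(text):
    """Padded character-trigram counts; a crude stand-in for a semantic embedding."""
    vec = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        padded = f"#{word}#"
        vec.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[t] for t, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(candidates, patient_text, weight=0.5):
    """Score = (1 - weight) * hpo_score + weight * free-text similarity; best first."""
    patient_vec = trigram_vector(patient_text)
    scored = []
    for cand in candidates:
        text_sim = cosine(patient_vec, trigram_vector(cand.disease_phenotypes))
        scored.append((cand, (1 - weight) * cand.hpo_score + weight * text_sim))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With two candidates that have equal HPO scores, the one whose disease description ("seizures, feeding difficulties") overlaps the patient's text ("frequent seizures and poor feeding") ranks first; setting `weight=0.0` reduces the ranking to the HPO score alone. A linear blend is only one plausible way to combine the two signals.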

Conclusion: Using its in-house phenotype extraction and standardization methods, geneEX automates HPO acquisition and thereby enables fully automated identification of pathogenic variants, from clinical sample to result. By additionally matching free-text phenotypic descriptions against disease phenotypes, it improves the accuracy of pathogenic gene identification. This approach substantially improves the precision and efficiency of identifying pathogenic variants in rare genetic disorders and provides robust support for the diagnosis of monogenic diseases.

Source journal
Molecular Genetics & Genomic Medicine
Subject area: Biochemistry, Genetics and Molecular Biology - Genetics
CiteScore: 4.20
Self-citation rate: 0.00%
Articles published: 241
Review time: 14 weeks
Journal description: Molecular Genetics & Genomic Medicine is a peer-reviewed journal for rapid dissemination of quality research related to the dynamically developing areas of human, molecular and medical genetics. The journal publishes original research articles covering findings in phenotypic, molecular, biological, and genomic aspects of genomic variation, inherited disorders and birth defects. The broad publishing spectrum of Molecular Genetics & Genomic Medicine includes rare and common disorders from diagnosis to treatment. Examples of appropriate articles include reports of novel disease genes, functional studies of genetic variants, in-depth genotype-phenotype studies, genomic analysis of inherited disorders, molecular diagnostic methods, medical bioinformatics, ethical, legal, and social implications (ELSI), and approaches to clinical diagnosis. Molecular Genetics & Genomic Medicine provides a scientific home for next-generation sequencing studies of rare and common disorders, which will make research in this fascinating area easily and rapidly accessible to the scientific community. This will serve as the basis for translating next-generation sequencing studies into individualized diagnostics and therapeutics for day-to-day medical care. Molecular Genetics & Genomic Medicine publishes original research articles, reviews, and research methods papers, along with invited editorials and commentaries. Original research papers must report well-conducted research with conclusions supported by the data presented.