Junyu Zhang, Dongyun Liu, Mei Chen, Yunqian Fang, Kun Dai, Xiaoxi Zhu, Qingqing Xu, Meiling Hou, Li Wang, Jianfeng Wang, Jun Zhang, Bo Liang, Xiaoming Teng
Molecular Genetics & Genomic Medicine, 13(9): e70139. Published 2025-09-01. DOI: 10.1002/mgg3.70139
geneEX: An Integrated Phenotype-Driven Algorithm for Rapid Identification of Causative Variants in Monogenic Disorders.
Background: Identifying pathogenic variants is a crucial step in diagnosing monogenic disorders. The widespread adoption of next-generation sequencing (NGS) has greatly improved diagnostic efficiency. However, clinical practice increasingly demands diagnostic accuracy, and pinpointing pathogenic variants quickly and accurately among many candidate variants remains a significant challenge. The complexity of data analysis and interpretation continues to limit both the efficiency and the accuracy of diagnosis.
Methods: In this study, we developed geneEX, a phenotype-driven algorithm. geneEX uses a large language model (LLM) to extract phenotypes from clinical information. It then uses a semantic vector representation model to map those phenotypes automatically to Human Phenotype Ontology (HPO) terms, from which HPO-associated genes are identified. It also supports semantic matching between a patient's free-text phenotypic descriptions and disease phenotypes, which further improves the identification of pathogenic genes. Finally, the algorithm ranks candidate causative variants, enabling rapid and precise identification of potential pathogenic variants in rare genetic disorders.
Results: geneEX performed well in ranking pathogenic variants in both virtual (simulated) and clinical datasets. Supplementary matching of free-text phenotype descriptions significantly improved the precision of candidate-variant prioritization.
Conclusion: Using its in-house phenotype extraction and standardization methods, geneEX acquires HPO terms automatically, enabling a fully automated workflow from clinical samples to pathogenic variants. By also matching free-text phenotypic descriptions against disease phenotypes, it improves the accuracy of pathogenic gene identification. Overall, the approach significantly increases the precision and efficiency of identifying pathogenic variants in rare genetic disorders and provides robust support for the diagnosis of monogenic diseases.
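The abstract does not say how geneEX combines HPO-based gene matching with free-text matching when ranking variants. The sketch below illustrates one possible scheme only: each candidate variant receives a weighted sum of two scores. The first is the fraction of the patient's HPO terms annotated to the variant's gene. The second is a text similarity between the patient's free-text description and the gene's disease phenotype description. The gene names, annotations, disease texts, `weight` value, and function names are all hypothetical. Token Jaccard similarity is a deliberately crude substitute for the semantic matching the paper describes.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    variant_id: str
    gene: str

# Hypothetical knowledge base: gene -> annotated HPO terms, gene -> disease phenotype text.
GENE_HPO = {"GENE_A": {"HP:0001250"}, "GENE_B": {"HP:0001250"}, "GENE_C": set()}
GENE_DISEASE_TEXT = {
    "GENE_A": "adult onset seizures with hearing loss",
    "GENE_B": "infantile seizures with muscle stiffness and jerking",
    "GENE_C": "",
}

def tokens(text: str) -> set:
    """Lowercased alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def hpo_score(patient_hpo: set, gene: str) -> float:
    """Fraction of the patient's HPO terms that are annotated to the gene."""
    if not patient_hpo:
        return 0.0
    return len(patient_hpo & GENE_HPO.get(gene, set())) / len(patient_hpo)

def text_score(free_text: str, gene: str) -> float:
    """Token Jaccard similarity: a crude stand-in for semantic text matching."""
    a, b = tokens(free_text), tokens(GENE_DISEASE_TEXT.get(gene, ""))
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_variants(variants, patient_hpo, free_text, weight=0.3):
    """Sort variants by (1 - weight) * HPO score + weight * free-text score, best first."""
    scored = [
        (v.variant_id,
         (1 - weight) * hpo_score(patient_hpo, v.gene) + weight * text_score(free_text, v.gene))
        for v in variants
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

variants = [Variant("var1", "GENE_A"), Variant("var2", "GENE_B"), Variant("var3", "GENE_C")]
note = "episodes of muscle stiffness and jerking since infancy"
print(rank_variants(variants, {"HP:0001250"}, note))
```

With `weight=0.0` (HPO matching alone), var1 and var2 tie at 1.0 because both genes carry the patient's single HPO term. The free-text score breaks the tie in favour of var2, whose disease description overlaps with the patient's note. This is the kind of supplementary signal the Results paragraph credits with improving prioritization.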
About the journal:
Molecular Genetics & Genomic Medicine is a peer-reviewed journal for rapid dissemination of quality research related to the dynamically developing areas of human, molecular and medical genetics. The journal publishes original research articles covering findings in phenotypic, molecular, biological, and genomic aspects of genomic variation, inherited disorders and birth defects. The broad publishing spectrum of Molecular Genetics & Genomic Medicine includes rare and common disorders from diagnosis to treatment. Examples of appropriate articles include reports of novel disease genes, functional studies of genetic variants, in-depth genotype-phenotype studies, genomic analysis of inherited disorders, molecular diagnostic methods, medical bioinformatics, ethical, legal, and social implications (ELSI), and approaches to clinical diagnosis. Molecular Genetics & Genomic Medicine provides a scientific home for next-generation sequencing studies of rare and common disorders, which will make research in this fascinating area easily and rapidly accessible to the scientific community. This will serve as the basis for translating next-generation sequencing studies into individualized diagnostics and therapeutics for day-to-day medical care.
Molecular Genetics & Genomic Medicine publishes original research articles, reviews, and research methods papers, along with invited editorials and commentaries. Original research papers must report well-conducted research with conclusions supported by the data presented.