Identifying Disease-Gene Associations by Topological and Biological Feature-based Data Augmentation and Graph Neural Networks
Authors: Yuan Zhang, Juan Wang, Jiajie Xing, Xiaomin Chen
Journal: IEEE Journal of Biomedical and Health Informatics (Impact Factor 6.7; JCR Q1, Computer Science, Information Systems)
Publication date: 2025-03-11
Publication type: Journal Article
DOI: 10.1109/JBHI.2025.3549509 (https://doi.org/10.1109/JBHI.2025.3549509)
Citations: 0
Abstract
Predicting gene-disease associations is essential for understanding disease pathogenesis and determining therapeutic targets. While prior methods have integrated diverse biological information to make predictions, they still encounter several challenges. First, incomplete and sparse gene-disease association data constrain model performance. Second, integrating heterogeneous data sources is not straightforward. To address these challenges, we propose a novel method, DAVGAE, which combines data augmentation, Variational Graph Auto-Encoders (VGAE), and attention mechanisms. DAVGAE integrates both the biological and topological features of genes and diseases to address challenges such as data sparsity and heterogeneity. By leveraging these features, it calculates cosine similarity scores for gene-disease pairs and applies a novel data augmentation strategy to enhance association data by selecting gene-disease associations with higher similarity scores. Using a four-layer Graph Neural Network (GNN) encoder, DAVGAE effectively learns robust and discriminative representations for genes and diseases within the association network. Finally, an inner product decoder predicts association scores for all gene-disease pairs. Comprehensive experiments on three gene-disease association datasets reveal that DAVGAE outperforms baseline models in predicting gene-disease associations. DAVGAE is freely available at https://github.com/imustu/DAVGAE.
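The augmentation and decoding steps described in the abstract can be illustrated with a minimal sketch. This is not the DAVGAE implementation: it assumes gene and disease feature vectors share one dimension, uses a hypothetical fixed `threshold` to stand in for "higher similarity scores", and assumes a sigmoid on the inner product, as is common in VGAE decoders. The function names and the toy data are illustrative only, and the GNN encoder and attention mechanism are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def augment_associations(gene_feats, disease_feats, known, threshold):
    """Return the known pairs plus unobserved pairs scoring >= threshold.

    gene_feats, disease_feats: dict mapping a name to a feature vector
        (assumed here to live in the same feature space).
    known: set of (gene, disease) pairs already in the association data.
    threshold: hypothetical cutoff; the abstract does not state the rule used.
    """
    augmented = set(known)
    for gene, g_vec in gene_feats.items():
        for disease, d_vec in disease_feats.items():
            if (gene, disease) not in known and cosine(g_vec, d_vec) >= threshold:
                augmented.add((gene, disease))
    return augmented


def inner_product_score(z_gene, z_disease):
    """Inner product decoder: sigmoid of the dot product of two embeddings."""
    dot = sum(a * b for a, b in zip(z_gene, z_disease))
    return 1.0 / (1.0 + math.exp(-dot))


# Toy data (made up for illustration, not taken from the paper's datasets).
gene_feats = {"G1": [1.0, 0.0], "G2": [0.0, 1.0]}
disease_feats = {"D1": [1.0, 0.1], "D2": [0.1, 1.0]}
known = {("G1", "D1")}

augmented = augment_associations(gene_feats, disease_feats, known, threshold=0.9)
print(sorted(augmented))
```

In this toy setting the only unobserved pair above the cutoff is ("G2", "D2"), so it is added alongside the known pair. In a full pipeline the augmented edge set would be passed to the encoder, and `inner_product_score` would be applied to the learned embeddings rather than to raw features.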
About the journal:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.