CSGDN: contrastive signed graph diffusion network for predicting crop gene-phenotype associations.

IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in Bioinformatics, Pub Date: 2024-11-22, DOI: 10.1093/bib/bbaf062

Yiru Pan, Xingyu Ji, Jiaqi You, Lu Li, Zhenping Liu, Xianlong Zhang, Zeyu Zhang, Maojun Wang

Citations: 0

Abstract

CSGDN: contrastive signed graph diffusion network for predicting crop gene-phenotype associations.

Predicting positive and negative associations between genes and phenotypes helps to elucidate the mechanisms underlying complex traits in organisms. The transcription and regulatory activity of specific genes is adjusted across cell types, developmental timepoints, and physiological states. Two problems arise in obtaining positive/negative gene-phenotype associations: (1) high-throughput DNA/RNA sequencing and phenotyping are expensive and time-consuming because large sample sizes must be processed; (2) experiments introduce both random and systematic errors, while calculations or predictions made with software or models may add noise. To address these two issues, we propose a Contrastive Signed Graph Diffusion Network, CSGDN, which learns robust node representations from fewer training samples to achieve higher link prediction accuracy. CSGDN uses a signed graph diffusion method to uncover the underlying regulatory associations between genes and phenotypes. Then, stochastic perturbation strategies are used to create two views for both the original and the diffusion graphs. Lastly, a multiview contrastive learning loss is designed to unify the node representations learned from the two views, to resist interference and reduce noise. We perform experiments to validate the performance of CSGDN on three crop datasets: Gossypium hirsutum, Brassica napus, and Triticum turgidum. The results show that the proposed model outperforms state-of-the-art methods by up to 9.28% AUC for link sign prediction on the G. hirsutum dataset. The source code of our model is available at https://github.com/Erican-Ji/CSGDN.
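The three steps named in the abstract (signed diffusion, stochastic perturbation into views, and a contrastive loss across views) can be illustrated with a small NumPy sketch. This is not the authors' implementation, which is in the repository linked above. The abstract gives no formulas, so everything here is an assumption chosen for illustration: the truncated PageRank-style diffusion, the edge-dropping perturbation, the InfoNCE-style loss, the untrained linear "encoder", and all function names, parameters, and the toy graph.

```python
import numpy as np


def perturb_signed_edges(adj, drop_prob, rng):
    """Return a view of a symmetric signed graph with edges dropped at random.

    Edge dropping is one common stochastic perturbation (an assumption here).
    """
    upper = np.triu(adj, k=1)                     # each undirected edge once
    keep = rng.random(upper.shape) >= drop_prob   # True = edge survives
    upper = upper * keep
    return upper + upper.T


def signed_diffusion(adj, alpha=0.2, steps=2):
    """Truncated PageRank-style diffusion on a signed adjacency matrix.

    Rows are normalised by absolute degree, so signs multiply along paths:
    two negative hops yield a positive indirect association.
    """
    deg = np.abs(adj).sum(axis=1)
    deg[deg == 0] = 1.0                           # avoid division by zero
    trans = adj / deg[:, None]
    out = np.zeros(adj.shape, dtype=float)
    power = np.eye(adj.shape[0])
    for k in range(steps + 1):
        out += alpha * (1 - alpha) ** k * power
        power = power @ trans
    return out


def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style loss: node i in view 1 should match node i in view 2."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Hypothetical toy graph: nodes 0-2 are genes, nodes 3-4 are phenotypes.
rng = np.random.default_rng(0)
adj = np.zeros((5, 5))
for gene, pheno, sign in [(0, 3, 1), (1, 3, -1), (1, 4, 1), (2, 4, -1)]:
    adj[gene, pheno] = adj[pheno, gene] = sign

features = rng.normal(size=(5, 8))                # random stand-in node features
view_a = perturb_signed_edges(adj, 0.2, rng)      # view of the original graph
view_b = perturb_signed_edges(adj, 0.2, rng)      # view that is then diffused

# Untrained one-step propagation stands in for a learned graph encoder.
z_a = (view_a + np.eye(5)) @ features
z_b = signed_diffusion(view_b) @ features
loss = info_nce(z_a, z_b)
print(f"contrastive loss between the two views: {loss:.4f}")
```

In a trained model the two embeddings would come from a learnable signed graph encoder, and this loss would be minimised jointly with a link sign prediction objective. The sketch only shows how the pieces connect.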

Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.