Learning inherent genetic patterns and trait associations with deep generative models for discrete genotype simulation.

IF 11.8 Region 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES

GigaScience Pub Date : 2026-04-14 DOI:10.1093/gigascience/giag044

Sihan Xie, Thierry Tribout, Didier Boichard, Blaise Hanczar, Julien Chiquet, Eric Barrey


Citations: 0

Abstract


Learning inherent genetic patterns and trait associations with deep generative models for discrete genotype simulation.

Background: Deep generative models open new avenues for simulating realistic genomic data while preserving privacy and addressing data accessibility constraints. While previous studies have primarily focused on generating gene expression or haplotype data, this study explores generating genotype data in both unconditioned and phenotype-conditioned settings, which is inherently more challenging due to the discrete nature of genotype data.

Results: We developed and evaluated commonly used generative models, including Variational Autoencoders (VAEs), Diffusion Models, and Generative Adversarial Networks (GANs), and proposed adaptations tailored to discrete genotype data. We conducted extensive experiments on large-scale datasets, including all chromosomes from cow and multiple chromosomes from human. Model performance was assessed using a well-established set of metrics drawn from both deep learning and quantitative genetics literature. Our results show that these models can effectively capture genetic patterns and preserve genotype-phenotype association.
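To make the discreteness issue concrete, here is a minimal sketch of one way synthetic genotypes might be compared with real ones. It is not the authors' method: the 0/1/2 coding, the rounding step, the allele-frequency correlation metric, and every function name below are assumptions chosen for illustration, and the "generator" is only a noisy stand-in, not a trained model.

```python
import numpy as np


def allele_frequencies(genotypes: np.ndarray) -> np.ndarray:
    """Per-SNP alternate-allele frequency for a (samples x SNPs) matrix coded 0/1/2."""
    return genotypes.mean(axis=0) / 2.0


def to_discrete_genotypes(continuous: np.ndarray) -> np.ndarray:
    """Map continuous outputs to the nearest valid genotype in {0, 1, 2} (assumed scheme)."""
    return np.clip(np.rint(continuous), 0, 2).astype(np.int8)


def frequency_agreement(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Pearson correlation between real and synthetic per-SNP allele frequencies."""
    return float(
        np.corrcoef(allele_frequencies(real), allele_frequencies(synthetic))[0, 1]
    )


# Toy data: 500 individuals, 200 SNPs, genotypes drawn from per-SNP frequencies.
rng = np.random.default_rng(0)
true_freq = rng.uniform(0.05, 0.95, size=200)
real = rng.binomial(2, true_freq, size=(500, 200))

# Stand-in for a generative model: fresh draws plus continuous noise,
# which must be discretized before it is a valid genotype matrix.
noisy = rng.binomial(2, true_freq, size=(500, 200)) + rng.normal(0.0, 0.2, size=(500, 200))
synthetic = to_discrete_genotypes(noisy)

score = frequency_agreement(real, synthetic)
print(f"allele-frequency correlation: {score:.3f}")
```

A score near 1 only says that marginal allele frequencies match; it says nothing about linkage structure or genotype-phenotype association, which would need their own checks.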

Conclusions: As deep generative models are able to reproduce key characteristics of genotype data, they can serve as direct tools for genotype-phenotype simulation, while also enabling privacy-preserving data sharing. Our findings provide a comprehensive evaluation of these models and offer practical guidance for future research in genotype simulation.


Source journal

GigaScience (MULTIDISCIPLINARY SCIENCES)

CiteScore: 15.50

Self-citation rate: 1.10%

Articles published: 119

Review time: 1 week

Journal description: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.