bioRxiv - Genomics: Latest Papers

Iso-Seq enables discovery of novel isoform variants in human retina at single cell resolution
bioRxiv - Genomics Pub Date : 2024-08-09 DOI: 10.1101/2024.08.08.607267
Luozixian Wang, Daniel Urrutia-Cabrera, Sandy Shen-Chi Hung, Alex W Hewitt, Samuel W Lukowski, Careen Foord, Peng-Yuan Wang, Hagen Tilgner, Raymond Wong
Abstract: Recent single cell transcriptomic profiling of the human retina provided important insights into the genetic signals in heterogeneous retinal cell populations that enable vision. However, conventional single cell RNA-seq with 3' short-read sequencing is not suitable for identifying isoform variants. Here we used Iso-Seq with full-length sequencing to profile the human retina at single cell resolution for isoform discovery. We generated a retina transcriptome dataset consisting of 25,302 nuclei from three donor retinas, and detected 49,710 known transcripts and 241,949 novel transcripts across major retinal cell types. We surveyed the use of alternative promoters to drive transcript variant expression, and showed that 1-8% of genes used multiple promoters across major retinal cell types. Our results also enabled gene expression profiling of novel transcript variants for inherited retinal disease (IRD) genes, and identified differential usage of exon splicing in major retinal cell types. Altogether, we generated a human retina transcriptome dataset at single cell resolution with full-length sequencing. Our study highlighted the potential of Iso-Seq to map the isoform diversity in the human retina, providing an expanded view of the complex transcriptomic landscape in the retina.
Citations: 0
A Draft Pacific Ancestry Pangenome Reference
bioRxiv - Genomics Pub Date : 2024-08-09 DOI: 10.1101/2024.08.07.606392
Connor C Littlefield, Jose M Lazaro-Guevara, Devorah Stucki, Michael Lansford, Melissa H Pezzolesi, Emma J Taylor, Etoni Ma'asi C Wolfgramm, Jacob Taloa, Kime Lao, C Dave Dumaguit, Perry G Ridge, Justina P Tavana, William L Holland, Kalani L Raphael, Marcus G. Pezzolesi
Abstract: Individuals of Pacific ancestry suffer some of the highest rates of health disparities yet remain vastly underrepresented in genomic research, including currently available linear and pangenome references. To begin addressing this, we developed the first Pacific ancestry pangenome reference using 23 individuals with diverse Pacific ancestry. We assembled 46 haploid genomes from these 23 individuals, resulting in highly accurate and contiguous genome assemblies with an average quality value of 55.0 and an average N50 of 40.7 Mb, marking the first de novo assembly of highly accurate Pacific ancestry genomes. We combined these assemblies to create a pangenome reference, which added 30.6 Mb of novel sequence missing from the Human Pangenome Reference Consortium (HPRC) reference. Mapping short reads to this pangenome reduced variant call errors and yielded more true-positive variants compared to the HPRC and T2T-CHM13 references. This Pacific ancestry pangenome reference serves as a resource to enhance genetic analyses for this underserved population.
Citations: 0
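As a reading aid for the assembly metrics quoted in the pangenome entry above, the sketch below shows how an N50 and a Phred-scaled quality value are conventionally computed. The function names and the toy contig lengths are illustrative assumptions; the preprint's own pipeline is not described in this listing, and the quality value is assumed to be Phred-scaled.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


# Toy contig lengths (hypothetical, not data from the preprint).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
print(qv_to_error_rate(55.0))
```

Under the Phred convention, a quality value of 55.0 corresponds to roughly one expected error per 300,000 bases.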
The repertoire of short tandem repeats across the tree of life
bioRxiv - Genomics Pub Date : 2024-08-09 DOI: 10.1101/2024.08.08.607201
Nikol Chantzi, Ilias Georgakopoulos-Soares
Abstract: Short tandem repeats (STRs) are widespread, dynamic repetitive elements with a number of biological functions and relevance to human diseases. However, their prevalence across taxa remains poorly characterized. Here we examined the impact of STRs in the genomes of 117,253 organisms spanning the tree of life. We find that there are large differences in the frequencies of STRs between organismal genomes and these differences are largely driven by the taxonomic group an organism belongs to. Using simulated genomes, we find that on average there is no enrichment of STRs in bacterial and archaeal genomes, suggesting that these genomes are not particularly repetitive. In contrast, we find that eukaryotic genomes are orders of magnitude more repetitive than expected. STRs are preferentially located at functional loci in specific taxa. Finally, we utilize the recently completed Telomere-to-Telomere genomes of human and other great apes, and find that STRs are highly abundant and variable between primate species, particularly in peri/centromeric regions. We conclude that STRs have expanded in eukaryotic and viral lineages and not in archaea or bacteria, resulting in large discrepancies in genomic composition.
Citations: 0
Examining Sex-Specific DNA Methylation and Variability Post In Vitro Fertilization
bioRxiv - Genomics Pub Date : 2024-08-09 DOI: 10.1101/2024.08.08.604307
Melanie Lemaire, Keaton Warrick Smith, Samantha L Wilson
Abstract: Infertility impacts up to 17.5% of reproductive-aged couples worldwide. To aid in conception, many couples turn to assisted reproductive technology, such as in vitro fertilization (IVF). IVF can introduce both physical and environmental stressors that may alter DNA methylation regulation, an important and dynamic process during early fetal development. This meta-analysis aims to assess the differences in the placental DNA methylome between spontaneous and IVF pregnancies. We identified three studies from NCBI GEO that measured DNA methylation with an Illumina Infinium Microarray in post-delivery placental tissue from both IVF and spontaneous pregnancies with a total of 575 samples for analysis (n = 96 IVF, n = 479 spontaneous). While there were no significant or differentially methylated CpGs in mixed or female stratified populations, we identified 9 CpGs that reached statistical significance (FDR < 0.05) between IVF (n = 56) and spontaneous (n = 238) placentae. 7 autosomal CpGs and 1 X chromosome CpG were hypermethylated and 2 autosomal CpGs were hypomethylated in the IVF placentae compared to spontaneous. Autosomal CpGs closest to LIPJ, EEF1A2, and FBRSL1 also met our criteria to be classified as biologically differentially methylated CpGs (FDR < 0.05; |δβ| > 0.05). When analyzing variability differences in δβ values between IVF females, IVF males, spontaneous females and spontaneous males, we found a significant shift to greater variability in both IVF males and females compared to spontaneous (p < 2.2e-16, p < 2.2e-16). Trends of variability were further analyzed in the biologically differentially methylated autosomal CpGs near LIPJ, EEF1A2, and FBRSL1, and while these regions were statistically significant in males, the female δβ and δCoVs followed a similar trend that differed in magnitude. In males and females there was a statistically significant difference in proportions of endothelial cells, Hofbauer cells, stromal cells and syncytiotrophoblasts between spontaneous and IVF populations. We also observed significant differences between sexes within reproduction type in syncytiotrophoblasts and trophoblasts. The results of this study are critical to further understand the impact of IVF on tissue epigenetics, which may help to investigate the connections between IVF and negative pregnancy outcomes. Additionally, our study supports that sex-specific differences in placental DNA methylation and cell composition should be considered as factors for future placental DNA methylation analyses.
Citations: 0
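The two thresholds quoted in the IVF methylation entry above (FDR below 0.05 and an absolute methylation difference above 0.05) can be sketched as a simple filter. This is a minimal illustration that assumes Benjamini-Hochberg adjustment, which the abstract does not specify; the function names, CpG identifiers, and values are all hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.
    (Assumed FDR method; the abstract only says 'FDR'.)"""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def biologically_differential(cpgs, fdr_cutoff=0.05, effect_cutoff=0.05):
    """cpgs: list of (cpg_id, p_value, delta_beta) tuples.
    Return ids with adjusted p < fdr_cutoff and |delta_beta| > effect_cutoff."""
    adjusted = bh_adjust([p for _, p, _ in cpgs])
    return [
        cpg_id
        for (cpg_id, _, delta), q in zip(cpgs, adjusted)
        if q < fdr_cutoff and abs(delta) > effect_cutoff
    ]


# Hypothetical CpGs: (id, raw p-value, delta beta between groups).
toy = [
    ("cpg_a", 0.0001, 0.08),
    ("cpg_b", 0.0004, 0.02),
    ("cpg_c", 0.03, 0.10),
    ("cpg_d", 0.5, -0.07),
]
print(biologically_differential(toy))
```

In this toy input, `cpg_b` passes the FDR cutoff but not the effect-size cutoff, which is the distinction the abstract draws between statistically significant and biologically differentially methylated CpGs.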
Epigenetic clock and lifespan prediction in the short-lived killifish Nothobranchius furzeri
bioRxiv - Genomics Pub Date : 2024-08-09 DOI: 10.1101/2024.08.07.606986
Chiara Giannuzzi, Mario Baumgart, Francesco Neri, Alessandro Cellerino
Abstract: Aging, characterized by a gradual decline in organismal fitness, is the primary risk factor for numerous diseases including cancer, cardiovascular, and neurodegenerative disorders. The inter-individual variability in aging and disease susceptibility has led to the concept of biological age, an indirect measure of an individual's relative fitness. Epigenetic changes, particularly DNA methylation, have emerged as reliable biomarkers for estimating biological age, leading to the development of predictive models known as epigenetic clocks. Initially created for humans, these clocks have been extended to various mammalian species. Here we set out to expand these tools to the short-lived killifish, Nothobranchius furzeri. This species, with its remarkably short lifespan and expression of canonical aging hallmarks, offers a unique model for experimental aging studies.

We developed an epigenetic clock for N. furzeri using reduced-representation bisulfite sequencing (RRBS) to analyze DNA methylation in brain and caudal fin tissues across different ages. Our study involved generating comprehensive DNA methylation datasets and employing machine learning to create predictive models based on individual CpG sites and co-methylation modules. These models demonstrated high accuracy in estimating chronological age, with a median absolute error of 3 weeks (7.5% of median lifespan) for a clock based on methylation of individual CpGs and 1.5 weeks (3.7% of median lifespan) for an eigenvector-based clock. Our investigation extended to assessing epigenetic age acceleration in different strains and the potential resetting effect of regeneration on fin tissue. Notably, our models indicated that a shorter-lived strain has accelerated epigenetic aging and that regeneration does not reset, but may decelerate, epigenetic aging. Additionally, we used longitudinal data to develop an "epigenetic timer" for direct prediction of individual lifespan based on fin biopsies and an eigenvector-based method, achieving a median absolute error of 4.5 weeks in the prediction of actual age of death. This surprising result underscores the existence of intrinsic determinants of lifespan established early in life.

This study presents the first epigenetic clocks and lifespan predictors for N. furzeri, highlighting their potential as aging biomarkers, and sets the stage for future research on life-extending interventions in this model organism.
Citations: 0
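The accuracy figures in the killifish entry above are median absolute errors, given in weeks and as a share of median lifespan. The sketch below shows that calculation. The function names and toy ages are illustrative assumptions, and the roughly 40-week median lifespan is only what the quoted "3 weeks (7.5%)" implies, not a value stated in the abstract.

```python
from statistics import median


def median_absolute_error(actual, predicted):
    """Median of |actual - predicted| over paired observations."""
    return median(abs(a - p) for a, p in zip(actual, predicted))


def as_percent_of_lifespan(error_weeks, median_lifespan_weeks):
    """Express an error in weeks as a percentage of median lifespan."""
    return 100 * error_weeks / median_lifespan_weeks


# Median lifespan implied by "3 weeks (7.5% of median lifespan)": about 40 weeks.
implied_median_lifespan = 3 / 0.075
# The eigenvector-based clock's 1.5-week error on the same scale: about 3.75%.
print(as_percent_of_lifespan(1.5, implied_median_lifespan))

# Toy ages in weeks (hypothetical, not data from the preprint).
toy_actual = [5, 10, 20, 30]
toy_predicted = [7, 9, 24, 27]
print(median_absolute_error(toy_actual, toy_predicted))
```

The two percentages in the abstract are mutually consistent under this reading: 1.5 weeks over an implied 40-week median lifespan is about 3.75%, which the abstract reports as 3.7%.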
Transposable elements impact the regulatory landscape through cell type specific epigenomic associations
bioRxiv - Genomics Pub Date : 2024-08-07 DOI: 10.1101/2024.08.07.606967
Jeffrey Hyacinthe, Guillaume Bourque
Abstract: Transposable elements (TEs) are DNA sequences able to create copies of themselves within the genome. Despite their limited expression due to silencing, TEs still manage to impact the host genome. For instance, some TEs have been shown to act as cis-regulatory elements and be co-opted in the human genome. This highlights that the contributions of TEs to the host might come from their relationship with the epigenome rather than their expression. However, a systematic analysis that relates TEs in the human genome directly with chromatin histone marks across distinct cell types remains lacking. Here we leverage a new dataset from the International Human Epigenome Consortium with 4867 uniformly processed ChIP-seq experiments for 6 histone marks across 175 annotated cell labels and show that TEs have drastically different enrichment levels across marks. Overall, we find that TEs are generally depleted in the H3K9me3 histone modification, except for L1s, while MIRs were highly enriched in H3K4me1, H3K27ac and H3K27me3 and Alus were enriched in H3K36me3. Furthermore, we present a generalised profile of the relationship between TE enrichment and TE age which reveals a few TE families (Alu, MIR, L2) as diverging from expected dynamics. We also find some significant differences in TE enrichment between cell types and that in 20% of the cases, these enrichments were cell-type specific. We report that at least 4% of cell types with healthy and cancer samples featured significant differences. Notably, we identify 456 TE-cell type-histone triplet candidates with the strongest cell-type specific enrichments. We show that many of these candidates are associated with relevant biological processes and genes expressed in the relevant cell type. These results further support a role for TEs in genome regulation and highlight novel associations between TEs and histone marks across cell types.
Citations: 0
Genome-Wide Association Study and transcriptome analysis reveals a complex gene network that regulates opsin gene expression and cell fate determination in Drosophila R7 photoreceptor cells
bioRxiv - Genomics Pub Date : 2024-08-07 DOI: 10.1101/2024.08.05.606616
John C. Aldrich, Lauren A. Vanderlinden, Thomas L. Jacobsen, Cheyret Wood, Laura M. Saba, Steven G. Britt
Abstract: Background: An animal's ability to discriminate between differing wavelengths of light (i.e., color vision) is mediated, in part, by a subset of photoreceptor cells that express opsins with distinct absorption spectra. In Drosophila R7 photoreceptors, expression of the rhodopsin molecules, Rh3 or Rh4, is determined by a stochastic process mediated by the transcription factor spineless. The goal of this study was to identify additional factors that regulate R7 cell fate and opsin choice using a Genome Wide Association Study (GWAS) paired with transcriptome analysis via RNA-Seq.
Citations: 0
Extensively acquired antimicrobial resistant bacteria restructure the individual microbial community in post-antibiotic conditions
bioRxiv - Genomics Pub Date : 2024-08-07 DOI: 10.1101/2024.08.07.606955
Jae Woo Baek, Songwon Lim, Nayeon Park, Byeongsop Song, Nikhil Kirtipal, Jens Nielsen, Adil Mardinoglu, Saeed Shoaie, Jae-il Kim, Jang Won Son, Ara Koh, Sunjae Lee
Abstract: In recent years, the overuse of antibiotics has led to the emergence of antimicrobial resistant (AMR) bacteria. To evaluate the spread of AMR bacteria, the reservoir of AMR genes (resistome) has traditionally been identified from environmental samples, hospital environments, and human populations; however, the functional role of AMR bacteria in the human gut microbiome and their persistence within individuals have not been fully investigated. Here, we performed a strain-resolved in-depth analysis of the resistome changes by reconstructing a large number of metagenome-assembled genomes (MAGs) from the gut microbiomes of antibiotic-treated individuals. Interestingly, we identified two bacterial populations with different resistome profiles, extensively acquired antimicrobial resistant bacteria (EARB) and sporadically acquired antimicrobial resistant bacteria (SARB), and found that EARB showed broader drug resistance and a significant functional role in shaping individual microbiome composition after antibiotic treatment. Furthermore, longitudinal strain analysis revealed that EARB were inherently carried by individuals and can reemerge through strain switching in the human gut microbiome. Our data on the presence of AMR bacteria in the human gut microbiome provide a new avenue for controlling the spread of AMR bacteria in the human community.
Citations: 0
Scalable imaging-free spatial genomics through computational reconstruction
bioRxiv - Genomics Pub Date : 2024-08-07 DOI: 10.1101/2024.08.05.606465
Chenlei Hu, Mehdi Borji, Giovanni J. Marrero, Vipin Kumar, Jackson A. Weir, Sachin V. Kammula, Evan Z. Macosko, Fei Chen
Abstract: Tissue organization arises from the coordinated molecular programs of cells. Spatial genomics maps cells and their molecular programs within the spatial context of tissues. However, current methods measure spatial information through imaging or direct registration, which often require specialized equipment and are limited in scale. Here, we developed an imaging-free spatial transcriptomics method that uses molecular diffusion patterns to computationally reconstruct spatial data. To do so, we utilize a simple experimental protocol on two-dimensional barcode arrays to establish an interaction network between barcodes via molecular diffusion. Sequencing these interactions generates a high-dimensional matrix of interactions between different spatial barcodes. Then, we perform dimensionality reduction to regenerate a two-dimensional manifold, which represents the spatial locations of the barcode arrays. Surprisingly, we found that the UMAP algorithm, with minimal modifications, can faithfully reconstruct the arrays. We demonstrated that this method is compatible with the capture-array-based spatial transcriptomics/genomics methods Slide-seq and Slide-tags, with high fidelity. We systematically explore the fidelity of the reconstruction through comparisons with experimentally derived ground truth data, and demonstrate that reconstruction generates high-quality spatial genomics data. We also scaled this technique to reconstruct high-resolution spatial information over areas up to 1.2 centimeters. This computational reconstruction method effectively converts spatial genomics measurements to molecular biology, enabling spatial transcriptomics with high accessibility and scalability.
Citations: 0
Latent generative modeling of long genetic sequences with GANs
bioRxiv - Genomics Pub Date : 2024-08-07 DOI: 10.1101/2024.08.07.607012
Antoine Szatkownik, Cyril Furtlehner, Guillaume Charpiat, Burak Yelmen, Flora Jay
Abstract: Synthetic data generation via generative modeling has recently become a prominent research field in genomics, with applications ranging from functional sequence design to high-quality, privacy-preserving artificial in silico genomes. Following a body of work on Artificial Genomes (AGs) created via various generative models trained with raw genomic input, we propose a conceptually different approach to address the issues of scalability and complexity of genomic data generation in very high dimensions. Our method combines dimensionality reduction, achieved by Principal Component Analysis (PCA), and a Generative Adversarial Network (GAN) learning in this reduced space. Using this framework, we generated genomic proxy datasets for very diverse human populations around the world. We compared the quality of AGs generated by our approach with AGs generated by the established models and report improvements in capturing population structure, linkage disequilibrium, and metrics related to privacy leakage. Furthermore, we developed a frugal model with orders of magnitude fewer parameters and comparable performance to larger models. For quality assessment, we also implemented a new evaluation metric based on information theory to measure local haplotypic diversity, showing that generative models yield higher diversity than real genomes. In addition, we addressed the shrinkage issue associated with PCA and generative modeling, examined its relation to the nearest neighbor resemblance metric, and proposed a resolution. Finally, we evaluated the effect of different binarization methods on the quality of the output AGs.
Citations: 0