arXiv - QuanBio - Genomics: latest papers, page 3

Insights, opportunities and challenges provided by large cell atlases

arXiv - QuanBio - Genomics Pub Date : 2024-08-13 DOI: arxiv-2408.06563

Martin Hemberg, Federico Marini, Shila Ghazanfar, Ahmad Al Ajami, Najla Abassi, Benedict Anchang, Bérénice A. Benayoun, Yue Cao, Ken Chen, Yesid Cuesta-Astroz, Zach DeBruine, Calliope A. Dendrou, Iwijn De Vlaminck, Katharina Imkeller, Ilya Korsunsky, Alex R. Lederer, Pieter Meysman, Clint Miller, Kerry Mullan, Uwe Ohler, Nikolaos Patikas, Jonas Schuck, Jacqueline HY Siu, Timothy J. Triche Jr., Alex Tsankov, Sander W. van der Laan, Masanao Yajima, Jean Yang, Fabio Zanini, Ivana Jelic

Abstract: The field of single-cell biology is growing rapidly and is generating large amounts of data from a variety of species, disease conditions, tissues, and organs. Coordinated efforts such as CZI CELLxGENE, HuBMAP, Broad Institute Single Cell Portal, and DISCO, allow researchers to access large volumes of curated datasets. Although the majority of the data is from scRNAseq experiments, a wide range of other modalities are represented as well. These resources have created an opportunity to build and expand the computational biology ecosystem to develop tools necessary for data reuse, and for extracting novel biological insights. Here, we highlight achievements made so far, areas where further development is needed, and specific challenges that need to be overcome.

Citations: 0

Pretrained-Guided Conditional Diffusion Models for Microbiome Data Analysis

arXiv - QuanBio - Genomics Pub Date : 2024-08-10 DOI: arxiv-2408.07709

Xinyuan Shi, Fangfang Zhu, Wenwen Min

Citations: 0

scASDC: Attention Enhanced Structural Deep Clustering for Single-cell RNA-seq Data

arXiv - QuanBio - Genomics Pub Date : 2024-08-09 DOI: arxiv-2408.05258

Wenwen Min, Zhen Wang, Fangfang Zhu, Taosheng Xu, Shunfang Wang

Abstract: Single-cell RNA sequencing (scRNA-seq) data analysis is pivotal for understanding cellular heterogeneity. However, the high sparsity and complex noise patterns inherent in scRNA-seq data present significant challenges for traditional clustering methods. To address these issues, we propose a deep clustering method, Attention-Enhanced Structural Deep Embedding Graph Clustering (scASDC), which integrates multiple advanced modules to improve clustering accuracy and robustness. Our approach employs a multi-layer graph convolutional network (GCN) to capture high-order structural relationships between cells, termed the graph autoencoder module. To mitigate the oversmoothing issue in GCNs, we introduce a ZINB-based autoencoder module that extracts content information from the data and learns latent representations of gene expression. These modules are further integrated through an attention fusion mechanism, ensuring effective combination of gene expression and structural information at each layer of the GCN. Additionally, a self-supervised learning module is incorporated to enhance the robustness of the learned embeddings. Extensive experiments demonstrate that scASDC outperforms existing state-of-the-art methods, providing a robust and effective solution for single-cell clustering tasks. Our method paves the way for more accurate and meaningful analysis of single-cell RNA sequencing data, contributing to better understanding of cellular heterogeneity and biological processes. All code and public datasets used in this paper are available at https://github.com/wenwenmin/scASDC and https://zenodo.org/records/12814320.

Citations: 0
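
The scASDC abstract above mentions a ZINB-based autoencoder. As a rough illustration of what a zero-inflated negative binomial (ZINB) distribution is, and not the authors' implementation, the sketch below evaluates its probability mass function using only the Python standard library. The parameter names `mu` (mean), `theta` (dispersion), and `pi` (extra zero probability), as well as the example values, are assumptions made for this example.

```python
import math


def nb_log_pmf(k, mu, theta):
    """Log-probability of count k under a negative binomial
    parameterised by mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )


def zinb_pmf(k, mu, theta, pi):
    """Probability of count k under a zero-inflated negative binomial.

    With probability pi the count is a structural zero (dropout);
    otherwise it is drawn from the negative binomial above.
    """
    nb = math.exp(nb_log_pmf(k, mu, theta))
    if k == 0:
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


# Hypothetical parameter values, chosen only for illustration.
MU, THETA, PI = 2.0, 1.5, 0.3

# Sanity checks: the probabilities sum to about 1 and the mean is (1 - pi) * mu.
total = sum(zinb_pmf(k, MU, THETA, PI) for k in range(500))
mean = sum(k * zinb_pmf(k, MU, THETA, PI) for k in range(500))
print(f"sum of pmf = {total:.6f}, mean = {mean:.6f}")
```

In a ZINB autoencoder of the kind the abstract describes, a decoder would typically output these three parameters per gene and cell, and the negative log of this probability would serve as the reconstruction loss; that usage is a general pattern and is not confirmed by the abstract itself.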

Masked Graph Autoencoders with Contrastive Augmentation for Spatially Resolved Transcriptomics Data

arXiv - QuanBio - Genomics Pub Date : 2024-08-09 DOI: arxiv-2408.06377

Donghai Fang, Fangfang Zhu, Dongting Xie, Wenwen Min

Citations: 0

Heterogeneous graph attention network improves cancer multiomics integration

arXiv - QuanBio - Genomics Pub Date : 2024-08-05 DOI: arxiv-2408.02845

Sina Tabakhi, Charlotte Vandermeulen, Ian Sudbery, Haiping Lu

Abstract: The increase in high-dimensional multiomics data demands advanced integration models to capture the complexity of human diseases. Graph-based deep learning integration models, despite their promise, struggle with small patient cohorts and high-dimensional features, often applying independent feature selection without modeling relationships among omics. Furthermore, conventional graph-based omics models focus on homogeneous graphs, lacking multiple types of nodes and edges to capture diverse structures. We introduce a Heterogeneous Graph ATtention network for omics integration (HeteroGATomics) to improve cancer diagnosis. HeteroGATomics performs joint feature selection through a multi-agent system, creating dedicated networks of feature and patient similarity for each omic modality. These networks are then combined into one heterogeneous graph for learning holistic omic-specific representations and integrating predictions across modalities. Experiments on three cancer multiomics datasets demonstrate HeteroGATomics' superior performance in cancer diagnosis. Moreover, HeteroGATomics enhances interpretability by identifying important biomarkers contributing to the diagnosis outcomes.

Citations: 0

Refinement of genetic variants needs attention

arXiv - QuanBio - Genomics Pub Date : 2024-08-01 DOI: arxiv-2408.00659

Omar Abdelwahab, Davoud Torkamaneh

Abstract: Variant calling refinement is crucial for distinguishing true genetic variants from technical artifacts in high-throughput sequencing data. Manual review is time-consuming while heuristic filtering often lacks optimal solutions. Traditional variant calling methods often struggle with accuracy, especially in regions of low read coverage, leading to false-positive or false-negative calls. Here, we introduce VariantTransformer, a Transformer-based deep learning model, designed to automate variant calling refinement directly from VCF files in low-coverage data (10-15X). VariantTransformer, trained on two million variants, including SNPs and short InDels, from low-coverage sequencing data, achieved an accuracy of 89.26% and a ROC AUC of 0.88. When integrated into conventional variant calling pipelines, VariantTransformer outperformed traditional heuristic filters and approached the performance of state-of-the-art AI-based variant callers like DeepVariant. Comparative analysis demonstrated VariantTransformer's superiority in functionality, variant type coverage, training size, and input data type. VariantTransformer represents a significant advancement in variant calling refinement for low-coverage genomic studies.

Citations: 0

Integrating spatially-resolved transcriptomics data across tissues and individuals: challenges and opportunities

arXiv - QuanBio - Genomics Pub Date : 2024-08-01 DOI: arxiv-2408.00367

Boyi Guo, Wodan Ling, Sang Ho Kwon, Pratibha Panwar, Shila Ghazanfar, Keri Martinowich, Stephanie C. Hicks

Citations: 0

UnPaSt: unsupervised patient stratification by differentially expressed biclusters in omics data

arXiv - QuanBio - Genomics Pub Date : 2024-07-31 DOI: arxiv-2408.00200

Michael Hartung, Andreas Maier, Fernando Delgado-Chaves, Yuliya Burankova, Olga I. Isaeva, Fábio Malta de Sá Patroni, Daniel He, Casey Shannon, Katharina Kaufmann, Jens Lohmann, Alexey Savchik, Anne Hartebrodt, Zoe Chervontseva, Farzaneh Firoozbakht, Niklas Probul, Evgenia Zotova, Olga Tsoy, David B. Blumenthal, Martin Ester, Tanja Laske, Jan Baumbach, Olga Zolotareva

Abstract: Most complex diseases, including cancer and non-malignant diseases like asthma, have distinct molecular subtypes that require distinct clinical approaches. However, existing computational patient stratification methods have been benchmarked almost exclusively on cancer omics data and only perform well when mutually exclusive subtypes can be characterized by many biomarkers. Here, we contribute a large-scale evaluation, quantitatively exploring the power of 22 unsupervised patient stratification methods using both simulated and real transcriptome data. From this experience, we developed UnPaSt (https://apps.cosy.bio/unpast/), which optimizes unsupervised patient stratification and works even with only a limited number of subtype-predictive biomarkers. We evaluated all 23 methods on real-world breast cancer and asthma transcriptomics data. Although many methods reliably detected major breast cancer subtypes, only a few identified Th2-high asthma, and UnPaSt significantly outperformed its closest competitors in both test datasets. Essentially, we showed that UnPaSt can detect many biologically insightful and reproducible patterns in omic datasets.

Citations: 0

Are gene-by-environment interactions leveraged in multi-modality neural networks for breast cancer prediction?

arXiv - QuanBio - Genomics Pub Date : 2024-07-30 DOI: arxiv-2407.20978

Monica Isgut, Andrew Hornback, Yunan Luo, Asma Khimani, Neha Jain, May D. Wang

Abstract: Polygenic risk scores (PRSs) can significantly enhance breast cancer risk prediction when combined with clinical risk factor data. While many studies have explored the value-add of PRSs, little is known about the potential impact of gene-by-gene or gene-by-environment interactions towards enhancing the risk discrimination capabilities of multi-modal models combining PRSs with clinical data. In this study, we integrated data on 318 individual genotype variants along with clinical data in a neural network to explore whether gene-by-gene (i.e., between individual variants) and/or gene-by-environment (between clinical risk factors and variants) interactions could be leveraged jointly during training to improve breast cancer risk prediction performance. We benchmarked our approach against a baseline model combining traditional univariate PRSs with clinical data in a logistic regression model and ran an interpretability analysis to identify feature interactions. While our model did not demonstrate improved performance over the baseline, we discovered 248 (<1%) statistically significant gene-by-gene and gene-by-environment interactions out of the ~53.6k possible feature pairs, the most contributory of which included rs6001930 (MKL1) and rs889312 (MAP3K1), with age and menopause being the most heavily interacting non-genetic risk factors. We also modeled the significant interactions as a network of highly connected features, suggesting that potential higher-order interactions are captured by the model. Although gene-by-environment (or gene-by-gene) interactions did not enhance breast cancer risk prediction performance in neural networks, our study provides evidence that these interactions can be leveraged by these models to inform their predictions. This study represents the first application of neural networks to screen for interactions impacting breast cancer risk using real-world data.

Citations: 0
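
The abstract above reports 248 significant interactions out of roughly 53.6k possible feature pairs. The short sketch below checks that arithmetic under two assumptions that are not stated in the abstract: that the pairs are unordered pairs over all input features, and that there are 10 clinical features alongside the 318 variants (a value picked here only because it reproduces the reported pair count).

```python
from math import comb

n_variants = 318        # stated in the abstract
n_clinical = 10         # ASSUMPTION: not given in the abstract; chosen to match ~53.6k pairs
n_significant = 248     # stated in the abstract

n_features = n_variants + n_clinical

# Number of unordered feature pairs: n * (n - 1) / 2
n_pairs = comb(n_features, 2)

# Share of all pairs that were found to be statistically significant
fraction = n_significant / n_pairs

print(n_pairs)              # 53628
print(f"{fraction:.2%}")    # 0.46%
```

Under these assumptions the count comes to 53,628 pairs, and the significant share is about 0.46%, consistent with the "<1%" quoted in the abstract.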

PyamilySeq: A Python Tool for Interpretable Gene (Re)Clustering and Pangenomic Inference Across Species and Genera

arXiv - QuanBio - Genomics Pub Date : 2024-07-27 DOI: arxiv-2407.19328

Nicholas J. Dimonaco

Abstract: PyamilySeq is a Python-based tool designed for interpretable gene clustering and pangenomic inference, supporting analyses at both species and genus levels. It facilitates the clustering of gene sequences into families based on sequence similarity using CD-HIT, and can take the output of tried-and-tested sequence clustering tools such as CD-HIT, BLAST, DIAMOND, and MMseqs2. PyamilySeq is distinctive in its ability to integrate new sequences into existing clusters, providing a robust framework for iterative analysis while preserving the original clusters, which is useful when reannotating genomes. In addition to the standard Species mode, which, as with other tools, performs core-gene analysis across a species range, PyamilySeq can be run in Genus mode, where it detects the presence of gene families shared across multiple genera. These features enhance the tool's applicability for ongoing and past genomic studies and comparative analyses. PyamilySeq generates comprehensive outputs, including gene presence-absence matrices and aligned sequence data, enabling downstream analysis and interpretation of the identified gene groups and pangenomic data.

Citations: 0
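
The PyamilySeq abstract above mentions gene presence-absence matrices and core-gene analysis. The sketch below illustrates those two ideas in general terms; it does not use or reproduce PyamilySeq's own code, input formats, or options. The function names, the cluster dictionary layout, the `min_fraction` threshold, and the toy data are all hypothetical.

```python
def presence_absence(clusters, genomes):
    """Build a presence-absence row (1/0 per genome) for each gene family.

    clusters: dict mapping family name -> list of (genome, gene_id) members.
    genomes:  ordered list of genome names defining the matrix columns.
    """
    matrix = {}
    for family, members in clusters.items():
        present = {genome for genome, _gene_id in members}
        matrix[family] = [1 if g in present else 0 for g in genomes]
    return matrix


def core_families(matrix, min_fraction=1.0):
    """Return families present in at least min_fraction of the genomes."""
    return sorted(
        family
        for family, row in matrix.items()
        if sum(row) / len(row) >= min_fraction
    )


# Toy input (hypothetical): three genomes and three gene families.
genomes = ["G1", "G2", "G3"]
clusters = {
    "fam_A": [("G1", "a1"), ("G2", "a2"), ("G3", "a3")],
    "fam_B": [("G1", "b1"), ("G3", "b3")],
    "fam_C": [("G2", "c1"), ("G2", "c2")],  # two paralogs in one genome
}

matrix = presence_absence(clusters, genomes)
print(matrix["fam_B"])                          # [1, 0, 1]
print(core_families(matrix))                    # ['fam_A']
print(core_families(matrix, min_fraction=0.6))  # ['fam_A', 'fam_B']
```

A relaxed threshold such as `min_fraction=0.6` is shown only to illustrate how a "soft core" could be defined; the thresholds a given tool uses may differ.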