arXiv - QuanBio - Genomics: Latest Papers (Page 9)

F5C-finder: An Explainable and Ensemble Biological Language Model for Predicting 5-Formylcytidine Modifications on mRNA

arXiv - QuanBio - Genomics Pub Date : 2024-04-20 DOI: arxiv-2404.13265

Guohao Wang, Ting Liu, Hongqiang Lyu, Ze Liu

Abstract: As a prevalent and dynamically regulated epigenetic modification, 5-formylcytidine (f5C) is crucial in various biological processes. However, traditional experimental methods for f5C detection are often laborious and time-consuming, limiting their ability to map f5C sites comprehensively across the transcriptome. While computational approaches offer a cost-effective and high-throughput alternative, no recognition model for f5C has been developed to date. Drawing inspiration from language models in natural language processing, this study presents f5C-finder, an ensemble neural network-based model utilizing multi-head attention for the identification of f5C. Five distinct feature extraction methods were employed to construct five individual artificial neural networks, and these networks were subsequently integrated through ensemble learning to create f5C-finder. 10-fold cross-validation and independent tests demonstrate that f5C-finder achieves state-of-the-art (SOTA) performance, with AUCs of 0.807 and 0.827, respectively. The result highlights the effectiveness of biological language models in capturing both the order (sequential) and functional meaning (semantics) within genomes. Furthermore, the built-in interpretability allows us to understand what the model is learning, creating a bridge between identifying key sequential elements and a deeper exploration of their biological functions.

Citations: 0
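The F5C-finder abstract above says five base networks were combined through ensemble learning and evaluated by AUC, but not how the outputs were combined. A minimal sketch, assuming simple soft voting (averaging each network's predicted f5C probability per site), together with ROC AUC computed as the probability that a randomly chosen positive site outscores a randomly chosen negative one. The function names and all probabilities are hypothetical and are not the paper's data or code.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average each sample's positive-class probability across base models."""
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("every model must score the same samples")
    return [sum(p[i] for p in prob_lists) / n_models for i in range(n_samples)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outputs of five base models (one per feature encoding) on four sites.
base_probs = [
    [0.10, 0.40, 0.35, 0.80],
    [0.20, 0.30, 0.60, 0.70],
    [0.15, 0.50, 0.55, 0.90],
    [0.05, 0.45, 0.40, 0.85],
    [0.25, 0.35, 0.50, 0.75],
]
labels = [0, 0, 1, 1]  # 1 = modified site, 0 = unmodified
ensemble = soft_vote(base_probs)
print("first model AUC:", roc_auc(labels, base_probs[0]))
print("ensemble AUC:", roc_auc(labels, ensemble))
```

The pairwise AUC loop is quadratic in the number of sites, which is fine for illustration; for large test sets a rank-based formula is preferable. Other ensemble schemes (weighted voting, stacking with a meta-learner) are equally plausible readings of "ensemble learning".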

Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers

arXiv - QuanBio - Genomics Pub Date : 2024-04-15 DOI: arxiv-2404.09411

Doron Haviv, Russell Zhang Kunes, Thomas Dougherty, Cassandra Burdziak, Tal Nawy, Anna Gilbert, Dana Pe'er

Abstract: Optimal transport (OT) and the related Wasserstein metric (W) are powerful and ubiquitous tools for comparing distributions. However, computing pairwise Wasserstein distances rapidly becomes intractable as cohort size grows. An attractive alternative would be to find an embedding space in which pairwise Euclidean distances map to OT distances, akin to standard multidimensional scaling (MDS). We present Wasserstein Wormhole, a transformer-based autoencoder that embeds empirical distributions into a latent space wherein Euclidean distances approximate OT distances. Extending MDS theory, we show that our objective function implies a bound on the error incurred when embedding non-Euclidean distances. Empirically, distances between Wormhole embeddings closely match Wasserstein distances, enabling linear time computation of OT distances. Along with an encoder that maps distributions to embeddings, Wasserstein Wormhole includes a decoder that maps embeddings back to distributions, allowing for operations in the embedding space to generalize to OT spaces, such as Wasserstein barycenter estimation and OT interpolation. By lending scalability and interpretability to OT approaches, Wasserstein Wormhole unlocks new avenues for data analysis in the fields of computational geometry and single-cell biology.

Citations: 0
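The Wasserstein Wormhole abstract above is motivated by the cost of computing every pairwise Wasserstein distance in a cohort. A small sketch makes that cost concrete. Assumption: it uses only the special case of equal-size, equally weighted samples on the real line, where the optimal transport plan matches sorted values, so the exact W1 distance is the mean absolute difference of the sorted samples. It is not the Wormhole transformer and does not cover multi-dimensional point clouds such as cells in expression space; `wasserstein_1d` and `pairwise_w1` are hypothetical names.

```python
from itertools import combinations
from typing import List, Sequence


def wasserstein_1d(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact W1 between two equal-size, equally weighted 1-D samples.

    In one dimension the optimal coupling pairs the i-th smallest value of x
    with the i-th smallest value of y, so W1 is the mean absolute difference.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("this sketch assumes non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


def pairwise_w1(cohort: Sequence[Sequence[float]]) -> List[List[float]]:
    """Full symmetric distance matrix: n * (n - 1) / 2 OT computations for n distributions."""
    n = len(cohort)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = wasserstein_1d(cohort[i], cohort[j])
    return d


cohort = [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [0.0, 0.0, 6.0]]
for row in pairwise_w1(cohort):
    print(row)
```

In this direct approach, adding one distribution to a cohort of n requires n further OT computations; the embedding idea in the abstract replaces them with one encoder pass per distribution followed by cheap Euclidean distances.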

A Systematic Overview of Single-Cell Transcriptomics Databases, their Use cases, and Limitations

arXiv - QuanBio - Genomics Pub Date : 2024-04-15 DOI: arxiv-2404.10545

Mahnoor N. Gondal, Saad Ur Rehman Shah, Arul M. Chinnaiyan, Marcin Cieslik

Abstract: Rapid advancements in high-throughput single-cell RNA-seq (scRNA-seq) technologies and experimental protocols have led to the generation of vast amounts of genomic data that populate several online databases and repositories. Here, we systematically examine large-scale scRNA-seq databases, categorizing them by scope and purpose into general, tissue-specific, disease-specific, cancer-focused, and cell type-focused databases. Next, we discuss the technical and methodological challenges associated with curating large-scale scRNA-seq databases, along with current computational solutions. We argue that understanding scRNA-seq databases, including their limitations and assumptions, is crucial for effectively utilizing this data to make robust discoveries and identify novel biological insights. Furthermore, we propose that bridging the gap between computational and wet-lab scientists through user-friendly web-based platforms is needed to democratize access to single-cell data. These platforms would facilitate interdisciplinary research, enabling researchers from various disciplines to collaborate effectively. This review underscores the importance of leveraging computational approaches to unravel the complexities of single-cell data and offers a promising direction for future research in the field.

Citations: 0

scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling

arXiv - QuanBio - Genomics Pub Date : 2024-04-09 DOI: arxiv-2404.06153

Shengze Dong, Zhuorui Cui, Ding Liu, Jinzhi Lei

Abstract:
Motivation: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology extensively utilized in biological research, enabling the examination of gene expression at the level of individual cells within a tissue sample. While numerous tools have been developed for scRNA-seq data analysis, it remains challenging to capture the distinct features of such data and to generate virtual datasets with similar statistical properties.
Results: Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT), which generates virtual scRNA-seq data from a real dataset. The method is a neural network built on Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs). Gaussian noise is added to the real data over iterative noise-adding steps, and the model learns to reverse this process, ultimately turning noise back into scRNA-seq samples. This scheme allows the model to learn data features from actual scRNA-seq samples during training. Our experiments, conducted on two distinct scRNA-seq datasets, demonstrate superior performance. Additionally, the sampling process is accelerated by incorporating Denoising Diffusion Implicit Models (DDIM). scRDiT presents a unified methodology that lets users train neural network models on their own scRNA-seq datasets and generate numerous high-quality scRNA-seq samples.
Availability and implementation: https://github.com/DongShengze/scRDiT

Citations: 0
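The noise-adding process and the DDIM-accelerated sampling described in the scRDiT abstract above can be sketched with the standard DDPM closed-form forward step and the deterministic DDIM update. Assumptions: a linear β schedule from 1e-4 to 0.02 over 1,000 steps (the common DDPM default, not necessarily scRDiT's setting), DDIM with η = 0, and a `predict_noise(x, t)` callable standing in for the trained diffusion transformer. All function names are hypothetical and this is not the repository's code.

```python
import numpy as np


def linear_alpha_bars(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bars, rng):
    """Forward (noise-adding) process in closed form: x_t = sqrt(ab_t) x0 + sqrt(1 - ab_t) eps."""
    eps = rng.standard_normal(np.shape(x0))
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps


def ddim_step(xt, eps_pred, t, t_prev, alpha_bars):
    """Deterministic DDIM update (eta = 0) from step t to an earlier step t_prev (-1 means clean data)."""
    a_t = alpha_bars[t]
    a_prev = alpha_bars[t_prev] if t_prev >= 0 else 1.0
    x0_pred = (xt - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)  # implied clean sample
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps_pred


def ddim_sample(predict_noise, shape, timesteps, alpha_bars, rng):
    """Start from pure noise and visit only the given descending timesteps (e.g. 50 of 1000)."""
    x = rng.standard_normal(shape)
    prev_steps = list(timesteps[1:]) + [-1]
    for t, t_prev in zip(timesteps, prev_steps):
        x = ddim_step(x, predict_noise(x, t), t, t_prev, alpha_bars)
    return x


alpha_bars = linear_alpha_bars()
rng = np.random.default_rng(0)
x0 = rng.standard_normal(2000)  # stand-in for one cell's normalized expression vector
xt, eps = q_sample(x0, 500, alpha_bars, rng)
# With a perfect noise prediction, a single DDIM step to t_prev = -1 recovers x0.
print(np.allclose(ddim_step(xt, eps, 500, -1, alpha_bars), x0))
timesteps = list(range(999, -1, -20))  # 50 DDIM steps instead of 1000 ancestral steps
```

The speed-up comes from `ddim_sample` visiting a strided subset of timesteps; in practice `predict_noise` would be the trained network, which this sketch deliberately omits.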

scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding

arXiv - QuanBio - Genomics Pub Date : 2024-04-09 DOI: arxiv-2404.06167

Ping Xu, Zhiyuan Ning, Meng Xiao, Guihai Feng, Xin Li, Yuanchun Zhou, Pengfei Wang

Abstract: Single-cell RNA sequencing (scRNA-seq) is essential for unraveling cellular heterogeneity and diversity, offering invaluable insights for bioinformatics advancements. Despite its potential, traditional clustering methods in scRNA-seq data analysis often neglect the structural information embedded in gene expression profiles, which is crucial for understanding cellular correlations and dependencies. Existing strategies, including graph neural networks, struggle with inefficiency caused by the intrinsically high dimensionality and sparsity of scRNA-seq data. To address these limitations, we introduce scCDCG (single-cell RNA-seq Clustering via Deep Cut-informed Graph), a novel framework for efficient and accurate clustering of scRNA-seq data that simultaneously exploits high-order intercellular structural information. scCDCG comprises three main components: (i) a graph embedding module using deep cut-informed techniques, which effectively captures high-order intercellular structural information and overcomes the over-smoothing and inefficiency issues prevalent in prior graph neural network methods; (ii) a self-supervised learning module guided by optimal transport, tailored to the unique complexities of scRNA-seq data, specifically its high dimensionality and sparsity; and (iii) an autoencoder-based feature learning module that reduces model complexity through effective dimension reduction and feature extraction. Our extensive experiments on 6 datasets demonstrate scCDCG's superior performance and efficiency compared with 7 established models, underscoring its potential as a transformative tool in scRNA-seq data analysis. Our code is available at https://github.com/XPgogogo/scCDCG

Citations: 0

Guide to k-mer approaches for genomics across the tree of life

arXiv - QuanBio - Genomics Pub Date : 2024-04-01 DOI: arxiv-2404.01519

Katharine M. Jenike, Lucía Campos-Domínguez, Marilou Boddé, José Cerca, Christina N. Hodson, Michael C. Schatz, Kamil S. Jaron

Abstract: The wide array of currently available genomes displays a wonderful diversity in size, composition, and structure, with many more to come thanks to several global biodiversity genomics initiatives launched in recent years. However, sequencing genomes, even with all the recent advances, can still be challenging for both technical reasons (e.g., small physical size, contaminated samples, or lack of access to appropriate sequencing platforms) and biological reasons (e.g., germline-restricted DNA, variable ploidy levels, sex chromosomes, or very large genomes). In recent years, k-mer-based techniques have become popular for overcoming some of these challenges. They are based on the simple process of dividing the analysed sequences (e.g., raw reads or genomes) into a set of sub-sequences of length k, called k-mers. Despite this apparent simplicity, k-mer-based analysis allows for a rapid and intuitive assessment of complex sequencing datasets. Here, we provide the first comprehensive review of the theoretical properties and practical applications of k-mers in biodiversity genomics, serving as a reference manual for this powerful approach.

Citations: 0
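The k-mer decomposition described in the guide above is simple to sketch. This minimal example, assuming DNA reads over the alphabet ACGT, splits each read into its overlapping k-mers, counts canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement, so both strands are counted together), and builds the k-mer spectrum, that is, how many distinct k-mers occur at each multiplicity. Function names are illustrative; dedicated counters such as Jellyfish or KMC use far more memory-efficient data structures.

```python
from collections import Counter
from typing import Dict, Iterable, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping sub-sequences of length k (len(seq) - k + 1 of them)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_canonical_kmers(reads: Iterable[str], k: int) -> Counter:
    counts: Counter = Counter()
    for read in reads:
        for kmer in kmers(read.upper(), k):
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


def kmer_spectrum(counts: Counter) -> Dict[int, int]:
    """Map multiplicity -> number of distinct k-mers observed that many times."""
    return dict(sorted(Counter(counts.values()).items()))


reads = ["ACGTACGT", "CGTACGTA", "ACGNNCGT"]
counts = count_canonical_kmers(reads, 3)
print(counts)
print(kmer_spectrum(counts))
```

On a real read set, the main peak of the spectrum reflects sequencing coverage and the low-multiplicity tail is dominated by sequencing errors, which is what k-mer-based genome size and heterozygosity estimates build on.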

Just-DNA-Seq, open-source personal genomics platform: longevity science for everyone

arXiv - QuanBio - Genomics Pub Date : 2024-03-28 DOI: arxiv-2403.19087

Kulaga Anton (Institute for Biostatistics and Informatics in Medicine and Ageing Research; Institute of Biochemistry of the Romanian Academy; International Longevity Alliance), Borysova Olga (International Longevity Alliance; CellFabrik SRL), Karmazin Alexey (International Longevity Alliance; MitoSpace), Koval Maria (Institute of Biochemistry of the Romanian Academy; International Longevity Alliance), Usanov Nikolay (Institute of Biochemistry of the Romanian Academy; International Longevity Alliance), Fedorova Alina (Institute of Biochemistry of the Romanian Academy), Evfratov Sergey (Institute of Biochemistry of the Romanian Academy), Pushkareva Malvina (Institute of Biochemistry of the Romanian Academy), Ryangguk Kim (Oak Bioinformatics LLC), Tacutu Robi (SecvADN SRL)

Abstract: Genomic data has become increasingly accessible to the general public with the advent of companies offering whole genome sequencing at a relatively low cost. However, their reports are not verifiable due to a lack of crucial details and transparency: polygenic risk scores do not always list all the polymorphisms involved. At the same time, manually investigating and interpreting the data is challenging for individuals without a background in genetics. Currently, no open-source or commercial solution provides comprehensive longevity reports covering more than a limited number of polymorphisms. Additionally, there are no ready-made, out-of-the-box solutions that let users with minimal expertise generate reports independently. To address these issues, we have developed the Just-DNA-Seq open-source genomic platform. Just-DNA-Seq aims to provide a user-friendly solution to genome annotation by allowing users to upload their own VCF files and receive annotations of their genetic variants and polygenic risk scores related to longevity. We also created GeneticsGenie, a custom GPT that can answer genetics questions based on our modules. With the Just-DNA-Seq platform, we want to provide full information regarding the genetics of long life: disease-predisposing variants that can reduce lifespan and manifest at different ages (cardiovascular, oncological, neurodegenerative diseases, etc.), pro-longevity variants, and longevity drug pharmacokinetics. In this research article, we discuss the features and capabilities of Just-DNA-Seq and how it can benefit individuals looking to understand and improve their health. It is crucial to note that the Just-DNA-Seq platform is intended exclusively for scientific and informational purposes and is not suitable for medical applications.

Citations: 0
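Polygenic risk scores like those Just-DNA-Seq reports from uploaded VCF files are conventionally computed as an additive sum of per-variant effect sizes times effect-allele dosage. The sketch below assumes that convention, reads genotypes from VCF `GT` strings such as `0/1` or `1|1`, and treats the ALT allele (index 1) as the effect allele; the variant IDs and weights are hypothetical, and this is not Just-DNA-Seq's own code. It returns the list of variants actually scored, reflecting the abstract's point that a report should state every polymorphism involved.

```python
from typing import Dict, List, Tuple


def dosage(gt: str, effect_allele_index: int = 1) -> int:
    """Count copies of the effect allele in a VCF GT string ('0/1', '1|1', haploid '1')."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        raise ValueError(f"missing genotype call: {gt!r}")
    return sum(1 for a in alleles if int(a) == effect_allele_index)


def polygenic_score(genotypes: Dict[str, str], weights: Dict[str, float]) -> Tuple[float, List[str]]:
    """Additive score sum(beta * dosage); also return which variants contributed."""
    score = 0.0
    used: List[str] = []
    for variant_id, beta in weights.items():
        gt = genotypes.get(variant_id)
        if gt is None:
            continue  # variant absent from this VCF: skipped here (a simplification)
        score += beta * dosage(gt)
        used.append(variant_id)
    return score, used


# Hypothetical genotypes and effect sizes (betas); not real longevity associations.
genotypes = {"var1": "0/1", "var2": "1/1"}
weights = {"var1": 0.5, "var2": 0.25, "var3": 1.0}
print(polygenic_score(genotypes, weights))
```

Real PRS weight files specify the effect allele explicitly, so a production pipeline must match it against each record's REF/ALT (and strand) rather than assuming ALT, and must decide how to handle missing variants instead of silently skipping them.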

Navigating Eukaryotic Genome Annotation Pipelines: A Route Map to BRAKER, Galba, and TSEBRA

arXiv - QuanBio - Genomics Pub Date : 2024-03-28 DOI: arxiv-2403.19416

Tomáš Brůna, Lars Gabriel, Katharina J. Hoff

Abstract: Annotating the structure of protein-coding genes represents a major challenge in the analysis of eukaryotic genomes. This task sets the groundwork for subsequent genomic studies aimed at understanding the functions of individual genes. BRAKER and Galba are two fully automated and containerized pipelines designed to perform accurate genome annotation. BRAKER integrates the GeneMark-ETP and AUGUSTUS gene finders, employing the TSEBRA combiner to attain high sensitivity and precision. BRAKER is adept at handling genomes of any size, provided that it has access to both transcript expression sequencing data and an extensive protein database from the target clade. Notably, BRAKER still achieves high accuracy with only one of these two types of extrinsic evidence, although accuracy diminishes for larger genomes under such conditions. In contrast, Galba uses direct protein-to-genome spliced alignments produced by miniprot to generate training genes and evidence for gene prediction in AUGUSTUS. Galba achieves superior accuracy in large genomes when protein sequences are the only source of evidence. This chapter provides practical guidelines for employing both pipelines in the annotation of eukaryotic genomes, with a focus on insect genomes.

Citations: 0

Genetic diversity of barley accessions and their response under abiotic stresses using different approaches

arXiv - QuanBio - Genomics Pub Date : 2024-03-21 DOI: arxiv-2403.14181

Djshwar Dhahir Lateef, Nawroz Abdul-razzak Tahir

Citations: 0

Path-GPTOmic: A Balanced Multi-modal Learning Framework for Survival Outcome Prediction

arXiv - QuanBio - Genomics Pub Date : 2024-03-18 DOI: arxiv-2403.11375

Hongxiao Wang, Yang Yang, Zhuo Zhao, Pengfei Gu, Nishchal Sapkota, Danny Z. Chen

Abstract: For predicting cancer survival outcomes, standard approaches in clinical research are often based on two main modalities: pathology images for observing cell morphology features, and genomics (e.g., bulk RNA-seq) for quantifying gene expression. However, existing pathology-genomic multi-modal algorithms face significant challenges: (1) valuable biological insights regarding genes and gene-gene interactions are frequently overlooked; (2) one modality often dominates the optimization process, causing inadequate training of the other modality. In this paper, we introduce a new multi-modal "Path-GPTOmic" framework for cancer survival outcome prediction. First, to extract valuable biological insights, we regulate the embedding space of a foundation model, scGPT, initially trained on single-cell RNA-seq data, making it adaptable to bulk RNA-seq data. Second, to address the imbalance between modalities, we propose a gradient modulation mechanism tailored to the Cox partial likelihood loss for survival prediction. The contributions of the modalities are dynamically monitored and adjusted during training, encouraging both modalities to be sufficiently trained. Evaluated on two TCGA (The Cancer Genome Atlas) datasets, our model achieves substantially improved survival prediction accuracy.

Citations: 0
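The Path-GPTOmic abstract above tailors its gradient modulation to the Cox partial likelihood loss. As a reference point for that loss, the sketch below computes the standard mean negative log partial likelihood, using the Breslow treatment of ties (every patient whose time is at least the event time stays in the risk set), from per-patient risk scores, survival or censoring times, and event indicators. It does not implement the paper's modulation mechanism; the function name and the example batch are hypothetical.

```python
import numpy as np


def cox_neg_log_partial_likelihood(risk, time, event) -> float:
    """Mean negative log Cox partial likelihood (Breslow ties).

    risk:  predicted log-hazard score per patient (higher means higher risk)
    time:  observed survival or censoring time
    event: 1 if the event (death) was observed, 0 if censored
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    if not event.any():
        raise ValueError("at least one observed event is required")
    total = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]  # risk set: patients still under observation at time[i]
        # log(sum_j exp(risk_j)) over the risk set, computed stably, minus the patient's own score
        total += np.logaddexp.reduce(risk[at_risk]) - risk[i]
    return float(total / event.sum())


# Hypothetical batch: four patients, the second one censored.
risk = [1.2, 0.3, -0.5, 0.8]
time = [5.0, 12.0, 20.0, 8.0]
event = [1, 0, 1, 1]
print(cox_neg_log_partial_likelihood(risk, time, event))
```

The loss falls when patients who experience the event earlier receive higher risk scores. In training, the same expression would be written in an autodiff framework so that gradients flow into both the pathology and genomics branches; those are the gradients a modality-balancing scheme of the kind the abstract describes would adjust.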