Genome Biology: Latest Articles

Benchmarking clustering, alignment, and integration methods for spatial transcriptomics
IF 12.3, Zone 1 (Biology)
Genome Biology Pub Date: 2024-08-09 DOI: 10.1186/s13059-024-03361-0
Yunfei Hu, Manfei Xie, Yikang Li, Mingxing Rao, Wenjun Shen, Can Luo, Haoran Qin, Jihoon Baek, Xin Maizie Zhou
Abstract: Spatial transcriptomics (ST) is advancing our understanding of complex tissues and organisms. However, building a robust clustering algorithm to define spatially coherent regions in a single tissue slice and aligning or integrating multiple tissue slices originating from diverse sources for essential downstream analyses remains challenging. Numerous clustering, alignment, and integration methods have been specifically designed for ST data by leveraging its spatial information. The absence of comprehensive benchmark studies complicates the selection of methods and future method development. In this study, we systematically benchmark a variety of state-of-the-art algorithms with a wide range of real and simulated datasets of varying sizes, technologies, species, and complexity. We analyze the strengths and weaknesses of each method using diverse quantitative and qualitative metrics and analyses, including eight metrics for spatial clustering accuracy and contiguity, uniform manifold approximation and projection visualization, layer-wise and spot-to-spot alignment accuracy, and 3D reconstruction, which are designed to assess method performance as well as data quality. The code used for evaluation is available on our GitHub. Additionally, we provide online notebook tutorials and documentation to facilitate the reproduction of all benchmarking results and to support the study of new methods and new datasets. Our analyses lead to comprehensive recommendations that cover multiple aspects, helping users to select optimal tools for their specific needs and guide future method development.
Citations: 0
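The abstract above mentions metrics for spatial clustering accuracy but does not name them. As an illustration only, the sketch below computes the adjusted Rand index (ARI), a widely used measure of agreement between predicted and reference cluster labels. Choosing ARI here is an assumption, not a statement about which eight metrics the study uses, and the function name `adjusted_rand_index` and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two label lists of equal length.

    Illustrative only: ARI is assumed here as an example of a
    clustering-accuracy metric; the abstract does not name its metrics.
    """
    if len(truth) != len(pred):
        raise ValueError("label lists must have the same length")
    n = len(truth)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical partitions

    # Count pairs of items that share a cell of the contingency table,
    # a reference cluster, or a predicted cluster.
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = same_truth * same_pred / total_pairs
    maximum = (same_truth + same_pred) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial (e.g. a single cluster)
    return (same_both - expected) / (maximum - expected)


# Toy spot labels (hypothetical): cluster names differ, partitions match.
reference_layers = [0, 0, 1, 1]
predicted_domains = [1, 1, 0, 0]
print(adjusted_rand_index(reference_layers, predicted_domains))
```

Because ARI compares partitions rather than label names, a method that recovers the right regions under different cluster IDs still scores 1.0.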
Creating large-scale genetic diversity in Arabidopsis via base editing-mediated deep artificial evolution
IF 12.3, Zone 1 (Biology)
Genome Biology Pub Date: 2024-08-09 DOI: 10.1186/s13059-024-03358-9
Xiang Wang, Wenbo Pan, Chao Sun, Hong Yang, Zhentao Cheng, Fei Yan, Guojing Ma, Yun Shang, Rui Zhang, Caixia Gao, Lijing Liu, Huawei Zhang
Abstract: Base editing is a powerful tool for artificial evolution to create allelic diversity and improve agronomic traits. However, the great evolutionary potential for every sgRNA target has been overlooked. And there is currently no high-throughput method for generating and characterizing as many changes in a single target as possible based on large mutant pools to permit rapid gene directed evolution in plants. In this study, we establish an efficient germline-specific evolution system to screen beneficial alleles in Arabidopsis which could be applied for crop improvement. This system is based on a strong egg cell-specific cytosine base editor and the large seed production of Arabidopsis, which enables each T1 plant with unedited wild type alleles to produce thousands of independent T2 mutant lines. It is able to create a wide range of mutant lines, including those containing atypical base substitutions, as well as providing a space- and labor-saving way to store and screen the resulting mutant libraries. Using this system, we efficiently generate herbicide-resistant EPSPS, ALS, and HPPD variants that could be used in crop breeding. Here, we demonstrate the significant potential of base editing-mediated artificial evolution for each sgRNA target and devise an efficient system for conducting deep evolution to harness this potential.
Citations: 0
Inferring clonal somatic mutations directed by X chromosome inactivation status in single cells
IF 12.3, Zone 1 (Biology)
Genome Biology Pub Date: 2024-08-09 DOI: 10.1186/s13059-024-03360-1
Ilke Demirci, Anton J. M. Larsson, Xinsong Chen, Johan Hartman, Rickard Sandberg, Jonas Frisén
Abstract: Analysis of clonal dynamics in human tissues is enabled by somatic genetic variation. Here, we show that analysis of mitochondrial mutations in single cells is dramatically improved in females when using X chromosome inactivation to select informative clonal mutations. Applying this strategy to human peripheral mononuclear blood cells reveals clonal structures within T cells that otherwise are blurred by non-informative mutations, including the separation of gamma-delta T cells, suggesting this approach can be used to decipher clonal dynamics of cells in human tissues.
Citations: 0
Transcriptional and epigenetic characterization of a new in vitro platform to model the formation of human pharyngeal endoderm
IF 12.3, Zone 1 (Biology)
Genome Biology Pub Date: 2024-08-08 DOI: 10.1186/s13059-024-03354-z
Andrea Cipriano, Alessio Colantoni, Alessandro Calicchio, Jonathan Fiorentino, Danielle Gomes, Mahdi Moqri, Alexander Parker, Sajede Rasouli, Matthew Caldwell, Francesca Briganti, Maria Grazia Roncarolo, Antonio Baldini, Katja G. Weinacht, Gian Gaetano Tartaglia, Vittorio Sebastiano
Abstract: The Pharyngeal Endoderm (PE) is an extremely relevant developmental tissue, serving as the progenitor for the esophagus, parathyroids, thyroids, lungs, and thymus. While several studies have highlighted the importance of PE cells, a detailed transcriptional and epigenetic characterization of this important developmental stage is still missing, especially in humans, due to technical and ethical constraints pertaining to its early formation. Here we fill this knowledge gap by developing an in vitro protocol for the derivation of PE-like cells from human Embryonic Stem Cells (hESCs) and by providing an integrated multi-omics characterization. Our PE-like cells robustly express PE markers and are transcriptionally homogenous and similar to in vivo mouse PE cells. In addition, we define their epigenetic landscape and dynamic changes in response to Retinoic Acid by combining ATAC-Seq and ChIP-Seq of histone modifications. The integration of multiple high-throughput datasets leads to the identification of new putative regulatory regions and to the inference of a Retinoic Acid-centered transcription factor network orchestrating the development of PE-like cells. By combining hESCs differentiation with computational genomics, our work reveals the epigenetic dynamics that occur during human PE differentiation, providing a solid resource and foundation for research focused on the development of PE derivatives and the modeling of their developmental defects in genetic syndromes.
Citations: 0
Microsatellite instability at U2AF-binding polypyrimidic tract sites perturbs alternative splicing during colorectal cancer initiation
IF 12.3, Zone 1 (Biology)
Genome Biology Pub Date: 2024-08-06 DOI: 10.1186/s13059-024-03340-5
Vincent Jonchère, Hugo Montémont, Enora Le Scanf, Aurélie Siret, Quentin Letourneur, Emmanuel Tubacher, Christophe Battail, Assane Fall, Karim Labreche, Victor Renault, Toky Ratovomanana, Olivier Buhard, Ariane Jolly, Philippe Le Rouzic, Cody Feys, Emmanuelle Despras, Habib Zouali, Rémy Nicolle, Pascale Cervera, Magali Svrcek, Pierre Bourgoin, Hélène Blanché, Anne Boland, Jérémie Lefèvre, Yann Parc, Mehdi Touat, Franck Bielle, Danielle Arzur, Gwennina Cueff, Catherine Le Jossic-Corcos, Gaël Quéré, Gwendal Dujardin, Marc Blondel, Cédric Le Maréchal, Romain Cohen, Thierry André, Florence Coulet, Pierre de la Grange, Aurélien de Reyniès, Jean-François Fléjou, Florence Renaud, Agusti Alentorn, Laurent Corcos, Jean-François Deleuze, Ada Collura, Alex Duval
Abstract: Microsatellite instability (MSI) due to mismatch repair deficiency (dMMR) is common in colorectal cancer (CRC). These cancers are associated with somatic coding events, but the noncoding pathophysiological impact of this genomic instability is still poorly understood. Here, we perform an analysis of coding and noncoding MSI events at the different steps of colorectal tumorigenesis using whole exome sequencing and search for associated splicing events via RNA sequencing at the bulk-tumor and single-cell levels. Our results demonstrate that MSI leads to hundreds of noncoding DNA mutations, notably at polypyrimidine U2AF RNA-binding sites which are endowed with cis-activity in splicing, while a higher frequency of exon skipping events is observed in the mRNAs of MSI compared to non-MSI CRC. At the DNA level, these noncoding MSI mutations occur very early prior to cell transformation in the dMMR colonic crypt, accounting for only a fraction of the exon skipping in MSI CRC. At the RNA level, the aberrant exon skipping signature is likely to impair colonic cell differentiation in MSI CRC affecting the expression of alternative exons encoding protein isoforms governing cell fate, while also targeting constitutive exons, making dMMR cells immunogenic in early stage before the onset of coding mutations. This signature is characterized by its similarity to the oncogenic U2AF1-S34F splicing mutation observed in several other non-MSI cancers. Overall, these findings provide evidence that a very early RNA splicing signature partly driven by MSI impairs cell differentiation and promotes MSI CRC initiation, far before coding mutations which accumulate later during MSI tumorigenesis.
Citations: 0
DNA-binding factor footprints and enhancer RNAs identify functional non-coding genetic variants
IF 12.3, Zone 1 (Biology)
Genome Biology Pub Date: 2024-08-06 DOI: 10.1186/s13059-024-03352-1
Simon C. Biddie, Giovanna Weykopf, Elizabeth F. Hird, Elias T. Friman, Wendy A. Bickmore
Abstract: Genome-wide association studies (GWAS) have revealed a multitude of candidate genetic variants affecting the risk of developing complex traits and diseases. However, the highlighted regions are typically in the non-coding genome, and uncovering the functional causative single nucleotide variants (SNVs) is challenging. Prioritization of variants is commonly based on genomic annotation with markers of active regulatory elements, but current approaches still poorly predict functional variants. To address this, we systematically analyze six markers of active regulatory elements for their ability to identify functional variants. We benchmark against molecular quantitative trait loci (molQTL) from assays of regulatory element activity that identify allelic effects on DNA-binding factor occupancy, reporter assay expression, and chromatin accessibility. We identify the combination of DNase footprints and divergent enhancer RNA (eRNA) as markers for functional variants. This signature provides high precision, but with a trade-off of low recall, thus substantially reducing candidate variant sets to prioritize variants for functional validation. We present this as a framework called FINDER (Functional SNV IdeNtification using DNase footprints and eRNA). We demonstrate the utility to prioritize variants using leukocyte count trait and analyze variants in linkage disequilibrium with a lead variant to predict a functional variant in asthma. Our findings have implications for prioritizing variants from GWAS, in development of predictive scoring algorithms, and for functionally informed fine mapping approaches.
Citations: 0
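The precision/recall trade-off described in the abstract above can be made concrete with a small sketch. It is not the FINDER implementation: the variant IDs, the marker sets, and the helper `precision_recall` are all hypothetical, and the only idea borrowed from the abstract is that requiring two markers at once (an intersection) shrinks the candidate set.

```python
def precision_recall(predicted, functional):
    """Return (precision, recall) for a predicted set against a truth set."""
    true_positives = len(predicted & functional)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(functional) if functional else 0.0
    return precision, recall


# Hypothetical toy data: variant IDs are invented for illustration.
functional = {"v1", "v2", "v3", "v4"}                  # assumed "true" functional variants
in_footprint = {"v1", "v2", "v3", "v5", "v6", "v7"}    # variants overlapping marker A
in_erna = {"v1", "v2", "v4", "v8", "v9"}               # variants overlapping marker B

# Requiring both markers keeps only variants supported by each of them.
combined = in_footprint & in_erna

for name, candidates in [("footprint", in_footprint),
                         ("eRNA", in_erna),
                         ("combined", combined)]:
    p, r = precision_recall(candidates, functional)
    print(f"{name}: {len(candidates)} candidates, precision={p:.2f}, recall={r:.2f}")
```

In this toy case the combined set is smaller and purer than either marker alone, at the cost of missing some functional variants, which is the kind of trade-off the abstract reports.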
Efficient inference of large prokaryotic pangenomes with PanTA
IF 12.3, Zone 1 (Biology)
Genome Biology Pub Date: 2024-08-06 DOI: 10.1186/s13059-024-03362-z
Duc Quang Le, Tien Anh Nguyen, Son Hoang Nguyen, Tam Thi Nguyen, Canh Hao Nguyen, Huong Thanh Phung, Tho Huu Ho, Nam S. Vo, Trang Nguyen, Hoang Anh Nguyen, Minh Duc Cao
Abstract: Pangenome inference is an indispensable step in bacterial genomics, yet its scalability poses a challenge due to the rapid growth of genomic collections. This paper presents PanTA, a software package designed for constructing pangenomes of large bacterial datasets, showing unprecedented efficiency levels multiple times higher than existing tools. PanTA introduces a novel mechanism to construct the pangenome progressively without rebuilding the accumulated collection from scratch. The progressive mode is shown to consume orders of magnitude less computational resources than existing solutions in managing growing datasets. The software is open source and is publicly available at https://github.com/amromics/panta and at 10.6084/m9.figshare.23724705.
Citations: 0
STdGCN: spatial transcriptomic cell-type deconvolution using graph convolutional networks
IF 12.3, Zone 1 (Biology)
Genome Biology Pub Date: 2024-08-05 DOI: 10.1186/s13059-024-03353-0
Yawei Li, Yuan Luo
Abstract: Spatially resolved transcriptomics integrates high-throughput transcriptome measurements with preserved spatial cellular organization information. However, many technologies cannot reach single-cell resolution. We present STdGCN, a graph model leveraging single-cell RNA sequencing (scRNA-seq) as reference for cell-type deconvolution in spatial transcriptomic (ST) data. STdGCN incorporates expression profiles from scRNA-seq and spatial localization from ST data for deconvolution. Extensive benchmarking on multiple datasets demonstrates that STdGCN outperforms 17 state-of-the-art models. In a human breast cancer Visium dataset, STdGCN delineates stroma, lymphocytes, and cancer cells, aiding tumor microenvironment analysis. In human heart ST data, STdGCN identifies changes in endothelial-cardiomyocyte communications during tissue development.
Citations: 0
scPriorGraph: constructing biosemantic cell–cell graphs with prior gene set selection for cell type identification from scRNA-seq data
IF 12.3, Zone 1 (Biology)
Genome Biology Pub Date: 2024-08-05 DOI: 10.1186/s13059-024-03357-w
Xiyue Cao, Yu-An Huang, Zhu-Hong You, Xuequn Shang, Lun Hu, Peng-Wei Hu, Zhi-An Huang
Abstract: Cell type identification is an indispensable analytical step in single-cell data analyses. To address the high noise stemming from gene expression data, existing computational methods often overlook the biologically meaningful relationships between genes, opting to reduce all genes to a unified data space. We assume that such relationships can aid in characterizing cell type features and improving cell type recognition accuracy. To this end, we introduce scPriorGraph, a dual-channel graph neural network that integrates multi-level gene biosemantics. Experimental results demonstrate that scPriorGraph effectively aggregates feature values of similar cells using high-quality graphs, achieving state-of-the-art performance in cell type identification.
Citations: 0
Current genomic deep learning models display decreased performance in cell type-specific accessible regions
IF 12.3, Zone 1 (Biology)
Genome Biology Pub Date: 2024-08-01 DOI: 10.1186/s13059-024-03335-2
Pooja Kathail, Richard W. Shuai, Ryan Chung, Chun Jimmie Ye, Gabriel B. Loeb, Nilah M. Ioannidis
Abstract: A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs contain a large proportion of complex disease heritability. We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general purpose models trained across thousands of outputs (cell types and epigenetic marks) and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general purpose models (Enformer and Sei), varies across the genome and is reduced in cell type-specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type-specific regulatory syntax, through single-task learning or high capacity multi-task models, can improve performance in cell type-specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants. Our results provide a new perspective on the performance of genomic deep learning models, showing that performance varies across the genome and is particularly reduced in cell type-specific accessible regions. We also identify strategies to maximize performance in cell type-specific accessible regions.
Citations: 0