{"title":"Review and Evaluate the Bioinformatics Analysis Strategies of ATAC-seq and CUT&Tag Data.","authors":"Siyuan Cheng, Benpeng Miao, Tiandao Li, Guoyan Zhao, Bo Zhang","doi":"10.1093/gpbjnl/qzae054","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae054","url":null,"abstract":"Efficient and reliable profiling methods are essential to study epigenetics. Tn5, one of the first identified prokaryotic transposases with high DNA-binding and tagmentation efficiency, is widely adopted in different genomic and epigenomic protocols for high-throughput exploration of the genome and epigenome. Based on Tn5, the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) and the Cleavage Under Targets and Tagmentation (CUT&Tag) were developed to measure chromatin accessibility and detect DNA-protein interactions. These methodologies can be applied to large amounts of biological samples with low-input levels, such as rare tissues, embryos, and sorted single cells. However, fast and proper processing of these epigenomic data has become a bottleneck because massive data production continues to increase quickly. Furthermore, inappropriate data analysis can generate biased or misleading conclusions. Therefore, it is essential to evaluate the performance of Tn5-based ATAC-seq and CUT&Tag data processing bioinformatics tools, many of which were developed mostly for analyzing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data. Here, we conducted a comprehensive benchmarking analysis to evaluate the performance of eight popular software tools for processing ATAC-seq and CUT&Tag data. We compared the sensitivity, specificity, and peak width distribution for both narrow-type and broad-type peak calling. We also tested the influence of the availability of control IgG input in CUT&Tag data analysis. Finally, we evaluated the differential analysis strategies commonly used for analyzing the CUT&Tag data. 
Our study provided comprehensive guidance for selecting bioinformatics tools and recommended analysis strategies, which were implemented into Docker/Singularity images for streamlined data analysis.","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142217469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of highly repetitive barley enhancers with long-range regulation potential via STARR-seq","authors":"Wanlin Zhou, Haoran Shi, Zhiqiang Wang, Yuxin Huang, Lin Ni, Xudong Chen, Yan Liu, Haojie Li, Caixia Li, Yaxi Liu","doi":"10.1093/gpbjnl/qzae012","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae012","url":null,"abstract":"Enhancers are DNA sequences that can strengthen transcription initiation. However, the global identification of plant enhancers is complicated due to uncertainty in the distance and orientation of enhancers, especially in species with large genomes. In this study, we performed self-transcribing active regulatory region sequencing (STARR-seq) for the first time to identify enhancers across the barley genome. A total of 7323 enhancers were successfully identified, and among 45 randomly selected enhancers, over 75% were effective as validated by a dual-luciferase reporter assay system in the lower epidermis of tobacco leaves. Interestingly, up to 53.5% of the barley enhancers were repetitive sequences, especially transposable elements (TEs), thus reinforcing the vital role of repetitive enhancers in gene expression. Both the common active transcription mark H3K4me3 and repressive histone mark H3K27me3 were abundant among the barley STARR-seq enhancers. In addition, the functional range of barley STARR-seq enhancers seemed much broader than that of rice or maize and extended to ± 100 kb of the gene body, and this finding was consistent with the high expression levels of genes in the genome. 
This work specifically depicts the unique features of barley enhancers and provides available barley enhancers for further utilization.","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CpG island definition and methylation mapping of the T2T-YAO genome","authors":"Ming Xiao, Rui Wei, Jun Yu, Chujie Gao, Fengyi Yang, Le Zhang","doi":"10.1093/gpbjnl/qzae009","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae009","url":null,"abstract":"Precisely defining and mapping all cytosine positions and their clusters, known as CpG islands (CGIs), as well as their methylation status are pivotal for genome-wide epigenetic studies, especially when population-centric reference genomes are ready for timely application. Here we first align the two high-quality reference genomes, T2T-YAO and T2T-CHM13, from different ethnic backgrounds in a base-by-base fashion and compute their genome-wide density-defined and position-defined CGIs. Second, mapping some representative genome-wide methylation data from selected organs onto the two genomes, we find that there are about 4.7–5.8% sequence divergency of variable categories depending on quality cutoffs. Genes among the divergent sequences are mostly associated with neurological functions. Moreover, CGIs associated with the divergent sequences are significantly different with respect to CpG density and observed CpG/expected CpG (O/E) ratio between the two genomes. Finally, we find that the T2T-YAO genome not only has a greater CpG site coverage than that of the T2T-CHM13 genome when whole-genome bisulfite sequencing (WGBS) data from the European and American populations are mapped to each reference, but also shows more hyper-methylated CpG sites as compared to the T2T-CHM13 genome. 
Our study suggests that future genome-wide epigenetic studies of the Chinese populations rely on both acquisition of high-quality methylation data and subsequent precision CGI mapping based on the Chinese T2T reference.","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pindel-TD: a tandem duplication detector based on a pattern growth approach","authors":"Xiaofei Yang, Gaoyang Zheng, Peng Jia, Songbo Wang, Kai Ye","doi":"10.1093/gpbjnl/qzae008","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae008","url":null,"abstract":"Tandem duplication (TD) is a major type of structural variation (SV) that plays an important role in novel gene formation and human diseases. However, TDs are often missed or incorrectly classified as insertions by most modern SV detection methods due to the lack of specialized operation on TD-related mutational signals. Herein, we developed a TD detection module for the Pindel tool, referred to as Pindel-TD, based on a TD-specific pattern growth approach. Pindel-TD is capable of detecting TDs with a wide size range at single nucleotide resolution. Using simulated and real read data from HG002, we demonstrated that Pindel-TD outperforms other leading methods in terms of precision, recall, F1-score, and robustness. Furthermore, by applying Pindel-TD to data generated from the K562 cancer cell line, we identified a TD located at the seventh exon of SAGE1, providing an explanation for its high expression. Pindel-TD is available for non-commercial use at https://github.com/xjtu-omics/pindel.","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrated Single-cell Multiomic Analysis of HIV Latency Reversal Reveals Novel Regulators of Viral Reactivation","authors":"Manickam Ashokkumar, Wenwen Mei, Jackson J Peterson, Yuriko Harigaya, David M Murdoch, David M Margolis, Caleb Kornfein, Alex Oesterling, Zhicheng Guo, Cynthia D Rudin, Yuchao Jiang, Edward P Browne","doi":"10.1093/gpbjnl/qzae003","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae003","url":null,"abstract":"Despite the success of antiretroviral therapy, human immunodeficiency virus (HIV) cannot be cured because of a reservoir of latently infected cells that evades therapy. To understand the mechanisms of HIV latency, we employed an integrated single-cell RNA sequencing (RNA-seq) and single-cell assay for transposase-accessible chromatin with sequencing (ATAC-seq) approach to simultaneously profile the transcriptomic and epigenomic characteristics of ∼ 125,000 latently infected primary CD4 cells after reactivation using three different latency reversing agents. Differentially expressed genes and differentially accessible motifs were used to examine transcriptional pathways and transcription factor (TF) activities across the cell population. We identified cellular transcripts and TFs whose expression/activity was correlated with viral reactivation and demonstrated that a machine learning model trained on these data was 75%–79% accurate at predicting viral reactivation. Finally, we validated the role of two candidate HIV-regulating factors, FOXP1 and GATA3, in viral transcription. 
These data demonstrate the power of integrated multimodal single-cell analysis to uncover novel relationships between host cell factors and HIV latency.","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SMARTdb: An Integrated Database for Exploring Single-cell Multi-omics Data of Reproductive Medicine","authors":"Zekai Liu, Zhen Yuan, Yunlei Guo, Ruilin Wang, Yusheng Guan, Zhanglian Wang, Yunan Chen, Tianlu Wang, Meining Jiang, Shuhui Bian","doi":"10.1093/gpbjnl/qzae005","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae005","url":null,"abstract":"Single-cell multi-omics sequencing has greatly accelerated reproductive research in recent years, and the data are continually growing. However, utilizing these data resources is challenging for wet-lab researchers. A comprehensive platform for exploring single-cell multi-omics data related to reproduction is urgently needed. Here we introduce the single-cell multi-omics atlas of reproduction (SMARTdb), which is an integrative and user-friendly platform for exploring molecular dynamics of reproductive development, aging, and disease, covering multi-omics, multi-species, and multi-stage data. We have curated and analyzed single-cell transcriptome and epigenome data of over 2.0 million cells from 6 species across the whole lifespan. A series of powerful functionalities are provided, such as “Query gene expression”, “DIY expression plot”, “DNA methylation plot”, and “Epigenome browser”. With SMARTdb, we found that the male germ-cell-specific expression pattern of RPL39L and RPL10L is conserved between human and other model animals. Moreover, DNA hypomethylation and open chromatin may regulate the specific expression pattern of RPL39L collectively in both male and female germ cells. In summary, SMARTdb is a powerful platform for convenient data mining and gaining novel insights into reproductive development, aging, and disease. 
SMARTdb is publicly available at https://smart-db.cn.","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSIsensor-RNA: Microsatellite Instability Detection for Bulk and Single-cell Gene Expression Data","authors":"Peng Jia, Xuanhao Yang, Xiaofei Yang, Tingjie Wang, Yu Xu, Kai Ye","doi":"10.1093/gpbjnl/qzae004","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae004","url":null,"abstract":"Microsatellite instability (MSI) is an indispensable biomarker in cancer immunotherapy. Currently, MSI scoring methods by high-throughput omics methods have gained popularity and demonstrated better performance than the gold standard method for MSI detection. However, the MSI detection method on expression data, especially single-cell expression data, is still lacking, limiting the scope of clinical application and prohibiting the investigation of MSI at a single-cell level. Herein, we developed MSIsensor-RNA, an accurate, robust, adaptable, and standalone software to detect MSI status based on expression values of MSI-associated genes. We demonstrated the favorable performance and promise of MSIsensor-RNA in both bulk and single-cell gene expression data in multiplatform technologies including RNA sequencing (RNA-seq), microarray, and single-cell RNA-seq. MSIsensor-RNA is a versatile, efficient, and robust method for MSI status detection from both bulk and single-cell gene expression data in clinical studies and applications. 
MSIsensor-RNA is available at https://github.com/xjtu-omics/msisensor-rna.","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transcriptome Dynamics and Cell Dialogs Between Oocytes and Granulosa Cells in Mouse Follicle Development","authors":"Wenju Liu, Xinyu Cui, Yuhan Zhang, Liang Gu, Yuanlin He, Jing Li, Shaorong Gao, Rui Gao, Cizhong Jiang","doi":"10.1093/gpbjnl/qzad001","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzad001","url":null,"abstract":"The development and maturation of follicles is a sophisticated and multistage process. The dynamic gene expression of oocytes and the surrounding somatic cells and the dialogs between these cells are critical to this process. We accurately classified the follicle development into nine stages and profiled the gene expression of mouse oocytes, the companion granulosa cells, and cumulus cells. The clustering of the transcriptomes showed the trajectory to the two distinct development courses of oocytes and the surrounding somatic cells. Gene expression changes precipitously increased at Type 4 stage and drastically dropped afterwards within both oocytes and granulosa cells. Moreover, the number of differentially expressed genes between oocytes and granulosa cells dramatically increased at Type 4 stage, most of which persistently passed on to the later stages. Strikingly, cell communications within and between oocytes and granulosa cells became active from Type 4 onwards. Cell dialogs connected oocytes and granulosa cells in both unidirectional and bidirectional manners. TGFB2/3, TGFBR2/3, INHBA/B, and ACVR1/1B/2B of TGF-β signaling pathway functioned in the follicle development. NOTCH signaling pathway regulated the development of granulosa cells. Additionally, many maternally DNA methylation- or H3K27me3-imprinted genes remained active in granulosa cells but silent in oocytes during oogenesis. Collectively, Type 4 is the key turning point when significant transcription changes diverge the fate of oocytes and granulosa cells, and the cell dialogs become active to assure follicle development. 
These findings shed new insights into transcriptomic dynamics and cell dialogs facilitating the development and maturation of oocytes and follicles.","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DVsc: An Automated Framework for Efficiently Detecting Viral Infection from Single-Cell Transcriptomics Data","authors":"Fei Leng, Song Mei, Xiaolin Zhou, Xuanshi Liu, Yefeng Yuan, Wenjian Xu, Chongyi Hao, Ruolan Guo, Chanjuan Hao, Wei Li, Peng Zhang","doi":"10.1093/gpbjnl/qzad007","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzad007","url":null,"abstract":"Single-cell RNA sequencing (scRNA-seq) has emerged as a valuable tool for studying cellular heterogeneity in various fields, particularly in virological research. By studying the viral and cellular transcriptomes, the dynamics of viral infection can be investigated at a single-cell resolution. However, limited studies have been conducted to investigate whether RNA transcripts from clinical samples contain substantial amounts of viral RNAs, and a specific computational framework for efficiently detecting viral reads based on scRNA-seq data has not been developed. Hence, we introduce DVsc, an open-source framework for precise quantitative analysis of viral infection from single-cell transcriptomics data. When applied to approximately 200 diverse clinical samples that were infected by more than 10 different viruses, DVsc demonstrated high accuracy in systematically detecting viral infection across a wide array of cell types. This innovative bioinformatics pipeline could be crucial for addressing the potential effects of surreptitiously invading viruses on certain illnesses, as well as for designing novel medicines to target viruses in specific host cell subsets and evaluating the efficacy of treatment. DVsc supports the FASTQ format as an input and is compatible with multiple single-cell sequencing platforms. Moreover, it could also be applied to sequences from bulk RNA-sequencing data. 
DVsc is available at http://62.234.32.33:5000/DVsc.","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139921998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GametesOmics: A Comprehensive Multi-omics Database for Exploring the Gametogenesis in Humans and Mice","authors":"Jianting An, Jing Wang, Siming Kong, Shi Song, Wei Chen, Peng Yuan, Qilong He, Yidong Chen, Ye Li, Yi Yang, Wei Wang, Rong Li, Liying Yan, Zhiqiang Yan, Jie Qiao","doi":"10.1093/gpbjnl/qzad004","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzad004","url":null,"abstract":"Gametogenesis plays an important role in the reproduction and evolution of species. The transcriptomic and epigenetic alterations in this process can influence the reproductive capacity, fertilization, and embryonic development. The rapidly increasing single-cell studies have provided valuable multi-omics resources. However, data from different layers and sequencing platforms have not been unified and integrated, which greatly limits their use for exploring the molecular mechanisms that underlie oogenesis and spermatogenesis. Here, we developed GametesOmics, a comprehensive database that integrated the data of gene expression, DNA methylation, and chromatin accessibility during oogenesis and spermatogenesis in humans and mice. GametesOmics provides a user-friendly website and various tools, including Search and Advanced Search for querying the expression and epigenetic modification of each gene; Tools with Differentially expressed genes (DEGs) analysis for identifying DEGs, Correlation analysis for demonstrating the genetic and epigenetic changes, Visualization for displaying single-cell cluster and screening marker genes as well as master transcription factors (TFs), and MethylView for studying the genomic distribution of epigenetic modifications. GametesOmics also provides Genome Browser and Orthologs for tracking and comparing gene expression, DNA methylations, as well as chromatin accessibilities between humans and mice. 
GametesOmics offers a comprehensive resource for biologists and clinicians to decipher the cell fate transition in germ cell development, and can be accessed at http://gametesomics.cn/.","PeriodicalId":12528,"journal":{"name":"Genomics, Proteomics & Bioinformatics","volume":null,"pages":null},"PeriodicalIF":9.5,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139922081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}