{"title":"MitoSort: Robust Demultiplexing of Pooled Single-cell Genomic Data Using Endogenous Mitochondrial Variants.","authors":"Zhongjie Tang, Weixing Zhang, Peiyu Shi, Sijun Li, Xinhui Li, Yueming Li, Yicong Xu, Yaqing Shu, Zheng Hu, Jin Xu","doi":"10.1093/gpbjnl/qzae073","DOIUrl":"10.1093/gpbjnl/qzae073","url":null,"abstract":"<p><p>Multiplexing across donors has emerged as a popular strategy to increase throughput, reduce costs, overcome technical batch effects, and improve doublet detection in single-cell genomic studies. To eliminate additional experimental steps, endogenous nuclear genome variants are used for demultiplexing pooled single-cell RNA sequencing (scRNA-seq) data by several computational tools. However, these tools have limitations when applied to single-cell sequencing methods that do not cover nuclear genomic regions well, such as single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq). Here, we demonstrate that mitochondrial germline variants are an alternative, robust, and computationally efficient endogenous barcode for sample demultiplexing. We propose MitoSort, a tool that uses mitochondrial germline variants to assign cells to their donor origins and identify cross-genotype doublets in single-cell genomic datasets. We evaluate its performance by using in silico pooled mitochondrial scATAC-seq (mtscATAC-seq) libraries and experimentally multiplexed data with cell hashtags. MitoSort achieves high accuracy and efficiency in genotype clustering and doublet detection for mtscATAC-seq data, addressing the limitations of current computational techniques tailored for scRNA-seq data. Moreover, MitoSort exhibits versatility, and can be applied to various single-cell sequencing approaches beyond mtscATAC-seq provided that the mitochondrial variants are reliably detected. 
Furthermore, we demonstrate the application of MitoSort in a case study where B cells from eight donors were pooled and assayed by single-cell multi-omics sequencing. Altogether, our results demonstrate the accuracy and efficiency of MitoSort, which enables reliable sample demultiplexing in various single-cell genomic applications. MitoSort is available at https://github.com/tangzhj/MitoSort.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11671100/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142484015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virus Infection Induces Immune Gene Activation with CTCF-anchored Enhancers and Chromatin Interactions in Pig Genome.","authors":"Jianhua Cao, Ruimin Ren, Xiaolong Li, Xiaoqian Zhang, Yan Sun, Xiaohuan Tian, Ru Liu, Xiangdong Liu, Yijun Ruan, Guoliang Li, Shuhong Zhao","doi":"10.1093/gpbjnl/qzae062","DOIUrl":"10.1093/gpbjnl/qzae062","url":null,"abstract":"<p><p>Chromatin organization is important for gene transcription in pig genome. However, its three-dimensional (3D) structure and dynamics are much less investigated than those in human. Here, we applied the long-read chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) method to map the whole-genome chromatin interactions mediated by CCCTC-binding factor (CTCF) and RNA polymerase II (RNAPII) in porcine macrophage cells before and after polyinosinic-polycytidylic acid [Poly(I:C)] induction. Our results reveal that Poly(I:C) induction impacts the 3D genome organization in the 3D4/21 cells at the fine-scale chromatin loop level rather than at the large-scale domain level. Furthermore, our findings underscore the pivotal role of CTCF-anchored chromatin interactions in reshaping chromatin architecture during immune responses. Knockout of the CTCF-binding locus further confirms that the CTCF-anchored enhancers are associated with the activation of immune genes via long-range interactions. Notably, the ChIA-PET data also support the spatial relationship between single nucleotide polymorphisms (SNPs) and related gene transcription in 3D genome aspect. 
Our findings in this study provide new clues and potential targets to explore key elements related to diseases in pigs and are also likely to shed light on elucidating chromatin organization and dynamics underlying the process of mammalian infectious diseases.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11725346/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142309474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bioinformatic Resources for Exploring Human-virus Protein-protein Interactions Based on Binding Modes.","authors":"Huimin Chen, Jiaxin Liu, Gege Tang, Gefei Hao, Guangfu Yang","doi":"10.1093/gpbjnl/qzae075","DOIUrl":"10.1093/gpbjnl/qzae075","url":null,"abstract":"<p><p>Historically, there have been many outbreaks of viral diseases that have continued to claim millions of lives. Research on human-virus protein-protein interactions (PPIs) is vital to understanding the principles of human-virus relationships, providing an essential foundation for developing virus control strategies to combat diseases. The rapidly accumulating data on human-virus PPIs offer unprecedented opportunities for bioinformatics research around human-virus PPIs. However, available detailed analyses and summaries to help use these resources systematically and efficiently are lacking. Here, we comprehensively review the bioinformatic resources used in human-virus PPI research, and discuss and compare their functions, performance, and limitations. This review aims to provide researchers with a bioinformatic toolbox that will hopefully better facilitate the exploration of human-virus PPIs based on binding modes.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658832/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142484009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Genome Architecture of the Copepod Eurytemora carolleeae - the Highly Invasive Atlantic Clade of the Eurytemora affinis Species Complex.","authors":"Zhenyong Du, Gregory Gelembiuk, Wynne Moss, Andrew Tritt, Carol Eunmi Lee","doi":"10.1093/gpbjnl/qzae066","DOIUrl":"10.1093/gpbjnl/qzae066","url":null,"abstract":"<p><p>Copepods are among the most abundant organisms on the planet and play critical functions in aquatic ecosystems. Among copepods, populations of the Eurytemora affinis species complex are numerically dominant in many coastal habitats and serve as food sources for major fisheries. Intriguingly, certain populations possess the unusual capacity to invade novel salinities on rapid time scales. Despite their ecological importance, high-quality genomic resources have been absent for calanoid copepods, limiting our ability to comprehensively dissect the genome architecture underlying the highly invasive and adaptive capacity of certain populations. Here, we present the first chromosome-level genome of a calanoid copepod, from the Atlantic clade (Eurytemora carolleeae) of the E. affinis species complex. This genome was assembled using high-coverage PacBio long-read and Hi-C sequences of an inbred line, generated through 30 generations of full-sib mating. This genome, consisting of 529.3 Mb (contig N50 = 4.2 Mb, scaffold N50 = 140.6 Mb), was anchored onto four chromosomes. Genome annotation predicted 20,262 protein-coding genes, of which ion transport-related gene families were substantially expanded based on comparative analyses of 12 additional arthropod genomes. Also, we found genome-wide signatures of historical gene body methylation of the ion transport-related genes and the significant clustering of these genes on each chromosome. This genome represents one of the most contiguous copepod genomes to date and is among the highest quality marine invertebrate genomes. 
As such, this genome provides an invaluable resource to help yield fundamental insights into the ability of this copepod to adapt to rapidly changing environments.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11706791/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GP-Plotter: Flexible Spectral Visualization for Proteomics Data with Emphasis on Glycoproteomics Analysis.","authors":"Zheng Fang, Mingming Dong, Hongqiang Qin, Mingliang Ye","doi":"10.1093/gpbjnl/qzae069","DOIUrl":"10.1093/gpbjnl/qzae069","url":null,"abstract":"<p><p>Identification evaluation and result dissemination are essential components in mass spectrometry-based proteomics analysis. The visualization of fragment ions in mass spectrum provides strong evidence for peptide identification and modification localization. Here, we present an easy-to-use tool, named GP-Plotter, for ion annotation of tandem mass spectra and corresponding image output. Identification result files of common searching tools in the community and user-customized files are supported as input of GP-Plotter. Multiple display modes and parameter customization can be achieved in GP-Plotter to present annotated spectra of interest. Different image formats, especially vector graphic formats, are available for image generation which is favorable for data publication. Notably, GP-Plotter is also well-suited for the visualization and evaluation of glycopeptide spectrum assignments with comprehensive annotation of glycan fragment ions. With a user-friendly graphical interface, GP-Plotter is expected to be a universal visualization tool for the community. 
GP-Plotter has been implemented in the latest version of Glyco-Decipher (v1.0.4) and the standalone GP-Plotter software is also freely available at https://github.com/DICP-1809.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11661977/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142396283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Centromere Landscapes Resolved from Hundreds of Human Genomes.","authors":"Shenghan Gao, Yimeng Zhang, Stephen J Bush, Bo Wang, Xiaofei Yang, Kai Ye","doi":"10.1093/gpbjnl/qzae071","DOIUrl":"10.1093/gpbjnl/qzae071","url":null,"abstract":"<p><p>High-fidelity (HiFi) sequencing has facilitated the assembly and analysis of the most repetitive region of the genome, the centromere. Nevertheless, our current understanding of human centromeres is based on a relatively small number of telomere-to-telomere assemblies, which have not yet captured its full diversity. In this study, we investigated the genomic diversity of human centromere higher order repeats (HORs) via both HiFi reads and haplotype-resolved assemblies from hundreds of samples drawn from ongoing pangenome-sequencing projects and reprocessed them via a novel HOR annotation pipeline, HiCAT-human. We used this wealth of data to provide a global survey of the centromeric HOR landscape; in particular, we found that 23 HORs presented significant copy number variability between populations. We detected three centromere genotypes with unbalanced population frequencies on chromosomes 5, 8, and 17. 
An inter-assembly comparison of HOR loci further revealed that while HOR array structures are diverse, they nevertheless tend to form a number of specific landscapes, each exhibiting different levels of HOR subunit expansion and possibly reflecting a cyclical evolutionary transition from homogeneous to nested structures and back.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652271/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142484010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeOri 10.0: An Updated Database of Experimentally Identified Eukaryotic Replication Origins.","authors":"Yu-Hao Zeng, Zhen-Ning Yin, Hao Luo, Feng Gao","doi":"10.1093/gpbjnl/qzae076","DOIUrl":"10.1093/gpbjnl/qzae076","url":null,"abstract":"<p><p>DNA replication is a complex and crucial biological process in eukaryotes. To facilitate the study of eukaryotic replication events, we present a database of eukaryotic DNA replication origins (DeOri), which collects genome-wide data on eukaryotic DNA replication origins currently available. With the rapid development of high-throughput experimental technology in recent years, the number of datasets in the new release of DeOri 10.0 increased from 10 to 151 and the number of sequences increased from 16,145 to 9,742,396. Besides nucleotide sequences and browser extensible data (BED) files, corresponding annotation files, such as coding sequences (CDSs), mRNAs, and other biological elements within replication origins, are also provided. The experimental techniques used for each dataset, as well as related statistical data, are also presented on web page. Differences in experimental methods, cell lines, and sequencing technologies have resulted in distinct replication origins, making it challenging to differentiate between cell-specific and non-specific replication origins. Based on multiple replication origin datasets at the species level, we scored and screened replication origins in Homo sapiens, Gallus gallus, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. The screened regions with high scores were considered as species-conservative origins, which are integrated and presented as reference replication origins (rORIs). Additionally, we analyzed the distribution of relevant genomic elements associated with replication origins at the genome level, such as CpG island (CGI), transcription start site (TSS), and G-quadruplex (G4). 
These analysis results can be browsed and downloaded as needed at http://tubic.tju.edu.cn/deori/.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652270/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142484012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiome-wide Association Studies: Novel Approaches for Understanding Diseases.","authors":"Mengting Shao, Kaiyang Chen, Shuting Zhang, Min Tian, Yan Shen, Chen Cao, Ning Gu","doi":"10.1093/gpbjnl/qzae077","DOIUrl":"10.1093/gpbjnl/qzae077","url":null,"abstract":"<p><p>The rapid development of multiome (transcriptome, proteome, cistrome, imaging, and regulome)-wide association study methods have opened new avenues for biologists to understand the susceptibility genes underlying complex diseases. Thorough comparisons of these methods are essential for selecting the most appropriate tool for a given research objective. This review provides a detailed categorization and summary of the statistical models, use cases, and advantages of recent multiome-wide association studies. In addition, to illustrate gene-disease association studies based on transcriptome-wide association study (TWAS), we collected 478 disease entries across 22 categories from 235 manually reviewed publications. Our analysis reveals that mental disorders are the most frequently studied diseases by TWAS, indicating its potential to deepen our understanding of the genetic architecture of complex diseases. 
In summary, this review underscores the importance of multiome-wide association studies in elucidating complex diseases and highlights the significance of selecting the appropriate method for each study.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11630051/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identify Non-mutational p53 Functional Deficiency in Human Cancers.","authors":"Qianpeng Li, Yang Zhang, Sicheng Luo, Zhang Zhang, Ann L Oberg, David E Kozono, Hua Lu, Jann N Sarkaria, Lina Ma, Liguo Wang","doi":"10.1093/gpbjnl/qzae064","DOIUrl":"10.1093/gpbjnl/qzae064","url":null,"abstract":"<p><p>An accurate assessment of p53's functional statuses is critical for cancer genomic medicine. However, there is a significant challenge in identifying tumors with non-mutational p53 inactivation which is not detectable through DNA sequencing. These undetected cases are often misclassified as p53-normal, leading to inaccurate prognosis and downstream association analyses. To address this issue, we built the support vector machine (SVM) models to systematically reassess p53's functional statuses in TP53 wild-type (TP53WT) tumors from multiple The Cancer Genome Atlas (TCGA) cohorts. Cross-validation demonstrated the good performance of the SVM models with a mean area under the receiver operating characteristic curve (AUROC) of 0.9822, precision of 0.9747, and recall of 0.9784. Our study revealed that a significant proportion (87%-99%) of TP53WT tumors actually had compromised p53 function. Additional analyses uncovered that these genetically intact but functionally impaired (termed as predictively reduced function of p53 or TP53WT-pRF) tumors exhibited genomic and pathophysiologic features akin to TP53-mutant tumors: heightened genomic instability and elevated levels of hypoxia. 
Clinically, patients with TP53WT-pRF tumors experienced significantly shortened overall survival or progression-free survival compared to those with predictively normal function of p53 (TP53WT-pN) tumors, and these patients also displayed increased sensitivity to platinum-based chemotherapy and radiation therapy.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11702981/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolution of Plant Genome Size and Composition.","authors":"Bing He, Wanfei Liu, Jianyang Li, Siwei Xiong, Jing Jia, Qiang Lin, Hailin Liu, Peng Cui","doi":"10.1093/gpbjnl/qzae078","DOIUrl":"10.1093/gpbjnl/qzae078","url":null,"abstract":"<p><p>The rapid development of sequencing technology has led to an explosion of plant genome data, opening up more opportunities for research in the field of comparative evolutionary analysis of plant genomes. In this review, we focus on changes in plant genome size and composition, examining the effects of polyploidy, whole-genome duplication, and alternations in transposable elements on plant genome architecture and evolution, respectively. In addition, to address gaps in the available information, we also collected and analyzed 234 representative plant genome data as a supplement. We aim to provide a comprehensive, up-to-date summary of information on plant genome architecture and evolution in this review.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11630846/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142585409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}