Shenghan Gao, Yimeng Zhang, Stephen J Bush, Bo Wang, Xiaofei Yang, Kai Ye
{"title":"Centromere Landscapes Resolved from Hundreds of Human Genomes.","authors":"Shenghan Gao, Yimeng Zhang, Stephen J Bush, Bo Wang, Xiaofei Yang, Kai Ye","doi":"10.1093/gpbjnl/qzae071","DOIUrl":"10.1093/gpbjnl/qzae071","url":null,"abstract":"<p><p>High-fidelity (HiFi) sequencing has facilitated the assembly and analysis of the most repetitive region of the genome, the centromere. Nevertheless, our current understanding of human centromeres is based on a relatively small number of telomere-to-telomere assemblies, which have not yet captured their full diversity. In this study, we investigated the genomic diversity of human centromere higher order repeats (HORs) via both HiFi reads and haplotype-resolved assemblies from hundreds of samples drawn from ongoing pangenome-sequencing projects and reprocessed them via a novel HOR annotation pipeline, HiCAT-human. We used this wealth of data to provide a global survey of the centromeric HOR landscape; in particular, we found that 23 HORs presented significant copy number variability between populations. We detected three centromere genotypes with unbalanced population frequencies on chromosomes 5, 8, and 17. 
An inter-assembly comparison of HOR loci further revealed that while HOR array structures are diverse, they nevertheless tend to form a number of specific landscapes, each exhibiting different levels of HOR subunit expansion and possibly reflecting a cyclical evolutionary transition from homogeneous to nested structures and back.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652271/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142484010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeOri 10.0: An Updated Database of Experimentally Identified Eukaryotic Replication Origins.","authors":"Yu-Hao Zeng, Zhen-Ning Yin, Hao Luo, Feng Gao","doi":"10.1093/gpbjnl/qzae076","DOIUrl":"10.1093/gpbjnl/qzae076","url":null,"abstract":"<p><p>DNA replication is a complex and crucial biological process in eukaryotes. To facilitate the study of eukaryotic replication events, we present a database of eukaryotic DNA replication origins (DeOri), which collects the genome-wide data on eukaryotic DNA replication origins that are currently available. With the rapid development of high-throughput experimental technology in recent years, the number of datasets in the new release of DeOri 10.0 increased from 10 to 151 and the number of sequences increased from 16,145 to 9,742,396. Besides nucleotide sequences and browser extensible data (BED) files, corresponding annotation files, such as coding sequences (CDSs), mRNAs, and other biological elements within replication origins, are also provided. The experimental techniques used for each dataset, as well as related statistical data, are also presented on the web page. Differences in experimental methods, cell lines, and sequencing technologies have resulted in distinct replication origins, making it challenging to differentiate between cell-specific and non-specific replication origins. Based on multiple replication origin datasets at the species level, we scored and screened replication origins in Homo sapiens, Gallus gallus, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. The screened regions with high scores were considered as species-conservative origins, which are integrated and presented as reference replication origins (rORIs). Additionally, we analyzed the distribution of relevant genomic elements associated with replication origins at the genome level, such as CpG islands (CGIs), transcription start sites (TSSs), and G-quadruplexes (G4s). 
These analysis results can be browsed and downloaded as needed at http://tubic.tju.edu.cn/deori/.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652270/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142484012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mengting Shao, Kaiyang Chen, Shuting Zhang, Min Tian, Yan Shen, Chen Cao, Ning Gu
{"title":"Multiome-wide Association Studies: Novel Approaches for Understanding Diseases.","authors":"Mengting Shao, Kaiyang Chen, Shuting Zhang, Min Tian, Yan Shen, Chen Cao, Ning Gu","doi":"10.1093/gpbjnl/qzae077","DOIUrl":"10.1093/gpbjnl/qzae077","url":null,"abstract":"<p><p>The rapid development of multiome (transcriptome, proteome, cistrome, imaging, and regulome)-wide association study methods has opened new avenues for biologists to understand the susceptibility genes underlying complex diseases. Thorough comparisons of these methods are essential for selecting the most appropriate tool for a given research objective. This review provides a detailed categorization and summary of the statistical models, use cases, and advantages of recent multiome-wide association studies. In addition, to illustrate gene-disease association studies based on transcriptome-wide association study (TWAS), we collected 478 disease entries across 22 categories from 235 manually reviewed publications. Our analysis reveals that mental disorders are the diseases most frequently studied by TWAS, indicating its potential to deepen our understanding of the genetic architecture of complex diseases. 
In summary, this review underscores the importance of multiome-wide association studies in elucidating complex diseases and highlights the significance of selecting the appropriate method for each study.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11630051/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qianpeng Li, Yang Zhang, Sicheng Luo, Zhang Zhang, Ann L Oberg, David E Kozono, Hua Lu, Jann N Sarkaria, Lina Ma, Liguo Wang
{"title":"Identify Non-mutational p53 Functional Deficiency in Human Cancers.","authors":"Qianpeng Li, Yang Zhang, Sicheng Luo, Zhang Zhang, Ann L Oberg, David E Kozono, Hua Lu, Jann N Sarkaria, Lina Ma, Liguo Wang","doi":"10.1093/gpbjnl/qzae064","DOIUrl":"10.1093/gpbjnl/qzae064","url":null,"abstract":"<p><p>An accurate assessment of p53's functional statuses is critical for cancer genomic medicine. However, there is a significant challenge in identifying tumors with non-mutational p53 inactivation, which is not detectable through DNA sequencing. These undetected cases are often misclassified as p53-normal, leading to inaccurate prognosis and downstream association analyses. To address this issue, we built support vector machine (SVM) models to systematically reassess p53's functional statuses in TP53 wild-type (TP53WT) tumors from multiple The Cancer Genome Atlas (TCGA) cohorts. Cross-validation demonstrated the good performance of the SVM models, with a mean area under the receiver operating characteristic curve (AUROC) of 0.9822, precision of 0.9747, and recall of 0.9784. Our study revealed that a significant proportion (87%-99%) of TP53WT tumors actually had compromised p53 function. Additional analyses uncovered that these genetically intact but functionally impaired (termed predictively reduced function of p53, or TP53WT-pRF) tumors exhibited genomic and pathophysiologic features akin to TP53-mutant tumors: heightened genomic instability and elevated levels of hypoxia. 
Clinically, patients with TP53WT-pRF tumors experienced significantly shortened overall survival or progression-free survival compared to those with predictively normal function of p53 (TP53WT-pN) tumors, and these patients also displayed increased sensitivity to platinum-based chemotherapy and radiation therapy.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11702981/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolution of Plant Genome Size and Composition.","authors":"Bing He, Wanfei Liu, Jianyang Li, Siwei Xiong, Jing Jia, Qiang Lin, Hailin Liu, Peng Cui","doi":"10.1093/gpbjnl/qzae078","DOIUrl":"10.1093/gpbjnl/qzae078","url":null,"abstract":"<p><p>The rapid development of sequencing technology has led to an explosion of plant genome data, opening up more opportunities for research in the field of comparative evolutionary analysis of plant genomes. In this review, we focus on changes in plant genome size and composition, examining the effects of polyploidy, whole-genome duplication, and alterations in transposable elements on plant genome architecture and evolution. In addition, to address gaps in the available information, we also collected and analyzed data from 234 representative plant genomes as a supplement. We aim to provide a comprehensive, up-to-date summary of information on plant genome architecture and evolution in this review.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11630846/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142585409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qingqing Shi, Min Dai, Yingke Ma, Jun Liu, Xiuying Liu, Xiu-Jie Wang
{"title":"DRED: A Comprehensive Database of Genes Related to Repeat Expansion Diseases.","authors":"Qingqing Shi, Min Dai, Yingke Ma, Jun Liu, Xiuying Liu, Xiu-Jie Wang","doi":"10.1093/gpbjnl/qzae068","DOIUrl":"10.1093/gpbjnl/qzae068","url":null,"abstract":"<p><p>Expansion of tandem repeats in genes often causes severe diseases, such as fragile X syndrome, Huntington's disease, and spinocerebellar ataxia. However, information on genes associated with repeat expansion diseases is scattered throughout the literature, and systematic prediction of potential genes that may cause diseases via repeat expansion is also lacking. Here, we develop DRED, a Database of genes related to Repeat Expansion Diseases, as a manually curated database that covers all 61 known genes related to repeat expansion diseases reported in PubMed and OMIM, along with the detailed repeat information for each gene. DRED also includes 516 genes with the potential to cause diseases via repeat expansion, which were predicted based on their repeat composition, genetic variations, genomic features, and disease associations. Various types of information on repeat expansion diseases and their corresponding genes/repeats are presented in DRED, together with links to external resources, such as NCBI and ClinVar. DRED provides user-friendly interfaces with comprehensive functions, and can serve as a central data resource for basic research and repeat expansion disease-related medical diagnosis. 
DRED is freely accessible at http://omicslab.genetics.ac.cn/dred, and will be frequently updated to include newly reported genes related to repeat expansion diseases.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11696699/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Variant Calling in Whole-exome Sequencing Data Using Population-matched Reference Genomes.","authors":"Shuming Guo, Zhuo Huang, Yanming Zhang, Yukun He, Xiangju Chen, Wenjuan Wang, Lansheng Li, Yu Kang, Zhancheng Gao, Jun Yu, Zhenglin Du, Yanan Chu","doi":"10.1093/gpbjnl/qzae070","DOIUrl":"10.1093/gpbjnl/qzae070","url":null,"abstract":"<p><p>Whole-exome sequencing (WES) data are frequently used for cancer diagnosis and genome-wide association studies (GWAS), based on high-coverage read mapping, informative variant calling, and high-quality reference genomes. The central position of the currently used genome assembly, GRCh38, is now challenged by two newly published telomere-to-telomere (T2T) genomes, T2T-CHM13 and T2T-YAO, and a comparative study is urgently needed to test population specificity using the three reference genomes based on real-case WES data. Here, we report our analysis along this line for 19 tumor samples collected from Chinese patients. The primary comparison of the exon regions among the three references reveals that the sequences in up to ∼1% of target regions in T2T-YAO diverge widely from GRCh38 and may lead to off-target sequence capture. However, T2T-YAO still outperforms GRCh38 by obtaining 7.41% more mapped reads. Due to more reliable read mapping and a closer phylogenetic relationship with the samples than GRCh38, T2T-YAO reduces by half the variant calls of clinical significance, which are mostly benign, while maintaining sensitivity in identifying pathogenic variants. T2T-YAO also outperforms T2T-CHM13 in reducing calls of Chinese-specific variants. 
Our findings highlight the critical need for employing population-specific reference genomes in genomic analysis to ensure accurate variant analysis and the significant benefits of tailoring these approaches to the unique genetic background of each ethnic group.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11687947/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142396282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ping Xu, Zhiheng Yuan, Xiaohua Lu, Peng Zhou, Ding Qiu, Zhenghao Qiao, Zhongcheng Zhou, Li Guan, Yongkang Jia, Xuan He, Ling Sun, Youzhong Wan, Ming Wang, Yang Yu
{"title":"RAG-seq: NSR-primed and Transposase Tagmentation-mediated Strand-specific Total RNA Sequencing in Single Cells.","authors":"Ping Xu, Zhiheng Yuan, Xiaohua Lu, Peng Zhou, Ding Qiu, Zhenghao Qiao, Zhongcheng Zhou, Li Guan, Yongkang Jia, Xuan He, Ling Sun, Youzhong Wan, Ming Wang, Yang Yu","doi":"10.1093/gpbjnl/qzae072","DOIUrl":"10.1093/gpbjnl/qzae072","url":null,"abstract":"<p><p>Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of cellular diversity with unprecedented resolution. However, many current methods are limited in capturing full-length transcripts and discerning strand orientation. Here, we present RAG-seq, an innovative strand-specific total RNA sequencing technique that combines not-so-random (NSR) primers with Tn5 transposase-mediated tagmentation. RAG-seq overcomes previous limitations by delivering comprehensive transcript coverage and maintaining strand orientation, which are essential for accurate quantification of overlapping genes and detection of antisense transcripts. Through optimized reverse transcription with oligo-dT primers, rRNA depletion via Depletion of Abundant Sequences by Hybridization (DASH), and linear amplification, RAG-seq enhances sensitivity and reproducibility, especially for low-input samples and single cells. Application to mouse oocytes and early embryos highlights RAG-seq's superior performance in identifying stage-specific antisense transcripts, shedding light on their regulatory roles during early development. 
This advancement represents a significant leap in transcriptome analysis within complex biological contexts.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658833/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142484016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yanfang Lu, Liu Yang, Qi Feng, Yong Liu, Xiaohui Sun, Dongwei Liu, Long Qiao, Zhangsuo Liu
{"title":"RNA 5-Methylcytosine Modification: Regulatory Molecules, Biological Functions, and Human Diseases.","authors":"Yanfang Lu, Liu Yang, Qi Feng, Yong Liu, Xiaohui Sun, Dongwei Liu, Long Qiao, Zhangsuo Liu","doi":"10.1093/gpbjnl/qzae063","DOIUrl":"10.1093/gpbjnl/qzae063","url":null,"abstract":"<p><p>RNA methylation modifications influence gene expression, and disruptions of these processes are often associated with various human diseases. The common RNA methylation modification 5-methylcytosine (m5C), which is dynamically regulated by writers, erasers, and readers, widely occurs in transfer RNAs (tRNAs), messenger RNAs (mRNAs), ribosomal RNAs (rRNAs), enhancer RNAs (eRNAs), and other non-coding RNAs (ncRNAs). RNA m5C modification regulates metabolism, stability, nuclear export, and translation of RNA molecules. An increasing number of studies have revealed the critical roles of the m5C RNA modification and its regulators in the development, diagnosis, prognosis, and treatment of various human diseases. In this review, we summarized the recent studies on RNA m5C modification and discussed the advances in its detection methodologies, distribution, and regulators. Furthermore, we addressed the significance of RNAs modified with m5C marks in essential biological processes as well as in the development of various human disorders, from neurological diseases to cancers. 
This review provides a new perspective on the diagnosis, treatment, and monitoring of human diseases by elucidating the complex regulatory network of the epigenetic m5C modification.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11634542/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiawei Luo, Kejuan Zhao, Junjie Chen, Caihua Yang, Fuchuan Qu, Yumeng Liu, Xiaopeng Jin, Ke Yan, Yang Zhang, Bin Liu
{"title":"iMFP-LG: Identification of Novel Multi-Functional Peptides by Using Protein Language Models and Graph-Based Deep Learning.","authors":"Jiawei Luo, Kejuan Zhao, Junjie Chen, Caihua Yang, Fuchuan Qu, Yumeng Liu, Xiaopeng Jin, Ke Yan, Yang Zhang, Bin Liu","doi":"10.1093/gpbjnl/qzae084","DOIUrl":"https://doi.org/10.1093/gpbjnl/qzae084","url":null,"abstract":"<p><p>Functional peptides are short amino acid fragments that have a wide range of beneficial functions for living organisms. The majority of previous research focused on mono-functional peptides, but a growing number of multi-functional peptides have been discovered. Although there have been enormous experimental efforts to assay multi-functional peptides, only a small fraction of millions of known peptides have been explored. Effective and precise techniques for identifying multi-functional peptides can facilitate their discovery and mechanistic understanding. In this article, we present iMFP-LG, a method for identifying multi-functional peptides based on protein language models (pLMs) and graph attention networks (GATs). Comparison results showed that iMFP-LG outperforms state-of-the-art methods on both multi-functional bioactive peptide and multi-functional therapeutic peptide datasets. The interpretability of iMFP-LG was also illustrated by visualizing attention patterns in pLMs and GATs. Given the outstanding performance of iMFP-LG on the identification of multi-functional peptides, we employed iMFP-LG to screen novel candidate peptides with both anticancer peptide (ACP) and antimicrobial peptide (AMP) functions from millions of known peptides in UniRef90. As a result, 8 candidate peptides were identified, and 1 candidate that exhibits both antibacterial and anticancer effects was confirmed through molecular structure alignment and biological experiments. 
We anticipate that iMFP-LG can assist in the discovery of multi-functional peptides and contribute to the advancement of peptide drug design.</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142712263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}