A Survey on Methods for Predicting Polyadenylation Sites from DNA Sequences, Bulk RNA-seq, and Single-cell RNA-seq

IF 11.5 | CAS Zone 2, Biology | JCR Q1, GENETICS & HEREDITY

Genomics, Proteomics & Bioinformatics, 21(1): 67-83. Pub Date: 2023-02-01. DOI: 10.1016/j.gpb.2022.09.005

Wenbin Ye, Qiwei Lian, Congting Ye, Xiaohui Wu

Citations: 6

Abstract

Alternative polyadenylation (APA) plays important roles in modulating mRNA stability, translation, and subcellular localization, and contributes extensively to eukaryotic transcriptome complexity and proteome diversity. Genome-wide identification of poly(A) sites (pAs) is a critical step toward understanding the mechanisms of APA-mediated gene regulation. Many computational tools have been developed to predict pAs from diverse genomic data. Here we provide a comprehensive overview of computational approaches for predicting pAs from DNA sequences, bulk RNA sequencing (RNA-seq) data, and single-cell RNA sequencing (scRNA-seq) data. In particular, we examine several representative tools using bulk RNA-seq and scRNA-seq data from peripheral blood mononuclear cells and offer practical suggestions on how to assess the reliability of pAs predicted by different tools. We also propose practical guidelines for choosing methods suited to different scenarios. Moreover, we discuss in depth the challenges in improving the performance of pA prediction and in benchmarking different methods. Finally, we highlight outstanding challenges and opportunities arising from new machine learning and integrative multi-omics techniques, and offer our perspective on how computational methodologies might evolve for non-3′ untranslated region (UTR), tissue-specific, cross-species, and single-cell pA prediction.





Source journal: Genomics, Proteomics & Bioinformatics (Biochemistry, Genetics and Molecular Biology: Biochemistry)

CiteScore: 14.30
Self-citation rate: 4.20%
Articles published: 844
Review time: 61 days

Journal description: Genomics, Proteomics and Bioinformatics (GPB) is the official journal of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and the Genetics Society of China. It aims to disseminate new developments in the field of omics and bioinformatics, publish high-quality discoveries quickly, and promote open access and online publication. GPB welcomes submissions in all areas of life science, biology, and biomedicine, with a focus on large-scale data acquisition, analysis, and curation. Manuscripts covering omics and related bioinformatics topics are particularly encouraged. GPB is indexed/abstracted by PubMed/MEDLINE, PubMed Central, Scopus, BIOSIS Previews, Chemical Abstracts, and CSCD, among others.