Biosurfer for systematic tracking of regulatory mechanisms leading to protein isoform diversity

Impact factor 6.2; Zone 2 (Biology); JCR Q1 (Biochemistry & Molecular Biology)
Mayank Murali, Jamie Saquing, Senbao Lu, Ziyang Gao, Emily F. Watts, Ben Jordan, Zachary Peters Wakefield, Ana Fiszbein, David R. Cooper, Peter J. Castaldi, Dmitry Korkin, Gloria M. Sheynkman
Journal: Genome Research. Publication date: 2025-03-14. DOI: 10.1101/gr.279317.124 (https://doi.org/10.1101/gr.279317.124)

Abstract

Long-read RNA-seq has shed light on transcriptomic complexity, but questions remain about the functionality of downstream protein products. We introduce Biosurfer, a computational approach for comparing protein isoforms while systematically tracking the transcriptional, splicing, and translational variations that underlie differences in the sequences of the protein products. Using Biosurfer, we analyzed the differences in 35,082 pairs of GENCODE-annotated protein isoforms, finding that a majority (70%) of variable N-termini are due to alternative transcription start sites, while only 9% arise from 5′ UTR alternative splicing (AS). Biosurfer's detailed tracking of nucleotide-to-residue relationships helps reveal an uncommonly tracked source of single amino acid residue changes: codons split at junctions. For 17% of internal sequence changes, such split-codon patterns lead to single-residue differences, termed “ragged codons.” Of variable C-termini, 72% involve reading frameshifts induced by splicing or intron retention. We systematically characterize an unusual pattern of reading frame changes, in which a first frameshift is closely followed by a distinct second frameshift that restores the original frame, which we term a “snapback” frameshift. We analyze the long-read RNA-seq-predicted proteome of a human cell line and find trends similar to those in our GENCODE analysis, with the exception of a higher proportion of transcripts predicted to undergo nonsense-mediated decay. Biosurfer's comprehensive characterization of long-read RNA-seq data sets should accelerate insights into the functional roles of protein isoforms, providing mechanistic explanations of the origins of AS-driven proteomic diversity. Biosurfer is available as a Python package.
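The “snapback” pattern described in the abstract can be illustrated with a minimal sketch. This is not Biosurfer's API or the authors' implementation: the `Event` record, the function names, and the `max_gap_nt` threshold of 60 nt are all assumptions made for illustration, since the abstract says only that the second frameshift “closely” follows the first. The sketch tracks the cumulative reading-frame offset (modulo 3) across the sequence differences between two isoforms and reports pairs of events where the frame leaves 0 and is later restored to 0.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One sequence difference between two isoforms (hypothetical record)."""
    cds_position: int  # position in the reference coding sequence, in nt
    delta_nt: int      # nucleotides gained (+) or lost (-) vs. the reference


def frame_offsets(events: List[Event]) -> List[int]:
    """Return the cumulative reading-frame offset (0, 1, or 2) after each event."""
    offsets = []
    total = 0
    for ev in sorted(events, key=lambda e: e.cds_position):
        total += ev.delta_nt
        offsets.append(total % 3)  # Python's % is non-negative for a positive modulus
    return offsets


def find_snapbacks(events: List[Event], max_gap_nt: int = 60) -> List[Tuple[Event, Event]]:
    """Find (first shift, restoring shift) pairs that lie within max_gap_nt.

    max_gap_nt is an illustrative assumption, not a value from the paper.
    """
    ordered = sorted(events, key=lambda e: e.cds_position)
    pairs = []
    previous = 0
    shift_start = None
    for ev, offset in zip(ordered, frame_offsets(ordered)):
        if previous == 0 and offset != 0:
            shift_start = ev  # the frame has just left the original frame
        elif previous != 0 and offset == 0 and shift_start is not None:
            if ev.cds_position - shift_start.cds_position <= max_gap_nt:
                pairs.append((shift_start, ev))  # frame restored close by
            shift_start = None
        previous = offset
    return pairs


# Made-up example: a 1-nt loss at position 120 is compensated by a 1-nt gain
# at position 150; the 3-nt gain at 400 is in-frame; the 2-nt gain at 600
# shifts the frame and is never restored.
events = [Event(120, -1), Event(150, 1), Event(400, 3), Event(600, 2)]
print([(a.cds_position, b.cds_position) for a, b in find_snapbacks(events)])
# prints [(120, 150)]
```

In this toy input only the first two events form a snapback; the final event is an ordinary frameshift of the kind the abstract associates with variable C-termini.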
Source journal: Genome Research (Biology: Biochemistry & Molecular Biology)
CiteScore: 12.40
Self-citation rate: 1.40%
Articles published: 140
Review time: 6 months
Journal description: Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine. Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies. New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.