SHARK: web server for alignment-free homology assessment for intrinsically disordered and unalignable protein regions.

IF 16.6 2区 生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY
Chi Fung Willis Chow,Maxim Scheremetjew,HongKee Moon,Soumyadeep Ghosh,Anna Hadarovich,Lena Hersemann,Agnes Toth-Petroczy
{"title":"SHARK: web server for alignment-free homology assessment for intrinsically disordered and unalignable protein regions.","authors":"Chi Fung Willis Chow,Maxim Scheremetjew,HongKee Moon,Soumyadeep Ghosh,Anna Hadarovich,Lena Hersemann,Agnes Toth-Petroczy","doi":"10.1093/nar/gkaf408","DOIUrl":null,"url":null,"abstract":"Whereas alignment has been fundamental to sequence-based assessments of protein homology, it is ineffective for intrinsically disordered regions (IDRs) due to their lowered sequence conservation and unique sequence properties. Here, we present a web server implementation of SHARK (bio-shark.org), an alignment-free algorithm for homology classification that compares the overall amino acid composition and short regions (k-mers) shared between sequences (SHARK-scores). The output of such k-mer-based comparisons is used by SHARK-dive, a machine learning classifier to detect homology between unalignable, disordered sequences. SHARK-web provides sequence-versus-database assessment of protein sequence homology akin to conventional tools such as BLAST and HMMER. Additionally, we provide precomputed sets of IDR sequences from 16 model organism proteomes facilitating searches against species-specific IDR-omes. SHARK-dive offers superior overall homology detection performance to BLAST and HMMER, driven by a large increase in sensitivity to low sequence identity homologs, and can be used to facilitate the study of sequence-function relationships in disordered, difficult-to-align regions.","PeriodicalId":19471,"journal":{"name":"Nucleic Acids Research","volume":"18 1","pages":""},"PeriodicalIF":16.6000,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Nucleic Acids Research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/nar/gkaf408","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Whereas alignment has been fundamental to sequence-based assessments of protein homology, it is ineffective for intrinsically disordered regions (IDRs) due to their lowered sequence conservation and unique sequence properties. Here, we present a web server implementation of SHARK (bio-shark.org), an alignment-free algorithm for homology classification that compares the overall amino acid composition and short regions (k-mers) shared between sequences (SHARK-scores). The output of such k-mer-based comparisons is used by SHARK-dive, a machine learning classifier to detect homology between unalignable, disordered sequences. SHARK-web provides sequence-versus-database assessment of protein sequence homology akin to conventional tools such as BLAST and HMMER. Additionally, we provide precomputed sets of IDR sequences from 16 model organism proteomes facilitating searches against species-specific IDR-omes. SHARK-dive offers superior overall homology detection performance to BLAST and HMMER, driven by a large increase in sensitivity to low sequence identity homologs, and can be used to facilitate the study of sequence-function relationships in disordered, difficult-to-align regions.
SHARK:用于对内在无序和不可对齐的蛋白质区域进行无比对同源性评估的web服务器。
虽然比对是基于序列的蛋白质同源性评估的基础,但由于内在无序区(idr)的序列保守性较低和序列特性独特,因此对其无效。在这里,我们提出了SHARK (bioshark.org)的web服务器实现,这是一种用于同源性分类的无比对算法,可以比较序列之间共享的总体氨基酸组成和短区域(k-mers) (SHARK-scores)。这种基于k-mer比较的输出被SHARK-dive(一种机器学习分类器)用于检测不对齐、无序序列之间的同源性。SHARK-web提供与BLAST和HMMER等传统工具类似的蛋白质序列同源性的序列与数据库评估。此外,我们还提供了来自16种模式生物蛋白质组的IDR序列的预先计算集,以促进对物种特异性IDR组的搜索。SHARK-dive对BLAST和HMMER的整体同源性检测性能优于BLAST和HMMER,对低序列同源性的敏感性大大提高,可用于促进无序、难以比对区域序列-函数关系的研究。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Nucleic Acids Research
Nucleic Acids Research 生物-生化与分子生物学
CiteScore
27.10
自引率
4.70%
发文量
1057
审稿时长
2 months
期刊介绍: Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信