SHARK: web server for alignment-free homology assessment for intrinsically disordered and unalignable protein regions.

IF 16.6 | Zone 2, Biology | JCR Q1, Biochemistry & Molecular Biology

Nucleic Acids Research Pub Date : 2025-05-21 DOI:10.1093/nar/gkaf408

Chi Fung Willis Chow, Maxim Scheremetjew, HongKee Moon, Soumyadeep Ghosh, Anna Hadarovich, Lena Hersemann, Agnes Toth-Petroczy



Abstract

Whereas alignment has been fundamental to sequence-based assessments of protein homology, it is ineffective for intrinsically disordered regions (IDRs) due to their lowered sequence conservation and unique sequence properties. Here, we present a web server implementation of SHARK (bio-shark.org), an alignment-free algorithm for homology classification that compares the overall amino acid composition and short regions (k-mers) shared between sequences (SHARK-scores). The output of such k-mer-based comparisons is used by SHARK-dive, a machine learning classifier to detect homology between unalignable, disordered sequences. SHARK-web provides sequence-versus-database assessment of protein sequence homology akin to conventional tools such as BLAST and HMMER. Additionally, we provide precomputed sets of IDR sequences from 16 model organism proteomes facilitating searches against species-specific IDR-omes. SHARK-dive offers superior overall homology detection performance to BLAST and HMMER, driven by a large increase in sensitivity to low sequence identity homologs, and can be used to facilitate the study of sequence-function relationships in disordered, difficult-to-align regions.
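The abstract describes comparing sequences by overall amino acid composition and by shared k-mers rather than by alignment. The sketch below illustrates that general idea only. It is not the SHARK-score or the SHARK-dive classifier: the Jaccard index, the cosine similarity, the default k = 3, and all function names are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard index of the two k-mer sets (illustrative stand-in, not SHARK-scores)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    # Two sequences shorter than k share no k-mers; report 0.0 rather than divide by zero.
    return len(ka & kb) / len(union) if union else 0.0


def composition_cosine(a: str, b: str) -> float:
    """Cosine similarity of the amino acid count vectors of two sequences."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[x] * cb[x] for x in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```

Neither function needs an alignment: both depend only on which residues and short substrings occur, which is why measures of this kind remain usable when positional correspondence between sequences is lost.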


Source journal: Nucleic Acids Research (Biology: Biochemistry & Molecular Biology)

CiteScore: 27.10

Self-citation rate: 4.70%

Articles published: 1057

Review time: 2 months

Journal description: Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.