Pairwise Statistical Significance of Local Sequence Alignment Using Substitution Matrices with Sequence-Pair-Specific Distance

Ankit Agrawal, Xiaoqiu Huang
2008 International Conference on Information Technology. Published 2008-12-17. DOI: 10.1109/ICIT.2008.63 (https://doi.org/10.1109/ICIT.2008.63)
Citations: 13

Abstract

Pairwise sequence alignment forms the basis of numerous other applications in bioinformatics. The quality of an alignment is gauged by its statistical significance rather than by its alignment score alone, so accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, it was shown that pairwise statistical significance performs better in practice than database statistical significance, and that it also provides quicker individual pairwise estimates of statistical significance without requiring a time-consuming database search. Under an evolutionary model, a substitution matrix can be derived from a rate matrix and a fixed distance. Although commonly used substitution matrices such as BLOSUM62 were not originally derived from a rate matrix under an evolutionary model, the corresponding rate matrices can be back-calculated. Many researchers have derived different rate matrices using different methods and data. In this paper, we show that pairwise statistical significance using rate matrices with sequence-pair-specific distance performs significantly better than pairwise statistical significance using a fixed distance. Pairwise statistical significance using substitution matrices with sequence-pair-specific distance also outperforms the database statistical significance reported by BLAST.
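
The step of deriving a substitution matrix from a rate matrix and a distance can be illustrated with a minimal sketch. It is not the authors' implementation: the 4-letter alphabet, the frequencies `FREQS`, the rate matrix `RATE`, the function names, and the two distances are all hypothetical, and the abstract does not say how the sequence-pair-specific distance is estimated. The sketch assumes the standard construction in which the transition probabilities are P(t) = exp(Qt) and the score for a pair of letters is the log-odds of P_ij(t) against the background frequency of letter j.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_probs(rate, t, terms=12, squarings=10):
    """Approximate P(t) = exp(rate * t) with a scaled Taylor series,
    then undo the scaling by repeated squaring."""
    n = len(rate)
    scale = t / (2 ** squarings)
    small = [[rate[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes small^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def substitution_matrix(rate, freqs, t):
    """Log-odds scores in bits: s[i][j] = log2(P_ij(t) / freqs[j])."""
    probs = transition_probs(rate, t)
    n = len(freqs)
    return [[math.log2(probs[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical toy model (not from the paper): the off-diagonal rate
# Q[i][j] equals freqs[j], and each diagonal entry makes its row sum to zero.
FREQS = [0.1, 0.2, 0.3, 0.4]
RATE = [[FREQS[j] if i != j else FREQS[i] - 1.0 for j in range(4)]
        for i in range(4)]

near = substitution_matrix(RATE, FREQS, 0.3)  # assumed closely related pair
far = substitution_matrix(RATE, FREQS, 1.5)   # assumed more divergent pair
```

In this toy model, a smaller distance gives larger match scores and harsher mismatch penalties, which is the intuition behind choosing the distance per sequence pair rather than fixing it once for all pairs.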