A generalized protein identification method for novel and diverse sequencing technologies.

IF 4 Q1 GENETICS & HEREDITY
NAR Genomics and Bioinformatics Pub Date : 2024-09-18 eCollection Date: 2024-09-01 DOI:10.1093/nargab/lqae126
Bikash Kumar Bhandari, Nick Goldman
{"title":"A generalized protein identification method for novel and diverse sequencing technologies.","authors":"Bikash Kumar Bhandari, Nick Goldman","doi":"10.1093/nargab/lqae126","DOIUrl":null,"url":null,"abstract":"<p><p>Protein sequencing is a rapidly evolving field with much progress towards the realization of a new generation of protein sequencers. The early devices, however, may not be able to reliably discriminate all 20 amino acids, resulting in a partial, noisy and possibly error-prone signature of a protein. Rather than achieving <i>de novo</i> sequencing, these devices may aim to identify target proteins by comparing such signatures to databases of known proteins. However, there are no broadly applicable methods for this identification problem. Here, we devise a hidden Markov model method to study the generalized problem of protein identification from noisy signature data. Based on a hypothetical sequencing device that can simulate several novel technologies, we show that on the human protein database (<i>N</i> = 20 181) our method has a good performance under many different operating conditions such as various levels of signal resolvability, different numbers of discriminated amino acids, sequence fragments, and insertion and deletion error rates. Our results demonstrate the possibility of protein identification with high accuracy on many early experimental devices. We anticipate our method to be applicable for a wide range of protein sequencing devices in the future.</p>","PeriodicalId":33994,"journal":{"name":"NAR Genomics and Bioinformatics","volume":"6 3","pages":"lqae126"},"PeriodicalIF":4.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11409062/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"NAR Genomics and Bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/nargab/lqae126","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/9/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
引用次数: 0

Abstract

Protein sequencing is a rapidly evolving field with much progress towards the realization of a new generation of protein sequencers. The early devices, however, may not be able to reliably discriminate all 20 amino acids, resulting in a partial, noisy and possibly error-prone signature of a protein. Rather than achieving de novo sequencing, these devices may aim to identify target proteins by comparing such signatures to databases of known proteins. However, there are no broadly applicable methods for this identification problem. Here, we devise a hidden Markov model method to study the generalized problem of protein identification from noisy signature data. Based on a hypothetical sequencing device that can simulate several novel technologies, we show that on the human protein database (N = 20 181) our method has a good performance under many different operating conditions such as various levels of signal resolvability, different numbers of discriminated amino acids, sequence fragments, and insertion and deletion error rates. Our results demonstrate the possibility of protein identification with high accuracy on many early experimental devices. We anticipate our method to be applicable for a wide range of protein sequencing devices in the future.

适用于新型和多样化测序技术的通用蛋白质识别方法。
蛋白质测序是一个快速发展的领域,在实现新一代蛋白质测序仪方面取得了很大进展。然而,早期的设备可能无法可靠地辨别全部 20 个氨基酸,从而产生部分的、嘈杂的、可能容易出错的蛋白质特征。这些设备的目标可能不是实现从头测序,而是通过将这些特征与已知蛋白质数据库进行比较来识别目标蛋白质。然而,目前还没有广泛适用于这一识别问题的方法。在这里,我们设计了一种隐马尔可夫模型方法来研究从嘈杂的特征数据中识别蛋白质的一般问题。基于一个可以模拟多种新技术的假定测序设备,我们证明了在人类蛋白质数据库(N = 20 181)中,我们的方法在多种不同的操作条件下具有良好的性能,如不同水平的信号解析度、不同数量的被鉴别氨基酸、序列片段以及插入和删除错误率。我们的研究结果表明,在许多早期实验设备上都可以高精度地识别蛋白质。我们预计,我们的方法未来将适用于各种蛋白质测序设备。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
8.00
自引率
2.20%
发文量
95
审稿时长
15 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信