Deep Learning-based MSMS Spectra Reduction in Support of Running Multiple Protein Search Engines on Cloud.

Majdi Maabreh, Basheer Qolomany, Izzat Alsmadi, Ajay Gupta
Proceedings. IEEE International Conference on Bioinformatics and Biomedicine, vol. 2017, pp. 1909-1914 (published November 2017).
DOI: 10.1109/bibm.2017.8217951
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8382039/pdf/nihms-1728667.pdf

Abstract

Protein search engines differ in their matching algorithms, their results overlap little, and their coverage varies. Together, these factors encourage the proteomics community to use ensembles of different search engines. Advances in cloud computing and the availability of distributed processing clusters can also support this task. In this setting, however, data transfer and the combining of results could be the major bottleneck. The flood of billions of observed mass spectra, amounting to hundreds of gigabytes or potentially terabytes of data, could easily congest the network, raise the risk of failures and poor performance, add computational cost, and waste available resources. Therefore, in this study, we propose a deep learning model to reduce traffic over the cloud network and thus lower the cost of cloud computing. The model uses the top 50 intensities of each spectrum and their m/z values. It removes any spectrum that it predicts will not pass the majority vote of the participating search engines. We tested it with three search engines (pFind, Comet, and X!Tandem) and four different datasets. The results are promising and encourage investment in deep learning to solve this type of big data problem.
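The filtering idea can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each spectrum is a list of (m/z, intensity) peaks and that each engine's result for a spectrum reduces to a single boolean "identified" flag. It therefore ignores whether the engines agree on the same peptide. The feature order (intensity, then m/z), the zero-padding, the 0.5 threshold, and the function names are hypothetical choices. The `predict` argument stands in for the trained deep learning model, which is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

Peak = Tuple[float, float]  # assumed peak layout: (m/z, intensity)

TOP_K = 50  # "top 50 intensities" from the abstract

def top_k_features(peaks: Sequence[Peak], k: int = TOP_K) -> List[float]:
    """Hypothetical feature builder: the k most intense peaks, each contributing
    its intensity followed by its m/z, zero-padded to a fixed length of 2k."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:k]
    features: List[float] = []
    for mz, intensity in strongest:
        features.extend((intensity, mz))
    features.extend([0.0] * (2 * k - len(features)))  # pad short spectra
    return features

def majority_label(engine_hits: Sequence[bool]) -> int:
    """Training label: 1 if more than half of the engines identified the spectrum."""
    return int(sum(engine_hits) > len(engine_hits) / 2)

def reduce_spectra(spectra: Sequence[Sequence[Peak]],
                   predict: Callable[[List[float]], float],
                   threshold: float = 0.5) -> List[int]:
    """Indices of spectra worth transferring; `predict` is a stand-in for the
    trained model and returns a score for one feature vector."""
    return [i for i, s in enumerate(spectra)
            if predict(top_k_features(s)) >= threshold]
```

With three engines, `majority_label([True, False, True])` is 1 and `majority_label([True, False, False])` is 0, so a spectrum needs at least two of the three engines to count as a positive example. Only the indices returned by `reduce_spectra` would then be shipped to the cloud search engines.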

