Deep Learning-based MSMS Spectra Reduction in Support of Running Multiple Protein Search Engines on Cloud.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine. Pub Date: 2017-11-01. Epub Date: 2017-12-18. DOI: 10.1109/bibm.2017.8217951

Majdi Maabreh, Basheer Qolomany, Izzat Alsmadi, Ajay Gupta

Volume 2017, pages 1909-1914. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8382039/pdf/nihms-1728667.pdf


Abstract

Protein search engines differ in their matching algorithms, show low overlap among their results, and vary in coverage. These differences encourage the proteomics community to use ensembles of different search engines. Advances in cloud computing and the availability of distributed processing clusters can support this task. However, transferring the data and combining the results can become the major bottleneck. A flood of billions of observed mass spectra, amounting to hundreds of gigabytes or potentially terabytes of data, can easily cause congestion, increase the risk of failure, degrade performance, raise computation costs, and waste available resources. In this study, we therefore propose a deep learning model that reduces traffic over the cloud network and thus the cost of cloud computing. The model, which relies on the top 50 intensities of each spectrum and their m/z values, removes any spectrum that is predicted not to pass the majority vote of the participating search engines. Our results, obtained with three search engines (pFind, Comet, and X!Tandem) and four different datasets, are promising and encourage investment in deep learning to solve this type of big data problem.
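
The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the network architecture, the feature ordering, or how short spectra are handled, so the descending-intensity ordering, the zero padding, and the names `top_k_features`, `majority_vote`, `filter_spectra`, and `predict_keep` are all assumptions made for illustration. Only the use of the top 50 peaks and the majority vote across search engines comes from the text.

```python
from typing import Callable, List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)
TOP_K = 50  # number of peaks kept per spectrum, as stated in the abstract


def top_k_features(peaks: Sequence[Peak], k: int = TOP_K) -> List[float]:
    """Build a fixed-length vector [mz_1, int_1, ..., mz_k, int_k].

    The k most intense peaks are kept in descending intensity order and
    shorter spectra are zero-padded. Ordering and padding are assumptions,
    not details taken from the paper.
    """
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:k]
    features: List[float] = []
    for mz, intensity in strongest:
        features.extend([mz, intensity])
    features.extend([0.0] * (2 * k - len(features)))
    return features


def majority_vote(engine_hits: Sequence[bool]) -> bool:
    """Return True when more than half of the engines identified the spectrum."""
    return sum(engine_hits) > len(engine_hits) / 2


def filter_spectra(
    spectra: Sequence[Sequence[Peak]],
    predict_keep: Callable[[List[float]], bool],
) -> List[Sequence[Peak]]:
    """Keep only spectra that the (hypothetical) classifier predicts will pass the vote."""
    return [s for s in spectra if predict_keep(top_k_features(s))]
```

In this sketch, `majority_vote` would produce the training label for each spectrum from the three engines' outcomes, and `predict_keep` stands in for the trained model that decides, before any data is sent to the cloud, whether a spectrum is worth transferring.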


