{"title":"Chapter 14. R for Proteomics","authors":"L. Breckels, Sebastian Gibb, V. Petyuk, L. Gatto","doi":"10.1039/9781782626732-00321","DOIUrl":"https://doi.org/10.1039/9781782626732-00321","url":null,"abstract":"In this chapter, we introduce some R and Bioconductor software to process, analyse and interpret mass spectrometry and proteomics data. We describe how to programmatically access data and how to read various data formats into R, review the existing infrastructure to reliably identify peptide-spectrum matches, describe how to analyse and process quantitative data, review MALDI and imaging mass spectrometry using Bioconductor packages, and conclude with an overview of statistical and machine learning software applicable to proteomics data. All the use cases are accompanied by executable example code, and further reproducible examples are provided in the companion RforProteomics package.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126617519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 9. Informatics Solutions for Selected Reaction Monitoring","authors":"B. Schilling, B. MacLean, Jason M. Held, B. Gibson","doi":"10.1039/9781782626732-00178","DOIUrl":"https://doi.org/10.1039/9781782626732-00178","url":null,"abstract":"Informatics solutions for SRM assays pose several specific bioinformatics challenges including assay development, generating acquisition methods, and data processing. Furthermore, SRM is often coupled to experimental designs using stable isotope dilution SRM mass spectrometry workflows (SID-SRM-MS) that utilize one or more stable isotope versions of the analyte as internal standards. Skyline, an open-source software suite of tools for targeted proteomics, has emerged as the most widely used platform for SRM-specific assays. Skyline is a freely available, comprehensive tool with high versatility for SRM assay development and subsequent processing of data acquired on triple quadrupole mass spectrometers. Skyline can be used for peptide and transition selection, assay optimization, retention time scheduling, SRM instrument method export, peak detection/integration, post-acquisition signal processing, and integration with statistical tools and algorithms to generate quantitative results for peptides and proteins. To highlight some of the Skyline SRM functionalities, we describe features including important visual displays and statistical tools, including ‘External Tools’. We discuss Skyline features that are particularly valuable for system suitability assessments, as well as for data sets with posttranslational modifications. Finally, an easy, point-and-click strategy is presented that supports dissemination of SRM data processed in Skyline to the Panorama web data repositories.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129229349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 3. Peptide Spectrum Matching via Database Search and Spectral Library Search","authors":"Brian Netzel, S. Dasari","doi":"10.1039/9781782626732-00039","DOIUrl":"https://doi.org/10.1039/9781782626732-00039","url":null,"abstract":"High-throughput shotgun proteomics is the mainstay of protein identification in biological samples. Efficient proteomic analysis requires streamlined and accurate workflows for protein identification. Database searching has been the most basic and reliable workflow for identifying the peptides and proteins that are present in the sample. This method derives peptides from a list of protein sequences and matches them against the experimental MS2 spectra. The resulting peptide spectrum matches are scored to quantify their goodness of fit. Spectral library searching has been recently developed as a fast, and viable, alternative to sequence database searching. This method attempts to identify the peptides by matching their corresponding experimental MS2 spectra to a library of curated MS2 peptide spectra. Each method has its own merit and application in the proteomics field. This chapter aims to highlight the foundations of peptide spectrum matching via protein sequence database and spectral library searching.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121688181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 12. OpenMS: A Modular, Open-Source Workflow System for the Analysis of Quantitative Proteomics Data","authors":"L. Nilse","doi":"10.1039/9781782626732-00259","DOIUrl":"https://doi.org/10.1039/9781782626732-00259","url":null,"abstract":"OpenMS is a software framework for the analysis and visualisation of proteomics data. It consists of over 100 individual tools which can be combined into simple or more complex analysis workflows. The tools are based on a well-documented, open-source C++ library that can also be accessed via a Python interface. Besides these tools, OpenMS provides wrappers for many popular external software solutions such as search engines and protein inference algorithms. The workflows can be run on simple desktop computers as well as powerful computing clusters. In this chapter, we will discuss four workflows of increasing complexity and thereby introduce new users to the basic concepts of OpenMS.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131545413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 5. Protein Inference and Grouping","authors":"A. Jones","doi":"10.1039/9781782626732-00093","DOIUrl":"https://doi.org/10.1039/9781782626732-00093","url":null,"abstract":"A key process in many proteomics workflows is the identification of proteins, following analysis of tandem MS (MS/MS) spectra, for example by a database search. The core unit of identification from a database search is the identification of peptides, yet most researchers wish to know which proteins have been confidently identified in their samples. As such, following peptide identification, a second stage of data analysis, called protein inference, is performed, either internally in the search engine or in a second package. Protein inference is challenging in the common case that proteins have been digested into peptides early in the proteomics workflow, and thus there is no direct link between a peptide and its parent protein. Many peptides could theoretically have been derived from more than one protein in the database searched, and thus it is not straightforward to determine which is the correct assignment. A variety of algorithms and implementations have been developed, which are reviewed in this chapter. Most approaches now report “protein groups” as the core unit of identification from protein inference, since it is common for more than one database protein to share the same set of evidence, and thus be indistinguishable. The chapter also describes scoring and statistical values that can be assigned during the protein identification process, to give confidence in the resulting values.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133195277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 15. Proteogenomics: Proteomics for Genome Annotation","authors":"F. Ghali, A. Jones","doi":"10.1039/9781782626732-00365","DOIUrl":"https://doi.org/10.1039/9781782626732-00365","url":null,"abstract":"One of the major bottlenecks in omics biology is the generation of accurate gene models, including correct calling of the start codon, splicing of introns (taking account of alternative splicing), and the stop codon – collectively called genome annotation. Current genome annotation approaches for newly sequenced genomes are generally based on automated or semi-automated methods, usually involving gene finding software to look for intrinsic gene-like signatures (motifs) in the DNA sequence, the propagation of annotations from other (better annotated) related species, and the mapping of experimental data sets, particularly from RNA Sequencing (RNA-Seq). Large scale proteomics data can also play an important role in confirming and correcting gene models. While proteomics approaches tend not to have the same level of sensitivity as RNA-Seq, they have the advantage that they can provide evidence that a predicted gene/transcript is indeed protein-coding. The use of proteomics data for genome annotation is called proteogenomics, and forms the basis for this chapter. We describe the theoretical underpinnings, different software packages that have been developed for proteogenomics, statistical approaches for validating the evidence, and support for proteogenomics data in file formats, standards and databases.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130957193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 11. Data Formats of the Proteomics Standards Initiative","authors":"J. Vizcaíno, S. Perkins, A. Jones, E. Deutsch","doi":"10.1039/9781782626732-00229","DOIUrl":"https://doi.org/10.1039/9781782626732-00229","url":null,"abstract":"The existence and adoption of data standards in computational proteomics, as in any other field, is generally perceived to be crucial for the further development of the discipline. We here give an up-to-date overview of the open standard data formats that have been developed under the umbrella of the Proteomics Standards Initiative (PSI). We will focus on those formats related to mass spectrometry (MS). Most of them are based on XML (Extensible Markup Language) schemas: mzML (for primary MS data, the output of mass spectrometers), mzIdentML (for peptide and protein identification data), mzQuantML (for peptide and protein quantification data) and TraML (for reporting transition lists for selected reaction monitoring approaches). In addition, mzTab was developed as a simpler tab-delimited file to support peptide, protein and small molecule identification and quantification data in the same file. In all cases, we will explain the main characteristics of each format, describe the main existing software implementations and give an update of the ongoing work to extend the formats to support new use cases. Additionally, we will discuss other data formats that have been inspired by the PSI formats. Finally, other PSI data standard formats (not MS related) will also be outlined in brief.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132035986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 13. Using Galaxy for Proteomics","authors":"Candace R. Guerrero, P. Jagtap, James E. Johnson, T. Griffin","doi":"10.1039/9781782626732-00289","DOIUrl":"https://doi.org/10.1039/9781782626732-00289","url":null,"abstract":"The area of informatics for mass spectrometry (MS)-based proteomics data has steadily grown over the last two decades. Numerous, effective software programs now exist for various aspects of proteomic informatics. However, many researchers still have difficulties in using this software. These difficulties arise from problems with running and integrating disparate software programs, scalability issues when dealing with large data volumes, and lack of ability to share and reproduce workflows composed of different software. The Galaxy framework for bioinformatics provides an attractive option for solving many of these current issues in proteomic informatics. Originally developed as a workbench to enable genomic data analysis, numerous researchers are now turning to Galaxy to implement software for MS-based proteomics applications. Here, we provide an introduction to Galaxy and its features, and describe how software tools are deployed, published and shared via the scalable framework. We also describe some of the existing tools in Galaxy for basic MS-based proteomics data analysis and informatics. Finally, we describe how proteomics tools in Galaxy can be combined with other existing tools for genomic and transcriptomic data analysis to enable powerful multi-omic data analysis applications.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123379393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 16. Proteomics Informed by Transcriptomics","authors":"Shyamasree Saha, D. Matthews, C. Bessant","doi":"10.1039/9781782626732-00385","DOIUrl":"https://doi.org/10.1039/9781782626732-00385","url":null,"abstract":"The choice of protein sequence database used for peptide spectrum matching has a major impact on the extent and significance of protein identifications obtained in a given experiment. Finding a suitable database can be a major challenge, particularly when working with non-model organisms and complex samples containing proteins from multiple species. This chapter introduces the proteomics informed by transcriptomics (PIT) methodology, in which RNA-seq transcriptomics is used to generate a sample-specific protein database against which proteomic mass spectra can be searched. This approach extends the application of proteomics to studies in which it was not previously tractable, and is well suited to the discovery of novel translated genomic elements.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"25 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120902684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CHAPTER 6. Identification and Localization of Post-Translational Modifications by High-Resolution Mass Spectrometry","authors":"R. Matthiesen, A. S. Carvalho","doi":"10.1039/9781782626732-00116","DOIUrl":"https://doi.org/10.1039/9781782626732-00116","url":null,"abstract":"Cells, either in response to stimulus or in homeostasis, require dynamic signaling through alterations in protein composition. Identification and temporospatial profiling of post-translational modifications constitutes one of the most challenging tasks in biology. These challenges comprise both experimental and computational aspects. From the computational point of view, identification of post-translational modifications by mass spectrometry analysis frequently leads to algorithms with exponential complexity, which in practice is approached by algorithms with lower complexity. Regulation of post-translational modifications has been implicated in a number of diseases such as cancer, neurodegenerative diseases and metabolic diseases. Furthermore, some post-translational modifications are considered as biomarkers and surrogate markers. Consequently, there is a high interest in methodologies that can identify and quantify post-translational modifications. We found few papers addressing the issue of which modifications should be considered in a standard database-dependent search of MS data for protein analysis. Furthermore, the few papers on the topic are from a time when MS instruments with high precision in both MS and MS/MS were not available. Therefore, based on literature search and extensive analysis we provide recommendations on post-translational modifications to be included in mass spectrometry database searches of MS data with high precision in both MS and MS/MS (e.g. <5 ppm).","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130812816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}