{"title":"Chapter 4. PSM Scoring and Validation","authors":"James C. Wright, J. Choudhary","doi":"10.1039/9781782626732-00069","DOIUrl":"https://doi.org/10.1039/9781782626732-00069","url":null,"abstract":"Identification and quantification of proteins by shotgun proteomics experiments is underpinned by the use of accurate masses and fragmentation patterns generated by tandem mass spectrometry. Assigning peptide sequences to tandem MS data is supported by a plethora of informatics tools. The majority of spectral identification software report arbitrary fitness scores reflecting the quality of a match; however, valid statistical metrics must be used to make sense of these scores and attribute a confidence to the peptide identifications. Accurately estimating the error and devising filtering routines to minimise incorrect and random identifications is essential for making valid and reproducible conclusions about the biology of the sample being analysed. This chapter discusses the statistical approaches used to evaluate and validate shotgun proteomics peptide to spectrum matches and provides a summary of software available for this purpose.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126129561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 1. Introduction to Proteome Informatics","authors":"C. Bessant","doi":"10.1039/9781782626732-00001","DOIUrl":"https://doi.org/10.1039/9781782626732-00001","url":null,"abstract":"At its core, proteomics can be defined as the branch of analytical science concerned with identifying and, ideally, quantifying every protein within a complex biological sample. This chapter provides a high level overview of this field and the key technologies that underpin it, as a primer for the chapters that follow. It also introduces the field of proteome informatics, and explains why it is an integral part of any proteomics experiment.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"553 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123101516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 10. Data Analysis for Data Independent Acquisition","authors":"Pedro Navarro, Marco Trevisan-Herraz, H. Röst","doi":"10.1039/9781782626732-00200","DOIUrl":"https://doi.org/10.1039/9781782626732-00200","url":null,"abstract":"Mass spectrometry-based proteomics using soft ionization techniques has been used successfully to identify large numbers of proteins from complex biological samples. However, reproducible quantification across a large number of samples is still highly challenging with commonly used “shotgun proteomics” which uses stochastic sampling of the peptide analytes (data dependent acquisition; DDA) to analyze samples. Recently, data independent acquisition (DIA) methods have been investigated for their potential for reproducible protein quantification, since they deterministically sample all peptide analytes in every single run. This increases reproducibility and sensitivity, reduces the number of missing values and removes stochasticity from the acquisition process. However, one of the major challenges for wider adoption of DIA has been data analysis. In this chapter we will introduce the five most well-known of these techniques, as well as their data analysis methods, classified either as targeted or untargeted; then, we will discuss briefly the meaning of the false discovery rate (FDR) in DIA experiments, to finally close the chapter with a review of the current challenges in this subject.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114191516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 8. MS2-Based Quantitation","authors":"Marc Vaudel","doi":"10.1039/9781782626732-00155","DOIUrl":"https://doi.org/10.1039/9781782626732-00155","url":null,"abstract":"MS2-based protein quantification techniques refer to tandem mass spectrometry based quantification of proteins relying on fragment ion spectra of peptides. The two main representatives of this class of quantification techniques are spectrum counting, and reporter ion based quantification. They are both widely used in proteomics, appreciated for the simplicity and swiftness of their execution. As a result, most proteome bioinformatics suites include MS2-based protein quantification modules. In this chapter, the principles of these quantification techniques are introduced, different bioinformatic implementations are presented, and a use case is demonstrated using free open source solutions. Finally, the main pitfalls of the data processing are discussed and the performance of these techniques critically evaluated. This chapter is thus a good starting point for scientists wanting to easily and critically conduct MS2-based protein quantification.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114164987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 2. De novo Peptide Sequencing","authors":"B. Ma","doi":"10.1039/9781782626732-00015","DOIUrl":"https://doi.org/10.1039/9781782626732-00015","url":null,"abstract":"De novo peptide sequencing refers to the process of determining a peptide’s amino acid sequence from its MS/MS spectrum alone. The principle of this process is fairly straightforward: a high-quality spectrum may present a ladder of fragment ion peaks. The mass difference between every two adjacent peaks in the ladder is used to determine a residue of the peptide. However, most practical spectra do not have sufficient quality to support this straightforward process. Therefore, research in de novo sequencing has largely been a battle against the errors in the data. This chapter reviews some of the major developments in this field. The chapter starts with a quick review of the history in Section 1. Then manual de novo sequencing is examined in Section 2. Section 3 introduces a few commonly used de novo sequencing algorithms. An important aspect of automated de novo sequencing software is a good scoring function that serves as the optimization goal of the algorithm. Thus, Section 4 is devoted to the methods for defining good scoring functions. Section 5 reviews a list of relevant software. The chapter concludes with a discussion of the applications and limitations of de novo sequencing in Section 6.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127994803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chapter 7. Algorithms for MS1-Based Quantitation","authors":"Hanqing Liao, Alexander Phillips, A. Jankevics, A. Dowsey","doi":"10.1039/9781782626732-00133","DOIUrl":"https://doi.org/10.1039/9781782626732-00133","url":null,"abstract":"MS1-based quantitation is performed by direct integration of peptide precursor signal intensity from the MS1 spectra across retention time, based on the assumption that these signals have a linear relationship with abundance across a relatively wide dynamic range. Since ionisation efficiency varies between peptides, only relative abundance changes between biological samples are usually established. Whether each sample is run individually ‘label-free’, or two or three samples multiplexed within each run by an MS1-based labelling technique such as stable isotope labeling by amino acids in cell culture (SILAC), the informatics methods involved are broadly similar. In this chapter we present the key components of such pipelines, including the detection and quantitation of peptide features from the raw data, alignment of chromatographic variations between runs so that corresponding features can be matched, intensity normalisation to correct sample-loading differences and ionisation fluctuations, and methods to combine the peptide-level quantifications for the statistical analysis of differential protein expression across treatment groups. At each stage, the techniques have been designed for robustness against the systematic and random variation inherent in MS data, and errors during the preceding parts of the pipeline.","PeriodicalId":192946,"journal":{"name":"Proteome Informatics","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130125080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}