{"title":"Traversing the <i>k</i>-mer Landscape of NGS Read Datasets for Quality Score Sparsification.","authors":"Y William Yu, Deniz Yorukoglu, Bonnie Berger","doi":"10.1007/978-3-319-05269-4_31","DOIUrl":"https://doi.org/10.1007/978-3-319-05269-4_31","url":null,"abstract":"<p><p>It is becoming increasingly impractical to indefinitely store raw sequencing data for later processing in an uncompressed state. In this paper, we describe a scalable compressive framework, Read-Quality-Sparsifier (RQS), which substantially outperforms the compression ratio and speed of other de novo quality score compression methods while maintaining SNP-calling accuracy. Surprisingly, RQS also improves the SNP-calling accuracy on a gold-standard, real-life sequencing dataset (NA12878) using a <i>k</i>-mer density profile constructed from 77 other individuals from the 1000 Genomes Project. This improvement in downstream accuracy emerges from the observation that quality score values within NGS datasets are inherently encoded in the <i>k</i>-mer landscape of the genomic sequences. To our knowledge, RQS is the first scalable sequence based quality compression method that can efficiently compress quality scores of terabyte-sized and larger sequencing datasets.</p><p><strong>Availability: </strong>An implementation of our method, RQS, is available for download at: http://rqs.csail.mit.edu/.</p>","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"8394 ","pages":"385-399"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/978-3-319-05269-4_31","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35427238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jan Hoinka, Alexey Berezhnoy, Zuben E Sauna, Eli Gilboa, Teresa M Przytycka
{"title":"AptaCluster - A Method to Cluster HT-SELEX Aptamer Pools and Lessons from its Application.","authors":"Jan Hoinka, Alexey Berezhnoy, Zuben E Sauna, Eli Gilboa, Teresa M Przytycka","doi":"10.1007/978-3-319-05269-4_9","DOIUrl":"https://doi.org/10.1007/978-3-319-05269-4_9","url":null,"abstract":"<p><p>Systematic Evolution of Ligands by EXponential Enrichment (SELEX) is a well established experimental procedure to identify aptamers - synthetic single-stranded (ribo)nucleic molecules that bind to a given molecular target. Recently, new sequencing technologies have revolutionized the SELEX protocol by allowing for deep sequencing of the selection pools after each cycle. The emergence of High Throughput SELEX (HT-SELEX) has opened the field to new computational opportunities and challenges that are yet to be addressed. To aid the analysis of the results of HT-SELEX and to advance the understanding of the selection process itself, we developed AptaCluster. This algorithm allows for an efficient clustering of whole HT-SELEX aptamer pools; a task that could not be accomplished with traditional clustering algorithms due to the enormous size of such datasets. We performed HT-SELEX with Interleukin 10 receptor alpha chain (IL-10RA) as the target molecule and used AptaCluster to analyze the resulting sequences. AptaCluster allowed for the first survey of the relationships between sequences in different selection rounds and revealed previously not appreciated properties of the SELEX protocol. As the first tool of this kind, AptaCluster enables novel ways to analyze and to optimize the HT-SELEX procedure. Our AptaCluster algorithm is available as a very fast multiprocessor implementation upon request.</p>","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"8394 ","pages":"115-128"},"PeriodicalIF":0.0,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/978-3-319-05269-4_9","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32949306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hetunandan Kamisetty, Bornika Ghosh, Christopher James Langmead, Chris Bailey-Kellogg
{"title":"Learning Sequence Determinants of Protein:protein Interaction Specificity with Sparse Graphical Models.","authors":"Hetunandan Kamisetty, Bornika Ghosh, Christopher James Langmead, Chris Bailey-Kellogg","doi":"10.1007/978-3-319-05269-4_10","DOIUrl":"https://doi.org/10.1007/978-3-319-05269-4_10","url":null,"abstract":"<p><p>In studying the strength and specificity of interaction between members of two protein families, key questions center on <i>which</i> pairs of possible partners actually interact, <i>how well</i> they interact, and <i>why</i> they interact while others do not. The advent of large-scale experimental studies of interactions between members of a target family and a diverse set of possible interaction partners offers the opportunity to address these questions. We develop here a method, DgSpi (Data-driven Graphical models of Specificity in Protein:protein Interactions), for learning and using graphical models that explicitly represent the amino acid basis for interaction specificity (<i>why</i>) and extend earlier classification-oriented approaches (<i>which</i>) to predict the Δ<i>G</i> of binding (<i>how well</i>). We demonstrate the effectiveness of our approach in analyzing and predicting interactions between a set of 82 PDZ recognition modules, against a panel of 217 possible peptide partners, based on data from MacBeath and colleagues. Our predicted Δ<i>G</i> values are highly predictive of the experimentally measured ones, reaching correlation coefficients of 0.69 in 10-fold cross-validation and 0.63 in leave-one-PDZ-out cross-validation. Furthermore, the model serves as a compact representation of amino acid constraints underlying the interactions, enabling protein-level Δ<i>G</i> predictions to be naturally understood in terms of residue-level constraints. Finally, as a generative model, DgSpi readily enables the design of new interacting partners, and we demonstrate that designed ligands are novel and diverse.</p>","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"8394 ","pages":"129-143"},"PeriodicalIF":0.0,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/978-3-319-05269-4_10","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32830491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kelley Harris, Sara Sheehan, John A Kamm, Yun S Song
{"title":"Decoding coalescent hidden Markov models in linear time.","authors":"Kelley Harris, Sara Sheehan, John A Kamm, Yun S Song","doi":"10.1007/978-3-319-05269-4_8","DOIUrl":"https://doi.org/10.1007/978-3-319-05269-4_8","url":null,"abstract":"<p><p>In many areas of computational biology, hidden Markov models (HMMs) have been used to model local genomic features. In particular, coalescent HMMs have been used to infer ancient population sizes, migration rates, divergence times, and other parameters such as mutation and recombination rates. As more loci, sequences, and hidden states are added to the model, however, the runtime of coalescent HMMs can quickly become prohibitive. Here we present a new algorithm for reducing the runtime of coalescent HMMs from quadratic in the number of hidden time states to linear, without making any additional approximations. Our algorithm can be incorporated into various coalescent HMMs, including the popular method PSMC for inferring variable effective population sizes. Here we implement this algorithm to speed up our demographic inference method diCal, which is equivalent to PSMC when applied to a sample of two haplotypes. We demonstrate that the linear-time method can reconstruct a population size change history more accurately than the quadratic-time method, given similar computation resources. We also apply the method to data from the 1000 Genomes project, inferring a high-resolution history of size changes in the European population.</p>","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"8394 ","pages":"100-114"},"PeriodicalIF":0.0,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/978-3-319-05269-4_8","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32766513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and Accurate Calculation of Protein Depth by Euclidean Distance Transform.","authors":"Dong Xu, Hua Li, Yang Zhang","doi":"10.1007/978-3-642-37195-0_30","DOIUrl":"https://doi.org/10.1007/978-3-642-37195-0_30","url":null,"abstract":"<p><p>The depth of each atom/residue in a protein structure is a key attribution that has been widely used in protein structure modeling and function annotation. However, the accurate calculation of depth is time consuming. Here, we propose to use the Euclidean distance transform (EDT) to calculate the depth, which conveniently converts the protein structure to a 3D gray-scale image with each pixel labeling the minimum distance of the pixel to the surface of the molecule (i.e. the depth). We tested the proposed EDT method on a set of 261 non-redundant protein structures. The data show that the EDT method is 2.6 times faster than the widely used method by Chakravarty and Varadarajan. The depth value by EDT method is also highly accurate, which is almost identical to the depth calculated by exhaustive search (Pearson's correlation coefficient≈1). We believe the EDT-based depth calculation program can be used as an efficient tool to assist the studies of protein fold recognition and structure-based function annotation.</p>","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"7821 ","pages":"304-316"},"PeriodicalIF":0.0,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4098708/pdf/nihms592637.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32513516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Fertin, A. Labarre, I. Rusu, Éric Tannier, Stéphane Vialette
{"title":"Combinatorics of Genome Rearrangements","authors":"G. Fertin, A. Labarre, I. Rusu, Éric Tannier, Stéphane Vialette","doi":"10.7551/mitpress/9780262062824.001.0001","DOIUrl":"https://doi.org/10.7551/mitpress/9780262062824.001.0001","url":null,"abstract":"From one cell to another, from one individual to another, and from one species to another, the content of DNA molecules is often similar. The organization of these molecules, however, differs dramatically, and the mutations that affect this organization are known as genome rearrangements. Combinatorial methods are used to reconstruct putative rearrangement scenarios in order to explain the evolutionary history of a set of species, often formalizing the evolutionary events that can explain the multiple combinations of observed genomes as combinatorial optimization problems. This book offers the first comprehensive survey of this rapidly expanding application of combinatorial optimization. It can be used as a reference for experienced researchers or as an introductory text for a broader audience. Genome rearrangement problems have proved so interesting from a combinatorial point of view that the field now belongs as much to mathematics as to biology. This book takes a mathematically oriented approach, but provides biological background when necessary. It presents a series of models, beginning with the simplest (which is progressively extended by dropping restrictions), each constructing a genome rearrangement problem. The book also discusses an important generalization of the basic problem known as the median problem, surveys attempts to reconstruct the relationships between genomes with phylogenetic trees, and offers a collection of summaries and appendixes with useful additional information. Computational Molecular Biology series","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"28 1","pages":"I-XI, 1-288"},"PeriodicalIF":0.0,"publicationDate":"2009-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84613799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Protein Threading Accuracy.","authors":"Jian Peng, Jinbo Xu","doi":"10.1007/978-3-642-02008-7_3","DOIUrl":"https://doi.org/10.1007/978-3-642-02008-7_3","url":null,"abstract":"<p><p>Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods use a scoring function linearly combining sequence and structure features to measure the quality of a sequence-template alignment so that a dynamic programming algorithm can be used to optimize the scoring function. However, a linear scoring function cannot fully exploit interdependency among features and thus, limits alignment accuracy.This paper presents a nonlinear scoring function for protein threading, which not only can model interactions among different protein features, but also can be efficiently optimized using a dynamic programming algorithm. We achieve this by modeling the threading problem using a probabilistic graphical model Conditional Random Fields (CRF) and training the model using the gradient tree boosting algorithm. The resultant model is a nonlinear scoring function consisting of a collection of regression trees. Each regression tree models a type of nonlinear relationship among sequence and structure features. Experimental results indicate that this new threading model can effectively leverage weak biological signals and improve both alignment accuracy and fold recognition rate greatly.</p>","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"5541 ","pages":"31-45"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/978-3-642-02008-7_3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30576321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feng Zhao, Jian Peng, Joe Debartolo, Karl F Freed, Tobin R Sosnick, Jinbo Xu
{"title":"A Probabilistic Graphical Model for Ab Initio Folding.","authors":"Feng Zhao, Jian Peng, Joe Debartolo, Karl F Freed, Tobin R Sosnick, Jinbo Xu","doi":"10.1007/978-3-642-02008-7_5","DOIUrl":"https://doi.org/10.1007/978-3-642-02008-7_5","url":null,"abstract":"<p><p>Despite significant progress in recent years, <i>ab initio</i> folding is still one of the most challenging problems in structural biology. This paper presents a probabilistic graphical model for ab initio folding, which employs Conditional Random Fields (CRFs) and directional statistics to model the relationship between the primary sequence of a protein and its three-dimensional structure. Different from the widely-used fragment assembly method and the lattice model for protein folding, our graphical model can explore protein conformations in a continuous space according to their probability. The probability of a protein conformation reflects its stability and is estimated from PSI-BLAST sequence profile and predicted secondary structure. Experimental results indicate that this new method compares favorably with the fragment assembly method and the lattice model.</p>","PeriodicalId":74675,"journal":{"name":"Research in computational molecular biology : ... Annual International Conference, RECOMB ... : proceedings. RECOMB (Conference : 2005- )","volume":"5541 ","pages":"59-73"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/978-3-642-02008-7_5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"31281362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}