{"title":"The Atomizer: Extracting Implicit Molecular Structure from Reaction Network Models","authors":"J. Tapia, J. Faeder","doi":"10.1145/2506583.2512389","DOIUrl":"https://doi.org/10.1145/2506583.2512389","url":null,"abstract":"In this paper we introduce the Atomizer, an expert system for extracting implicit information from reaction network models, like those encoded by the Systems Biology Markup Language (SBML), to create a structured translation using the rule-based modeling paradigm. Atomized models can be visualized in a compact form through contact maps, which show the underlying molecules, components, and interactions used to construct a model. Analysis of the atomized reactions reveals simplifying assumptions made in the construction of a model that limit the combinatorial complexity. These benefits are elucidated through a case study. We anticipate that the library of translated rule-based models we can generate using the Atomizer will be useful to the biological modeling community by providing a more accessible view of the available models and by facilitating their extension and merging.","PeriodicalId":287007,"journal":{"name":"Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127209737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Confidence Measure for Model Fitting with X-Ray Crystallography Data","authors":"Y. Lei, Ramgopal R. Mettu","doi":"10.1145/2506583.2506609","DOIUrl":"https://doi.org/10.1145/2506583.2506609","url":null,"abstract":"Structure determination from X-ray crystallography requires numerous stages of iterative refinement between real and reciprocal space. Current methods that fit a model structure to X-ray data therefore utilize a refined experimental electron density map along with a scoring function that characterizes the fit of the density map to structure. Additional information (e.g., from an energy function or conformational statistics) may supplement this score. In this paper, we derive a novel confidence measure for fitting model fragments into X-ray crystallography data. Given any set of conformations under consideration (e.g., a set of sidechain rotamers, or backbone fragments), and a scoring function for those conformations (e.g., least squares fit of the associated model density maps), we give a general-purpose method for assessing the confidence of the best-fit model. For the commonly used least-squares measure of fit, our method analyzes the statistics of the matching scores and estimates the probability that the best-fit conformation is the correct underlying model. To our knowledge, ours is the first method for computing such a confidence measure. To demonstrate the practical utility of our method, we study the problem of sidechain placement and show that our confidence measure can be used to detect and correct incorrect conformational predictions. Over nine proteins with density maps of varying resolutions, the Pearson correlation between predictive accuracy (of least-squares fit) and our confidence measure is quite high, about .89. We show that our approach can guide the use of stereochemical restraints when confidence is low in predictions. 
We also propose a Bayesian data fusion scheme that integrates our confidence measure to weight the contribution of each source of data, which could potentially be used for combining experimental, modeling, and empirical data in automated structure determination.","PeriodicalId":287007,"journal":{"name":"Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125019612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum Sequence Analysis: A New Alignment-free Technique For Analyzing Sequences in Feature Space","authors":"M. Daoud","doi":"10.1145/2506583.2512375","DOIUrl":"https://doi.org/10.1145/2506583.2512375","url":null,"abstract":"In this paper, we propose a new alignment-free sequence analysis technique (quantum sequence analysis) that can be used to analyze sequences in feature space. The proposed technique can be used to estimate the membership value of a given query sequence with respect to different classes of sequences using stochastic approximation, without making any prior stochastic assumptions. We implemented the proposed technique using real datasets, and it proved effective in analyzing sequences in feature space.","PeriodicalId":287007,"journal":{"name":"Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125143333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genomic Sequence Fragment Identification using Quasi-Alignment","authors":"Anurag Nagar, Michael Hahsler","doi":"10.1145/2506583.2506647","DOIUrl":"https://doi.org/10.1145/2506583.2506647","url":null,"abstract":"Identification of organisms using their genetic sequences is a popular problem in molecular biology and is used in fields such as metagenomics, molecular phylogenetics and DNA Barcoding. These applications depend on searching large sequence databases for individual matching sequences (e.g., with BLAST) and comparing sequences using multiple sequence alignment (e.g., via Clustal), both of which are computationally expensive and require extensive server resources. We propose a novel method for sequence comparison, analysis, and classification which avoids the need to align sequences at the base level or search a database for similarity. Instead, our method uses alignment-free methods to find probabilistic quasi-alignments for longer (typically 100 base pairs) segments. Clustering is then used to create compact models that can be used to analyze a set of sequences and to score and classify unknown sequences against these models. In this paper we expand prior work in two ways. We show how quasi-alignments can be expanded into larger quasi-aligned sections and we develop a method to classify short sequence fragments. The latter is especially useful when working with Next-Generation Sequencing (NGS) techniques that generate output in the form of relatively short reads. 
We have conducted extensive experiments using fragments from bacterial 16S rRNA sequences obtained from the Greengenes project, and our results show that the new quasi-alignment based approach can provide excellent results as well as overcome some of the restrictions of the widely used Ribosomal Database Project (RDP) classifier.","PeriodicalId":287007,"journal":{"name":"Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125805738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational methods for alternative splicing detection using RNA-seq","authors":"Ruolin Liu, J. Dickerson","doi":"10.1145/2506583.2506666","DOIUrl":"https://doi.org/10.1145/2506583.2506666","url":null,"abstract":"RNA-seq technology promises a comprehensive picture of the transcriptome. The traditional way of studying differential gene expression is questionable because it fails to consider alternative transcription and post-transcriptional modification. Although some studies have shown that transcript variants from a gene are predominantly generated from alternative transcription, including alternative promoters and transcriptional terminations, rather than splicing mechanisms, more computational methods focus on alternative splicing detection and quantification. Here we are only interested in methods that are able to detect condition-specific differences using RNA-seq, and we categorize them into two major classes: Region Quantification (RQ) and Isoform Quantification (IQ). RQ breaks down the gene structure into \"horizontally parallel pieces\", exon units for example, quantifies the expression in these \"small pieces\", and compares them across different conditions. IQ, in contrast, seeks to separate gene expression into \"vertically parallel isoforms\", which is itself a challenging task but is more biologically meaningful, and compares a gene's isoform compositions across different conditions. In addition, based on their ability to localize significantly different regions, we can further classify them as \"gene-centric\" or \"exon-centric\" methods. The combination of the two classification strategies yields four categories, and we choose one representative for each category. These four representatives are the Cufflinks-Cuffdiff package, DEXSeq, DiffSplice and SplicingCompass. We evaluate their performance on alternative splicing analysis using three experiments. The first experiment uses a published RNA-seq data set of Arabidopsis under cold conditions (NCBI SRA009031). 
The second experiment is a simulation study using a custom simulator in which we adopt a negative binomial model to account for variability across biological replicates. The last experiment makes use of RT-PCR to evaluate the results from the different methods.","PeriodicalId":287007,"journal":{"name":"Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125582689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RNA-Seq analyses to reveal the human transcriptome landscape","authors":"N. Deng, D. Zhu","doi":"10.1145/2506583.2506603","DOIUrl":"https://doi.org/10.1145/2506583.2506603","url":null,"abstract":"Alternative splicing plays important roles in many biological processes including diseases. It markedly increases the diversity of the transcriptome and proteome since over 90% of human genes are alternatively spliced. Recently, high-throughput RNA-Seq technology has made it possible to better characterize and understand transcriptomes. Differential expression and differential splicing are two fundamental yet crucial analyses for studying differences between transcriptomes. The results of these analyses may reveal the landscape of human transcriptomes and yield new insight into cell differentiation that may lead to human disease. We present the analysis results from two RNA-Seq data sets to study the transcriptomes of a human disease and a type of human cell differentiation. For the first study, we applied our analysis pipeline to an RNA-Seq data set of human Idiopathic Pulmonary Fibrosis (IPF) disease. We present a joint analysis result of differential expression and differential splicing to view genes from both aspects simultaneously. We also provide several non-differentially spliced genes with splicing variants validated by qRT-PCR experiments. For the second study, we developed a novel computational method and applied it to a public RNA-Seq data set of human H1 and H1 differentiation into neural progenitor cell lines. We systematically detected many significant differential splicing events falling into five well-known types of alternative splicing. We present the proportion of the five types of detected differential splicing events in this study. 
For each type of splicing event, we show a case study to demonstrate the detection procedure of the differential splicing event.","PeriodicalId":287007,"journal":{"name":"Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122301501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying protein complexes in AP-MS data with negative evidence via soft Markov clustering","authors":"Yu-Keng Shih, S. Parthasarathy","doi":"10.1145/2506583.2506591","DOIUrl":"https://doi.org/10.1145/2506583.2506591","url":null,"abstract":"Protein complexes are key units for discovering protein mechanisms. Traditional protein complex identification methods apply a soft (overlapping) network clustering algorithm to a protein-protein interaction (PPI) network and predict the clusters as protein complexes. Recently, the AP-MS technique and the scoring method can measure the co-complex relationship among proteins. Unlike traditional PPI networks, AP-MS can provide negative evidence that indicates which proteins are unlikely to be in the same protein complex. However, most existing network clustering algorithms cannot utilize this negative similarity score. In this paper, we propose a soft network clustering algorithm, SR-MCL-N, which can take into account negative similarity scores. SR-MCL-N is a variation of a previous algorithm, SR-MCL, which is a network clustering algorithm based on the transition flow. Additionally, since the scoring approach we use produces a dense similarity matrix, a sparsification technique is applied to the similarity matrix. Based on the gold standard CYC2008 and GO terms, we first show that the sparsification can not only speed up SR-MCL-N, but also let SR-MCL-N generate more accurate clusters. SR-MCL-N is then compared against SR-MCL and a hierarchical algorithm which also considers negative similarity score. 
The results indicate that our algorithm outperforms others since SR-MCL-N not only generates overlapped clusters but also additionally takes negative similarity score into account.","PeriodicalId":287007,"journal":{"name":"Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128740947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring Relatedness Between Scientific Entities in Annotation Datasets","authors":"Guillermo Palma, Maria-Esther Vidal, E. Haag, L. Raschid, Andreas Thor","doi":"10.1145/2506583.2506651","DOIUrl":"https://doi.org/10.1145/2506583.2506651","url":null,"abstract":"Linked Open Data has made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode scientific knowledge which is captured in annotation datasets. One can mine these datasets to discover relationships and patterns between entities. Determining the relatedness (or similarity) between entities becomes a building block for graph pattern mining, e.g., identifying drug-drug relationships could depend on the similarity of the diseases (conditions) that are associated with each drug. Diverse similarity metrics have been proposed in the literature, e.g., i) string-similarity metrics; ii) path-similarity metrics; iii) topological-similarity metrics; all measure relatedness in a given taxonomy or ontology. In this paper, we consider a novel annotation similarity metric AnnSim that measures the relatedness between two entities in terms of the similarity of their annotations. We model AnnSim as a 1-to-1 maximal weighted bipartite match, and we exploit properties of existing solvers to provide an efficient solution. We empirically study the effectiveness of AnnSim on real-world datasets of genes and their GO annotations, clinical trials, and a human disease benchmark. 
Our results suggest that AnnSim can provide a deeper understanding of the relatedness of concepts and can provide an explanation of potential novel patterns.","PeriodicalId":287007,"journal":{"name":"Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124743931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating the Number of Manually Segmented Cellular Objects Required to Evaluate the Performance of a Segmentation Algorithm","authors":"A. Peskin, J. Chalfoun, K. Kafadar, J. Elliott","doi":"10.1145/2506583.2512384","DOIUrl":"https://doi.org/10.1145/2506583.2512384","url":null,"abstract":"We propose a new strategy for estimating the number of cellular objects that should be manually segmented for evaluating the segmentation performance of an algorithm. The strategy uses geometric and edge quality measurements that are directly related to segmentation performance, but do not require highly accurate segmentation. Sample sizes are determined from standard deviations of cell features calculated from the entire image set. We examine the relationship between approximate confidence level and sample size. The use of our strategy may reduce the effort and time required for generating a reference dataset for evaluating segmentation algorithm performance with images of biological cells. We demonstrate the usefulness of this methodology on a large and diverse data set for which reference data are available.","PeriodicalId":287007,"journal":{"name":"Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126790520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Constrained K-shortest Path Algorithm to Rank the Topologies of the Protein Secondary Structure Elements Detected in CryoEM Volume Maps","authors":"Kamal Al-Nasr, Lin Chen, D. Ranjan, M. Zubair, Dong Si, Jing He","doi":"10.1145/2506583.2506705","DOIUrl":"https://doi.org/10.1145/2506583.2506705","url":null,"abstract":"Although many electron density maps have been produced at medium resolutions, it is still challenging to derive the atomic structure from such volumetric data. Current methods primarily rely on the availability of an existing atomic structure for fitting or a homologous template structure for modeling. In the process of developing a template-free, de novo method, the topology of the secondary structure elements needs to be resolved first. In this paper, we extend our previous algorithm for finding the optimal solution in the constraint graph problem. We illustrate an approach to obtain the top-K topologies by combining a dynamic programming algorithm with the K-shortest path algorithm. The effectiveness of the algorithms is demonstrated in tests using three datasets of different natures. The algorithm improves on the accuracy, space and time of an existing method.","PeriodicalId":287007,"journal":{"name":"Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114797992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}