{"title":"Relation Extraction for Protein-protein Interactions Affected by Mutations","authors":"Ziling Fan, Luca Soldaini, Arman Cohan, Nazli Goharian","doi":"10.1145/3233547.3233617","DOIUrl":"https://doi.org/10.1145/3233547.3233617","url":null,"abstract":"Precision Medicine (PM) has attracted increasing attention from biomedical research. Extracting information from biomedical literature about protein-protein interactions affected by mutations is a vital step towards PM because it uncovers mechanisms leading to diseases. We investigate a feature-rich supervised method to accomplish this relation extraction challenge. Our approach leverages a novel combination of features, as well as two auxiliary corpora, to achieve up to 44% improvement in F1-score over the baseline method.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116969473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RNA-Seq Dose Response Experiments for Quantification of Off-Target Effects with RNAi Therapeutics","authors":"J. Barry, Svetlana Shulga Morskaya, Tuyen Nguyen, Sarah Solomon, K. Fitzgerald, S. Milstein, G. Hinkle","doi":"10.1145/3233547.3233656","DOIUrl":"https://doi.org/10.1145/3233547.3233656","url":null,"abstract":"RNAi therapeutics can be designed to silence almost any gene of interest and have demonstrated high levels of efficacy and acceptable safety profiles in pre-clinical and clinical development for cardio-metabolic, hepatic infectious, central nervous system, and rare diseases. Minimizing microRNA-like off-target activity while maintaining on-target silencing is a means to maximize the safety profile. One strategy to mitigate off-target activity is to incorporate thermally destabilizing residues such as glycol nucleic acid in the seed region of the antisense strand of a double-stranded RNA. Here we demonstrate the benefit of this strategy using Alnylam's ESC+ conjugate platform by performing RNA-Seq in dose response to measure both on-target and off-target effects. Diverse measures and visualizations of transcriptomic noise will be presented, as well as estimates of relative on-target to off-target effects as a function of dose. 
These results show that ESC+ conjugates are capable of simultaneously achieving high levels of on-target silencing while maintaining low levels of transcriptomic noise.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127148813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phylogenetic Consensus for Exact Median Trees","authors":"Pawel Tabaszewski, P. Górecki, O. Eulenstein","doi":"10.1145/3233547.3233560","DOIUrl":"https://doi.org/10.1145/3233547.3233560","url":null,"abstract":"Solving median tree problems is a classic approach for inferring species trees from a collection of discordant gene trees. Such problems are typically NP-hard and dealt with by local search heuristics. Unfortunately, such heuristics generally lack any provable correctness and precision. Algorithmic advances addressing this uncertainty have led to exact dynamic programming formulations suitable for solving a well-studied group of median tree problems for smaller phylogenetic analyses. However, these formulations allow computing only very few optimal species trees out of possibly many such trees, and phylogenetic studies often require the analysis of all optimal solutions through their consensus tree. Here, we describe a significant algorithmic modification of the dynamic programming formulations that compute the cluster counts of all optimal species trees from which various types of consensus trees can be efficiently computed. 
Through experimental studies, we demonstrate that our parallel implementation of the modified programming formulation is more efficient than a previous implementation of the original formulation, and can greatly benefit phylogenetic analyses.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122914677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Spectral/Subspace Clustering of Molecular Dynamics Simulations","authors":"I. Syzonenko, Joshua L. Phillips","doi":"10.1145/3233547.3233595","DOIUrl":"https://doi.org/10.1145/3233547.3233595","url":null,"abstract":"Data clustering approaches are widely used in many domains including molecular dynamics (MD) simulation. Modern applications of clustering for MD simulation data must be capable of assessing both natively folded and disordered proteins. We compare the performance of spectral clustering with a more recent subspace clustering approach and a newly proposed 'hybrid' clustering algorithm, which seeks to combine the useful characteristics of both methods, on MD data from both protein classes. Results are analysed in terms of accuracy, stability, data density, and other properties. We conclude with which combinations of algorithms/improvements/data density will provide results that are either more accurate or more stable. We find that subspace clustering produces better results than standard spectral clustering, especially for disordered proteins and regardless of input data density or choice of affinity scaling. 
Additionally, our hybrid approach improves subspace results in most cases and entropic affinity scaling leads to a better performance of both spectral clustering and our hybrid approach.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129849627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","authors":"","doi":"10.1145/3233547","DOIUrl":"https://doi.org/10.1145/3233547","url":null,"abstract":"","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128773361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differences in Engagement Among BabyCenter.com Community Forum Contributors: A Pilot Study","authors":"Austin Gu, C. Taylor","doi":"10.1145/3233547.3233621","DOIUrl":"https://doi.org/10.1145/3233547.3233621","url":null,"abstract":"For many women, online health communities, such as BabyCenter.com, provide mediums to quell doubts and receive answers amidst the pressing uncertainties of pregnancy [1]. Women contributing to such community forums often suffer from complications such as postpartum depression, and likely want their posts addressed in a timely and adequate manner. This work examined quantitative and qualitative factors that contribute to various levels of responsiveness to posts in BabyCenter.com postpartum depression online health communities. Our aim was to identify post characteristics conducive to higher levels of engagement from online health forum contributors. In this study, we analyzed characteristics of posts (length of the main text, time of day, existence of exclamation points or question marks in the title) to see if there was a relationship between the number of community comments (as a measure of engagement) and varying levels of the characteristics. The number of comments was used as a measure of engagement because it is an estimate of the extent to which community members were drawn to and felt compelled to interact with the post. For each of 100 randomly-selected posts (from 3 groups related to postpartum depression and anxiety), we generated summary statistics and performed two-sample t-tests. For the length of main post variable, a regression analysis was performed as well. In the end, we found no significant differences in engagement resulting from the three variables. For time of day, the average number of comments was 14.25 for AM posts and 7.87 for PM posts (p-value = 0.054, 95% CI: -0.11, 12.9, Figure 1). 
Length of the main post did not appear to predict level of engagement by online health community members (R²=0.0006, p-value=0.814, Figure 1). The difference in number of comments for posts with more than 148 words (median length) compared to posts with fewer than 148 words was also non-significant (p-value=0.58, 95% CI: -5.1, 2.8). Differences in engagement for posts with punctuation (exclamation point or question mark) in the title (N=33) compared to those without (N=67) were non-significant as well (p-value=0.81, 95% CI: -4.0, 5.1, Figure 1). The strengths of this pilot study lie in revealing characteristics that may appeal to users responding on online health community forums; it also sets the stage for future work investigating user behavior trends on social question-and-answer sites. The limitations include the small sample size, given our use of 100 randomly selected forum posts. Future work will assess larger data sets and examine more in-depth characteristics such as content and previous user behavior. For those participating in online health forum discussions, this research provides insight into factors that may foster a more reciprocal, communal environment for posting questions and comments. 
This process may lead to health benefits through providing better social support to posters managing postpartum","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131369965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Approach for Increasing Taxonomic Resolution in Protein-Based Alignments","authors":"Cooper J. Park, Keir J. Macartney, Junfu Shen, Kunpeng Xie, Xin Zhang, R. Bergeron, W. Thomas, Cheryl P. Andam, A. Westbrook","doi":"10.1145/3233547.3233646","DOIUrl":"https://doi.org/10.1145/3233547.3233646","url":null,"abstract":"Most of today's genome sequencing technology requires that genomes be sequenced in fragments. Typically, these fragments are then aligned using a variety of different alignment programs. All alignment tools query against a reference database to determine the most accurate reassembly of the original DNA strand's nucleotide sequence. Although these programs can align in both nucleotide and protein space, each method comes with its own disadvantages. Protein aligners such as PALADIN consistently align a greater percentage of reads faster and provide greater insight into the functional capabilities of the aligned sequence. On the other hand, this method reduces the sensitivity of taxonomic classification due to the degeneracy of the genetic code. Our program, Renuc, is a PALADIN plugin that addresses this issue by taking protein alignment results using the UniProt database and identifying the most likely taxonomic origin for each nucleotide sequence associated with each detected protein. We have validated our approach and its implementation in Renuc by successfully retrieving the nucleotide sequence and corresponding taxonomic IDs for all of the aligned proteins in our test dataset consisting of a whole Escherichia coli genome. Our program aligns over 99 percent of the nucleotide reads with 97 percent of them remaining in the same protein cluster as the original protein alignment. However, this dataset is incredibly well studied and documented in UniProt. Future work should be considered with a dataset containing fewer annotations in the database. Renuc quickly identifies and visualizes the alignment's taxonomic data in a user-friendly way. 
The integration of SQLite into the program significantly reduces the time required to retrieve information from the UniProt database. Currently, we seek to improve the retrieval of nucleotide sequences by creating a local cache of the NCBI RefSeq database, and visualizing taxonomy with greater resolution using RAxML.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131984380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Species Tree and Reconciliation Estimation under a Duplication-Loss-Coalescence Model","authors":"Peng Du, Luay K. Nakhleh","doi":"10.1145/3233547.3233600","DOIUrl":"https://doi.org/10.1145/3233547.3233600","url":null,"abstract":"Gene duplication and loss are two evolutionary processes that occur across all three domains of life. These two processes result in different loci, across a set of related genomes, having different gene trees. Inferring the phylogeny of the genomes from data sets of such gene trees is a central task in phylogenomics. Furthermore, when the evolutionary history of the genomes includes short branches, deep coalescence, or incomplete lineage sorting (ILS), could be at play, in addition to duplication and loss, further adding to the complexity of gene/genome relationships. Recently, researchers have developed methods to infer these evolutionary processes by simultaneously modeling gene duplication, loss, and incomplete lineage sorting with respect to a given (fixed) species tree. In this work, we focused on the task of inferring species trees, as well as locus and gene trees, from sequence data in the presence of all three processes. We developed a search heuristic for estimating the maximum a posteriori species/locus/gene tree triad, as well as their associated parameters, from the sequence data of independent gene families. We demonstrate the performance of our method on simulated data and a data set of 200 gene families from six yeast genomes. 
Our work enables new statistical phylogenomic analyses, particularly when hidden paralogy and incomplete lineage sorting could be simultaneously at play.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130062981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepAnnotator: Genome Annotation with Deep Learning","authors":"M. R. Amin, Alisa Yurovsky, Yingtao Tian, S. Skiena","doi":"10.1145/3233547.3233577","DOIUrl":"https://doi.org/10.1145/3233547.3233577","url":null,"abstract":"Genome annotation is the process of labeling DNA sequences of an organism with its biological features, and is one of the fundamental problems in Bioinformatics. Public annotation pipelines such as NCBI integrate a variety of algorithms and homology searches on public and private databases. However, they build on the information of varying consistency and quality, produced over the last two decades. We identified 12,415 errors in NCBI RNA gene annotations, demonstrating the need for improved annotation programs. We use a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) to demonstrate the potential of deep learning networks to annotate genome sequences, and evaluate different approaches on prokaryotic sequences from the NCBI database. Particularly, we evaluate DNA k-mer embeddings and the application of RNNs for genome annotation. We show how to improve the performance of our deep networks by incorporating intermediate objectives and downstream algorithms to achieve better accuracy. Our method, called DeepAnnotator, achieves an F-score of ~94%, and establishes a generalized computational approach for genome annotation using deep learning. Our results are very encouraging as our method eliminates the requirement of hand-crafted features and motivates further research in application of deep learning to full genome annotation. 
DeepAnnotator algorithms and models can be accessed on GitHub: https://github.com/ruhulsbu/DeepAnnotator.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132619080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HetNetAligner: A Novel Algorithm for Local Alignment of Heterogeneous Biological Networks","authors":"Marianna Milano, P. Guzzi, M. Cannataro","doi":"10.1145/3233547.3233690","DOIUrl":"https://doi.org/10.1145/3233547.3233690","url":null,"abstract":"The importance of the use of networks to model and analyse biological data and the interplay of bio-molecules is widely recognised. Consequently, many algorithms for the analysis and the comparison of networks (such as alignment algorithms) have been developed in the past. Recently, many different approaches tried to integrate into a single model the interplay of different molecules, such as genes, transcription factors and microRNAs. A possible formalism to model such a scenario comes from node/edge coloured networks (or heterogeneous networks) implemented as node/edge-coloured graphs. Consequently, the need arises for alignment algorithms able to analyse heterogeneous networks. We here focus on the local comparison of heterogeneous networks that may be formulated as a network alignment problem. To the best of our knowledge, this problem has not been investigated in the past. We here propose HetNetAligner, a novel algorithm that receives as input two heterogeneous networks (node-coloured graphs) and a similarity function among nodes of two networks. We first build a single alignment graph. Then we mine this graph, extracting relevant subgraphs. We also implemented our algorithm, and we tested it on some selected heterogeneous biological networks. Preliminary results confirm that our method builds high-quality alignments. 
The website https://sites.google.com/view/heterogeneusnetworkalignment/home contains supplementary material and the code.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132876047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}