Yi Zhang, Xiaofei Zhang, A. Lane, T. Fan, Jinze Liu
{"title":"TFmeta: A Machine Learning Approach to Uncover Transcription Factors Governing Metabolic Reprogramming","authors":"Yi Zhang, Xiaofei Zhang, A. Lane, T. Fan, Jinze Liu","doi":"10.1145/3233547.3233580","DOIUrl":"https://doi.org/10.1145/3233547.3233580","url":null,"abstract":"Metabolic reprogramming is a hallmark of cancer. In cancer cells, transcription factors (TFs) govern metabolic reprogramming through abnormally increasing or decreasing the transcription rate of metabolic enzymes, which provides cancer cells growth advantages and concurrently leads to the altered metabolic phenotypes observed in many cancers. Consequently, targeting TFs that govern metabolic reprogramming can be highly effective for novel cancer therapeutics. In this work, we present TFmeta, a machine learning approach to uncover TFs that govern reprogramming of cancer metabolism. Our approach achieves state-of-the-art performance in reconstructing interactions between TFs and their target genes on public benchmark data sets. 
Leveraging TF binding profiles inferred from genome-wide ChIP-Seq experiments and 150 RNA-Seq samples from 75 paired cancerous (CA) and non-cancerous (NC) human lung tissues, our approach predicted 19 key TFs that may be the major regulators of the gene expression changes of metabolic enzymes of the central metabolic pathway glycolysis, which may underlie the dysregulation of glycolysis in non-small-cell lung cancer patients.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133429648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Consensus Approach to Infer Tumor Evolutionary Histories","authors":"Kiya W. Govek, Camden Sikes, Layla Oesper","doi":"10.1145/3233547.3233584","DOIUrl":"https://doi.org/10.1145/3233547.3233584","url":null,"abstract":"Inspired by recent efforts to model cancer evolution with phylogenetic trees, we consider the problem of finding a consensus tumor evolution tree from a set of conflicting input trees. In contrast to traditional phylogenetic trees, the tumor trees we consider contain features such as mutation labels on internal vertices (in addition to the leaves) and allow multiple mutations to label a single vertex. We describe several distance measures between these tumor trees and present an algorithm to solve the consensus problem called GraPhyC. Our approach uses a weighted directed graph where vertices are sets of mutations and edges are weighted using a function that depends on the number of times a parental relationship is observed between their constituent mutations in the set of input trees. We find a minimum weight spanning arborescence in this graph and prove that the resulting tree minimizes the total distance to all input trees for one of our presented distance measures. We evaluate our GraPhyC method using both simulated and real data. On simulated data we show that our method outperforms a baseline method at finding an appropriate representative tree. 
Using a set of tumor trees derived from both whole-genome and deep sequencing data from a Chronic Lymphocytic Leukemia patient we find that our approach identifies a tree not included in the set of input trees, but that contains characteristics supported by other reported evolutionary reconstructions of this tumor.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133459840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sendong Zhao, Meng Jiang, Ming Liu, Bing Qin, Ting Liu
{"title":"CausalTriad","authors":"Sendong Zhao, Meng Jiang, Ming Liu, Bing Qin, Ting Liu","doi":"10.1145/3233547.3233555","DOIUrl":"https://doi.org/10.1145/3233547.3233555","url":null,"abstract":"Deriving pseudo causal relations from medical text data lies at the heart of medical literature mining. Existing studies have utilized extraction models to find pseudo causal relations from single sentences, while the knowledge created by causation transitivity - often spanning multiple sentences - has not been considered. Furthermore, we observe that many pseudo causal relations follow the rule of causation transitivity, which makes it possible to discover unseen causal relations and generate new causal relation hypotheses. In this paper, we address these two issues by proposing a factor graph model to incorporate three clues to discover causation expressions in the text data. We propose four types of triad structures to represent the rules of causation transitivity among causal relations. Our proposed model, called CausalTriad, uses textual and structural knowledge to infer pseudo causal relations from the triad structures. 
Experimental results on two datasets demonstrate that (a) CausalTriad is effective for pseudo causal relation discovery within and across sentences; (b) CausalTriad is highly capable at recognizing implicit pseudo causal relations; (c) CausalTriad can infer missing/new pseudo causal relations from text data.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117144718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Kamruzzaman, A. Kalyanaraman, B. Krishnamoorthy
{"title":"Detecting Divergent Subpopulations in Phenomics Data using Interesting Flares","authors":"M. Kamruzzaman, A. Kalyanaraman, B. Krishnamoorthy","doi":"10.1145/3233547.3233593","DOIUrl":"https://doi.org/10.1145/3233547.3233593","url":null,"abstract":"One of the grand challenges of modern biology is to understand how genotypes (G) and environments (E) interact to affect phenotypes (P), i.e., G × E → P. Phenomics is the emerging field that aims to study large and complex data sets encompassing combinations of genotype, environment, and phenotype readings. A phenomenon of crucial interest in this context is that of divergent subpopulations, i.e., how certain subgroups of the population show differential behavior under different types of environmental conditions. We consider the fundamental task of identifying such \"interesting\" subpopulation-level behavior by analyzing high-dimensional phenomics data sets from a large and diverse population. However, delineation of such subpopulations is a challenging task due to the large size, high dimensionality, and complexity of phenomics data. We present a new framework to extract such subpopulation-level information from phenomics data. Our approach is based on principles from algebraic topology, a branch of mathematics that studies shapes and structure of data in a robust manner. In particular, our framework identifies and quantifies \"flares\", which are structural branching features in data that characterize divergent behavior of subpopulations, in an unsupervised manner. 
We present algorithms to detect and rank flares, and demonstrate the utility of the proposed framework on two real-world plant phenomics data sets.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122070486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huijun Wang, Francisco Garcia, An Chi, Ivan Cornella Taracido, Anne Mai Wasssermann, Andy Liaw
{"title":"Profiling Diverse Chemical Space to Map the Druggable Proteome","authors":"Huijun Wang, Francisco Garcia, An Chi, Ivan Cornella Taracido, Anne Mai Wasssermann, Andy Liaw","doi":"10.1145/3233547.3233624","DOIUrl":"https://doi.org/10.1145/3233547.3233624","url":null,"abstract":"Chemoproteomics is a powerful mass spectrometry-based affinity chromatography approach for identifying proteome-wide small molecule-protein interactions. It aims for unbiased determination of drug targets in a complex cellular environment. Chemoproteomics has been one of the central methods of choice for small molecule mechanism of action (MOA) deconvolution of phenotypic screen hits, as well as for understanding the selectivity and off-target biological activities. In order to understand the modulation of the human proteome with small molecules in a comprehensive and systematic manner, a chemically diverse probe set with drug-like characteristics has been selected and profiled against 8 relevant biosamples including cells and human organ tissues to delineate protein target binding spectra in an unbiased manner at a global scale. In this work, we will update progress-to-date on experimental design, optimization, and current findings from this unprecedented rich system-chemical biology dataset. We will use examples from this study to highlight the cheminformatics and bioinformatics solutions that we developed to address the unique challenge of chemical biology/chemical proteomics data. Insights from this chemoproteomics profiling effort will be discussed from the perspectives of: 1) compound selectivity in the context of diverse biological samples beyond industry standard practice of using in vitro recombinant protein profiling panel or in one or two model cell lines, 2) frequent targets and chemo type hitters as well as 3) novel potential target examples. 
These efforts to develop a unique human chemo-proteomic database, together with chemo-genomic and transcriptomic approaches, provide chemical biologists the means to prosecute novel target identification and subsequent validation studies in support of relevant disease areas.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122904115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoding TDP-43 Dependent Cryptic Splicing in Amyotrophic Lateral Sclerosis and Identifying Novel Disease-causing Genes","authors":"H. Yalamanchili, Hyun-hwan Jeong, Zhandong Liu","doi":"10.1145/3233547.3233698","DOIUrl":"https://doi.org/10.1145/3233547.3233698","url":null,"abstract":"Background: Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease that primarily affects motor neurons in both the brain and spinal cord [zarei2015comprehensive]. Several independent studies confirmed the deposition of TAR DNA-binding protein (TDP)-43 aggregates in the cytoplasm of the affected cells, suggesting a role for TDP-43 in ALS. However, the molecular mechanism of TDP-43 in ALS is not well established. It was only recently reported that TDP-43 contributes to pre-mRNA splicing by inhibiting cryptic exons [ling2015tdp]. While this is a very interesting observation, it opens up several intriguing aspects of TDP-43 dependent splicing errors, such as preferential 5'/3' errors, enrichment of specific alternative splicing events, and intron retentions. A systematic characterization and decoding of TDP-43 cryptic splicing is critical to a better understanding of the molecular pathogenesis of ALS. However, none of the existing computational approaches is designed specifically for cryptic splice characterization, which points to a strong need for a robust, scalable, genome-wide pipeline. Results: In this study we applied CrypSplice [tan2016extensive], a novel in-house cryptic splice site detection and characterization method, to several publicly available TDP-43 datasets. Every junction is subjected to a beta-binomial test and characterized to aid molecular inferences. Upon exploring 18 TDP-43 knock-down samples across different tissues and cell lines, we found that genes targeted by cryptic splicing are enriched in cell cycle, autophagy, and protein folding. 
While this is in good agreement with previous studies, we uncovered a preferential enrichment of 5' splice site errors, indicating a U1 spliceosome mediated mechanism. To infer a co-splicing network, similar cryptic splicing characterization was performed on a total of 236 samples covering 118 RNA binding proteins (RBPs) [yalamanchili2017data]. A network of RBPs was constructed based on the induced cryptic load similarity with respect to the TDP-43 cryptic signature, also validated by TDP-43 binding (eCLIP-Seq). We found other reported ALS genes such as FUS, HNRNPA1, and TAF15 enriched among the neighboring genes of TDP-43 in the RBP network. Novel (putative) ALS-causing RBPs are identified and prioritized using network propagation, guilt by association, and cryptic signature similarity. Conclusion: Through a comprehensive CrypSplice analysis we uncovered a preferential enrichment of TDP-43 dependent 5' splice site errors. Network propagation and prioritization of the RBP cryptic network yielded a list of (putative) novel ALS-associated genes. Further follow-ups through genetic screening could discover more ALS-causing genes and aid in decoding the underlying molecular mechanism.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124087887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seq3seq Fingerprint: Towards End-to-end Semi-supervised Deep Drug Discovery","authors":"Xiaoyu Zhang, Sheng Wang, Feiyun Zhu, Zheng Xu, Yuhong Wang, Junzhou Huang","doi":"10.1145/3233547.3233548","DOIUrl":"https://doi.org/10.1145/3233547.3233548","url":null,"abstract":"Following the recent progress in deep learning, the use of AI to accelerate drug discovery and cut R&D costs has surged in the last few years. However, the success of deep learning is attributed to large-scale, clean, high-quality labeled data, which is generally unavailable in drug discovery practices. In this paper, we address this issue by proposing an end-to-end deep learning framework in a semi-supervised learning fashion. That is, the proposed deep learning approach can utilize both labeled and unlabeled data. While labeled data is of very limited availability, the amount of available unlabeled data is generally huge. The proposed framework, named seq3seq fingerprint, automatically learns a strong representation of each molecule in an unsupervised way from a huge training data pool containing a mixture of both unlabeled and labeled molecules. In the meantime, the representation is also adjusted to further help predictive tasks, e.g., acidity, alkalinity or solubility classification. The entire framework is trained end-to-end and simultaneously learns the representation and inference results. 
Extensive experiments support the superiority of the proposed framework.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125876659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ACM-BCB 2018 ParBio Chairs' Welcome & Organization","authors":"Giuseppe Agapito, W. Lloyd","doi":"10.1145/3233547.3233727","DOIUrl":"https://doi.org/10.1145/3233547.3233727","url":null,"abstract":"It is our great pleasure to welcome you to the ACM-BCB 2018 ParBio workshop. We received 3 submissions from around the world covering a broad range of topics. We evaluated them regarding relevance, quality, and novelty, selecting 3 full papers. We took into account the coverage of the different areas related to ParBio as well as the potential audience, to schedule presentations in a single day with minimal audience interest overlap. ParBio will take place in the morning and will include the following three presentations: A Cooperative Vehicle Routing Algorithm for Logistic Management in Healthcare; A Voice-Aware System for Vocal Wellness; Deep Learning Based Medical Diagnosis System Using Multiple Data Sources.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129393309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Similarity Metrics on Real World Data to Recommend the Next Treatment","authors":"Kyle Haas, M. Mahoui, Simone Gupta, Stuart Morton","doi":"10.1145/3233547.3233647","DOIUrl":"https://doi.org/10.1145/3233547.3233647","url":null,"abstract":"Studies using similarity metrics have been used to help quantify the relationship between patients; however, these studies do not leverage either the patients' prior treatments or the ordering of these treatments. Our proposal seeks to recommend the next treatment for a given patient by comparing the overall survival of similar patients who share a common treatment stem. Data was aggregated from the FlatIron® Advanced non-small-cell lung (NSCLC) proprietary dataset [1] comprised of 1312 patients from 2008-2016. Our methodology pipeline was comprised of three main components (non-treatment-based similarity (NTS), treatment-based similarity (TS), recommendation). For NTS, we divided all non-treatment features into 2 main categories (i.e. genetic category and clinical/demographic category), computed a patient similarity using Gower Similarity Metric [2] for each category, and lastly, followed a similar approach as Gottlieb et al [3] and created a single similarity measure using the geometric mean. A similarity threshold is used to select for a reference patient p a set of similar patients Simp with a similarity value (determined from the genetic and clinical/demographic features) above the threshold value. TS is used during the next step to filter from Simp the patients that do not share the same treatment class-level stem (prior treatments) as the reference patient. The objective is to consider only patients who share similar previous class treatments in order to determine the next treatment class for the reference patient. 
From this final subset of patients, we determine which patient has survived the longest number of days following the treatment stem and we recommend this patient's next treatment class to the reference patient. To evaluate this approach, we repeated this methodology across 10 random subsets of patients where each subset was 10% of the entire data set. Each patient in each subset was viewed as a reference patient and compared against the patients outside the random subset. We varied the length of the initial treatment stem and varied the NTS similarity threshold value. We found that for stems lower than two treatments and with similarity thresholds above 0.6, only approximately 30% of patients took the same treatment as the longest surviving patient in the subpopulation. Further work is needed to refine the proposed approach, including assigning different weights to the features used in the similarity computation, and considering other outcome variables to recommend next treatment (e.g. quality of life using ECOG performance score).","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129284157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fahad Almsned, Gideon K Gogovi, Nicole R Bracci, K. Kehn-Hall, Estela Blaisten-Barojas, Amarda Shehu
{"title":"Modeling the Tertiary Structure of a Multi-domain Protein: Structure Prediction of Multi-domain Proteins","authors":"Fahad Almsned, Gideon K Gogovi, Nicole R Bracci, K. Kehn-Hall, Estela Blaisten-Barojas, Amarda Shehu","doi":"10.1145/3233547.3233702","DOIUrl":"https://doi.org/10.1145/3233547.3233702","url":null,"abstract":"Due to the central role that tertiary structure plays in determining protein function, resolving protein tertiary structures is an integral research thrust in both wet and dry laboratories. Dry laboratories have primarily focused on small- to medium-size proteins. However, proteins central to human biology and human health are often quite complex, containing multiple domains and consisting of thousands of amino acids. Such proteins are challenging for various reasons, including the inability to crystallize. We present a case study of structure determination for the Rift Valley fever virus L-protein, a large, multi-domain protein with currently no available tertiary structure. We employ this case study as an emerging paradigm and demonstrate how to leverage the rich and diverse landscape of bioinformatics tools for building tertiary structure models for multi-domain proteins with thousands of amino acids.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124599905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}