{"title":"Diagnosing Schizophrenia: A Deep Learning Approach","authors":"J. Barry, Srivathsan Srinivasagopalan, Sharma V. Thankachan, V. Gurupur","doi":"10.1145/3233547.3233658","DOIUrl":"https://doi.org/10.1145/3233547.3233658","url":null,"abstract":"This paper presents a new method for diagnosing schizophrenia using deep learning. The experiment used a secondary dataset supplied by the National Institutes of Health. The experiment analyzes the dataset and identifies schizophrenia using traditional machine learning methods such as logistic regression, support vector machines, and random forest. Finally, a deep neural network with three hidden layers is applied to the dataset. The results show that the neural network model yielded the highest accuracy, suggesting that deep learning may be a feasible method for diagnosing schizophrenia.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130804013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Distributed Constrained Non-negative Matrix Factorization Algorithm for Time-Series Gene Expression Data","authors":"Matthew S. Dyer, Julian Dymacek","doi":"10.1145/3233547.3233579","DOIUrl":"https://doi.org/10.1145/3233547.3233579","url":null,"abstract":"We present a new distributed computing algorithm, Parallel Pattern Discovery (PPD), for constrained Non-negative Matrix Factorization (NMF). Our implementation offers the ability to constrain a specific pattern for optimization of the data while minimizing reconstruction error. Parallel Pattern Discovery operates within a distributed environment using a message passing interface. Distribution of the PPD algorithm provides better scalability and allows operation in single- or multiple-system environments. The algorithm was tested on a set of time-series, dose-dependent mRNA gene expression data. Parallel Pattern Discovery was found to accurately identify patterns within the data and reconstruct the original matrices. Our NMF algorithm found a smaller reconstruction error when compared against standard NMF algorithms. Development focused on running PPD as part of a system which identifies significantly contributing genes. Parallel Pattern Discovery is first run to find patterns from biological data. It is followed by Gene Set Enrichment (GSE) which takes the pattern data and relates it back to genetic pathways.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127013504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Selection to Predict Compound's Effect on Aging","authors":"H. E. Manoochehri, Susmitha Sri Kadiyala, J. Birjandtalab, M. Nourani","doi":"10.1145/3233547.3233597","DOIUrl":"https://doi.org/10.1145/3233547.3233597","url":null,"abstract":"The biological aging process is the main cause of many age-related diseases. Therefore, exploring cellular-level changes due to aging, chemical impacts, and anti-aging compounds is of high interest in drug discovery and personalized drug research. In this paper, we propose a model to predict the effect of chemical compounds on the lifespan of Caenorhabditis elegans. We analyze data from the DrugAge database, which includes chemical compounds that affect the lifespan of model organisms, and use chemical descriptors and gene ontology as features. We propose a new feature selection scheme based on particle swarm optimization and correlation-based feature selection to select the most relevant features for the classification task. The experimental results indicate that our approach achieves higher performance than existing methods. We discuss the benefits of our proposed feature selection scheme over other methodologies and compare the results obtained with random forest against baseline support vector machine and artificial neural network classifiers.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114314984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faster Computation of Genome Mappability","authors":"Sahar Hooshmand, Paniz Abedin, Daniel Gibney, S. Aluru, Sharma V. Thankachan","doi":"10.1145/3233547.3233645","DOIUrl":"https://doi.org/10.1145/3233547.3233645","url":null,"abstract":null,"PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115827384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying Stochastic Process Model to Imputation of Censored Longitudinal Data","authors":"I. Zhbannikov, K. Arbeev, A. Yashin","doi":"10.1145/3233547.3233591","DOIUrl":"https://doi.org/10.1145/3233547.3233591","url":null,"abstract":"Longitudinal data are widely used in medicine, demography, sociology, and other areas. Incomplete observations in such data often confound the results of analysis. Many data imputation methods have already been proposed to alleviate this problem. The Stochastic Process Model (SPM) is a general framework for modeling the joint evolution of repeatedly measured variables and a time-to-event outcome, as typically observed in longitudinal studies of aging, health, and longevity. It is well suited to imputing missing observations in censored longitudinal data. We applied SPM to the imputation of missing values in censored longitudinal data, using data from the Framingham Heart Study and the Cardiovascular Health Study as well as simulated datasets. We also present stpm, an R package designed for this purpose.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123240456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Are Profile Hidden Markov Models Identifiable?","authors":"Srilakshmi Pattabiraman, T. Warnow","doi":"10.1145/3233547.3233563","DOIUrl":"https://doi.org/10.1145/3233547.3233563","url":null,"abstract":"Profile Hidden Markov Models (HMMs) are graphical models that can be used to produce finite-length sequences from a distribution. In fact, although they were only introduced for bioinformatics 25 years ago (by Haussler et al., Hawaii International Conference on Systems Science 1993), they are arguably the most commonly used statistical model in bioinformatics, with multiple applications, including protein structure and function prediction, classifications of novel proteins into existing protein families and superfamilies, metagenomics, and multiple sequence alignment. The standard use of profile HMMs in bioinformatics has two steps: first a profile HMM is built for a collection of molecular sequences (which may not be in a multiple sequence alignment), and then the profile HMM is used in some subsequent analysis of new molecular sequences. The construction of the profile thus is itself a statistical estimation problem, since any given set of sequences might potentially fit more than one model well. Hence a basic question about profile HMMs is whether they are statistically identifiable, which means that no two profile HMMs can produce the same distribution on finite-length sequences. Indeed, statistical identifiability is a fundamental aspect of any statistical model, and yet it is not known whether profile HMMs are statistically identifiable. In this paper, we report on preliminary results towards characterizing the statistical identifiability of profile HMMs in one of the standard forms used in bioinformatics.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129722058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The PepSeq Pipeline: Software for Antimicrobial Motif Discovery in Randomly-Generated Peptide Libraries","authors":"Tanner D. Jensen, Kristi A. Bresciano, Emma Dallon, M. Fujimoto, Cole A. Lyman, Enoch Stewart, Joel S. Griffitts, M. Clement","doi":"10.1145/3233547.3233599","DOIUrl":"https://doi.org/10.1145/3233547.3233599","url":null,"abstract":"Bacteria with resistance genes are becoming ever more common, and new methods of discovering antibiotics are being developed. One of these new methods involves researchers creating random peptides and testing their antimicrobial activity. Developing antibiotics from these peptides requires understanding which sequence motifs will be toxic to bacteria. To determine whether the toxic peptides of a randomly-generated peptide library can be uniquely classified based solely on sequence motifs, we created the PepSeq Pipeline: new software that uses a Random Forest algorithm to extract motifs from a peptide library. We found that this pipeline can accurately classify 56% of the toxic peptides in the peptide library using motifs extracted from the model. Testing on simulated data with less noise, we could classify up to 94% of the toxic peptides. The pipeline extracted significant toxic motifs in every library that was tested, but its ability to classify all toxic peptides depended on the number of motifs in the library. Once extracted, these motifs can be used both to understand the biology behind why certain peptides are toxic and to create novel antibiotics. The code and data used in this analysis can be found at https://github.com/tjense25/pep-seq-pipeline.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129862950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integration of Cancer Data through Multiple Mixed Graphical Model","authors":"Christopher Ma, Tina Gui, Xin Dang, Yixin Chen, D. Wilkins","doi":"10.1145/3233547.3233557","DOIUrl":"https://doi.org/10.1145/3233547.3233557","url":null,"abstract":"State-of-the-art biomedical technologies have produced large amounts of genomic, epigenetic, transcriptomic, and proteomic data of varied types across different biological conditions. Developing ways to integrate data of different types has long been a challenge. Here, we leverage the node-conditional univariate exponential family distribution to capture the dependencies and interactions between different data types. The graph underlying our mixed graphical model contains both undirected and directed edges. In addition, it is widely believed that incorporating data across different experimental conditions can lead to a more holistic view of the biological system and help unravel the regulatory mechanisms behind complex diseases. We then integrate the data across related biological conditions through multiple graphical models. The performance of our approach is demonstrated through simulations and an application to cancer genomics.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"47 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120918788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cohesion-driven Online Actor-Critic Reinforcement Learning for mHealth Intervention","authors":"Feiyun Zhu, Peng Liao, Xinliang Zhu, Jiawen Yao, Junzhou Huang","doi":"10.1145/3233547.3233553","DOIUrl":"https://doi.org/10.1145/3233547.3233553","url":null,"abstract":"Given the vast population of smart device users worldwide, mobile health (mHealth) technologies promise to have a positive and wide-reaching influence on people's health. They can provide flexible, affordable, and portable health guidance to device users. Current online decision-making methods for mHealth assume that users are completely heterogeneous: they share no information among users and learn a separate policy for each user. However, the data available for each user are too limited to support separate online learning, which leads to unstable, high-variance policies. Moreover, we observe that a user may be similar to some, but not all, other users, and that connected users tend to behave similarly. In this paper, we propose a network cohesion constrained (actor-critic) Reinforcement Learning (RL) method for mHealth. The goal is to explore how to share information among similar users to better convert the limited user information into sharper learned policies. To the best of our knowledge, this is the first online actor-critic RL method for mHealth and the first network cohesion constrained (actor-critic) RL method in any application. The network cohesion is important for deriving effective policies. We propose a novel method to learn the network using the warm-start trajectory, which directly reflects the users' properties. Optimizing our model is difficult and quite different from general supervised learning because values are observed only indirectly. As a contribution, we propose two algorithms for the proposed online RL methods. Beyond mHealth, the proposed methods can be readily applied or adapted to other health-related tasks. Extensive experimental results on the HeartSteps dataset demonstrate that, across a variety of parameter settings, the two proposed methods achieve clear improvements over state-of-the-art methods.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116709108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Highlights Talks at ACM BCB 2018","authors":"L. Cowen, Xiaoqian Jiang","doi":"10.1145/3233547.3233715","DOIUrl":"https://doi.org/10.1145/3233547.3233715","url":null,"abstract":"It is our great pleasure to have eleven highlights talks in the program of the 2018 ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB'18. These invited talks are based on articles that have been published in the last 12 months and represent some of the most interesting and exciting works in our field.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121291882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}