Erdem Turk, Turkan Arit, Delikanli Mertcan Susus, Ilayda Ucar, Baris E. Suzek
{"title":"ProSetComp","authors":"Erdem Turk, Turkan Arit, Delikanli Mertcan Susus, Ilayda Ucar, Baris E. Suzek","doi":"10.1145/3233547.3233628","DOIUrl":"https://doi.org/10.1145/3233547.3233628","url":null,"abstract":"The amount of data available in public bioinformatics resources and the complexity of the user interfaces they are served through often challenge appreciation and effective utilization of these valuable resources. While education, documentation and training activities mitigate this problem, there is still a need to develop user interfaces to serve simple day-to-day needs of scientists. To this end, we developed ProSetComp, a simple web-based platform to create and compare protein sets, following a traditional software development process from requirement analysis to implementation. First, we interviewed and collected user scenarios from wet lab scientists with varying seniority, research interests and backgrounds. Reviewing the user scenarios, we identified one high-impact need that drove the development of ProSetComp: the ability to 1) create protein sets by searching databases, 2) compare these protein sets in different dimensions such as functional domains, pathways, molecular functions and biological processes, and 3) visualize results graphically. Next, we collected and integrated necessary data from several bioinformatics resources including UniProt, Reactome, Gene Ontology and Pfam in a local relational database. Finally, we designed user interfaces that facilitate the creation of protein sets by using form-based query generators and exploring the relationship between created protein sets using tabular and graphical representations. The current internal release of the platform contains ~120 million protein entries. The user interface supports >50 search criteria to create up to four protein sets and comparison of these sets in four dimensions: protein domains, molecular functions, biological processes, and pathways. 
Along with tables, the commonalities and differences between protein sets can be explored using novel user interface components such as Venn and UpSet diagrams. The first public release of ProSetComp (http://ceng.mu.edu.tr/labs/bioinfo/prosetcomp) is targeted for mid-August, 2018 and planned to be updated monthly thereafter. Upon public release, the ProSetComp source code will become available through GitHub. The database content and user interface will be expanded as per community needs. The ProSetComp project is supported by The Scientific and Technological Research Council of Turkey (TUBITAK, Grant number: 216Z111).","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114875476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
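The set comparison described in the ProSetComp abstract can be illustrated with a small sketch. It assumes each protein set has already been reduced to a set of annotation identifiers (the Pfam-style accessions below are made-up placeholders), and it computes the exclusive intersections that an UpSet-style diagram would plot. The function name `exclusive_intersections` is hypothetical and is not taken from ProSetComp.

```python
from itertools import combinations


def exclusive_intersections(named_sets):
    """Map each combination of set names to the items found in exactly those sets.

    Illustrative only: this is not ProSetComp code.
    """
    names = sorted(named_sets)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Items shared by every set in the combination...
            shared = set.intersection(*(set(named_sets[n]) for n in combo))
            # ...minus anything that also appears in a set outside it.
            for other in names:
                if other not in combo:
                    shared -= set(named_sets[other])
            if shared:
                result[combo] = shared
    return result


# Placeholder domain annotations for three hypothetical protein sets.
protein_sets = {
    "A": {"PF00001", "PF00002", "PF00003"},
    "B": {"PF00002", "PF00003", "PF00004"},
    "C": {"PF00003", "PF00005"},
}

for combo, items in sorted(exclusive_intersections(protein_sets).items()):
    print("&".join(combo), sorted(items))
```

Each item lands in exactly one combination, so the group sizes sum to the size of the union. That is the property that lets an UpSet plot show one bar per combination without double counting.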
Wei Du, Kristin A Dickinson, Calvin A. Johnson, L. Saligan
{"title":"Identifying Genes to Predict Cancer Radiotherapy-Related Fatigue with Machine-Learning Methods","authors":"Wei Du, Kristin A Dickinson, Calvin A. Johnson, L. Saligan","doi":"10.1145/3233547.3233636","DOIUrl":"https://doi.org/10.1145/3233547.3233636","url":null,"abstract":"While many factors influence the fatigue experienced by patients undergoing radiation therapy (RT), we hypothesize that expression of genes related to oxidative stress can be predictive of RT-related fatigue. In this work, we present a two-phase scheme which first selects a limited subset of genes deemed most predictive by a regularized elastic net and then applies a widely used classifier, the regularized random forest, to discriminate patients with high fatigue from those with low fatigue during RT. The model achieved 80% accuracy (0.80 AUC) in cross-validation. Initial results suggest that several genes are consistently selected in the proposed scheme, such as PRDX5, FHL2 and GPX4, showing promise as potential predictors for RT-related fatigue, and may provide information on its biological underpinnings.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121556073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning for Resolution Validation of Three Dimensional Cryo-Electron Microscopy Density Maps","authors":"Todor K. Avramov, Dong Si","doi":"10.1145/3233547.3233712","DOIUrl":"https://doi.org/10.1145/3233547.3233712","url":null,"abstract":"Cryo-electron microscopy (cryo-EM) is becoming the imaging method of choice for determining protein structures. Many atomic structures have been resolved based on an exponentially growing number of published three-dimensional (3D) high resolution cryo-EM density maps. The resolution value claimed for the reconstructed 3D density map has been the topic of scientific debate for many years. The Fourier Shell Correlation (FSC) is the currently accepted cryo-EM resolution measure, but it can be subjective and has its own limitations. The FSC indicates the quality of the experimental maps but not the amount of geometric and volumetric feature details present in the 3D map. In this study, we propose supervised deep learning methods to extract representative 3D features at high, medium and low resolutions from simulated protein density maps and build classification models that objectively validate resolutions of experimental 3D cryo-EM maps. Specifically, we build classification models based on dense artificial neural network (DNN) and 3D convolutional neural network (3D CNN) architectures. The trained models can classify a given 3D cryo-EM density map into one of three resolution levels: high, medium, low. The DNN model achieved 92.73% accuracy and the 3D CNN model achieved 99.75% accuracy on simulated test maps. Applying the DNN and 3D CNN models to thirty experimental cryo-EM maps achieved an agreement of 60.0% and 56.7%, respectively, with the author-published resolution value of the density maps. 
The results suggest that deep learning can be utilized to potentially improve the resolution validation process of experimental cryo-EM maps.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133218980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Systematic Study of Different Design Decisions in Markov Model-based Analysis of Molecular Structure Data: Extended Abstract","authors":"S. Roychoudhury, Amarda Shehu","doi":"10.1145/3233547.3233618","DOIUrl":"https://doi.org/10.1145/3233547.3233618","url":null,"abstract":"Modeling and simulation software now provide us with a view of the structure space navigated by peptides and proteins under physiological conditions. Such software, such as Molecular Dynamics, yields trajectories of consecutive structures accessed by a dynamic molecule, but does not readily expose the underlying organization in the structure state so as to summarize the equilibrium dynamics over the present structural states. In this paper we investigate Markov State Models for their ability to do so. While we make use of an established software package to do so, we analyze within it different design decisions and measure their impact on the obtained results. We present our findings on optimal design decisions, revealing in the process the dynamics of the Met-enkephalin peptide.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134144427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
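As a rough illustration of the Markov State Model idea in the abstract above, the sketch below estimates a transition matrix from a trajectory that is assumed to have already been discretized into integer state labels. The lag time, the toy trajectory, and the function name `transition_matrix` are assumptions made for illustration. The simple row-normalized count estimator shown here is not the established software the authors used, and it ignores the design decisions the paper studies.

```python
from collections import Counter


def transition_matrix(states, n_states, lag=1):
    """Row-normalised transition counts between discrete states at a given lag."""
    counts = Counter(zip(states[:-lag], states[lag:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        # Leave rows for unvisited states as zeros rather than dividing by zero.
        matrix.append([
            counts[(i, j)] / row_total if row_total else 0.0
            for j in range(n_states)
        ])
    return matrix


# Toy discretised trajectory over two hypothetical structural states.
trajectory = [0, 0, 1, 0, 1, 1, 0, 0]
T = transition_matrix(trajectory, n_states=2)
for row in T:
    print(row)
```

Each row of `T` gives the estimated probabilities of moving from one state to every other state after `lag` steps, so rows for visited states sum to one.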
{"title":"TCGA Lung Cancer Analysis Pipeline","authors":"T. Zengin, Tugba Onal Suzek","doi":"10.1145/3233547.3233615","DOIUrl":"https://doi.org/10.1145/3233547.3233615","url":null,"abstract":"Cancer cells contain thousands of mutated genes, differential copy numbers and differential expressions of genes. The progression of cancer differs from patient to patient. Identification of key proteins and pathways in an individual patient's molecular profile has become important for personalized medicine. In the first step of our proposed pipeline, gene mutations, gene expression profile, copy number variations and clinical data of lung cancer patients (LUAD) are downloaded from TCGA. Significant genomic variations are determined by using R MADGIC and GAIA packages. Using the R DESeq2 package, the most active differentially expressed genes are determined for the patients (number of patients=55) for whom the adjacent normal tissue RNA-seq expression levels are available. The most active pathways are determined by the Cytoscape jActiveModules program based on expression levels. For significant genomic variations and gene expression levels, MDS plots and Kaplan-Meier survival analyses of the patients are performed. The most mutated genes in 565 LUAD samples were identified by the TCGAbiolinks package. We found that TP53, a known tumor suppressor gene, has a mutation in 48% of the patients. Survival analysis for the 55 LUAD patients clustered using K-means clustering (k=2) was performed. Results show that the survival probability of the two clusters does not vary significantly. 
The goals of this study are to 1) computationally identify the most significant genes whose mutation and expression profiles correlate with patient survival time, 2) verify the significance of the results against the results of an earlier study conducted on the TCGA LUAD dataset [1], and 3) provide an open-source automated pipeline.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131876407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuan Gong, Hasini Yatawatte, C. Poellabauer, Sandra L. Schneider, Susan Latham
{"title":"Automatic Autism Spectrum Disorder Detection Using Everyday Vocalizations Captured by Smart Devices","authors":"Yuan Gong, Hasini Yatawatte, C. Poellabauer, Sandra L. Schneider, Susan Latham","doi":"10.1145/3233547.3233574","DOIUrl":"https://doi.org/10.1145/3233547.3233574","url":null,"abstract":"Autism Spectrum Disorder (ASD) is a pervasive and lifelong neuro-developmental disability where early treatment has been shown to improve a person's symptoms and ability to function. One of the most significant obstacles to effective treatment of ASD is the challenge of early detection, but unfortunately, due to the limited availability of screening and diagnostic instruments in some regions, many affected children remain undiagnosed or are diagnosed late. Recent studies have shown that characteristics in vocalizations could be used to build new ASD screening tools, but most prior efforts are based on recordings made in controlled settings and processed manually, affecting the practical value of such solutions. On the other hand, we are increasingly surrounded by smart devices that can capture an individual's vocalizations, including devices specifically targeted at child populations (e.g., Amazon Echo Kids Edition). In this paper, we propose a practical and fully automatic ASD screening solution that can be implemented on such devices, which captures and analyzes a child's everyday vocalizations at home, without the need for professional help. 
A 17-month experiment on 35 children is used to verify the effectiveness of the proposed approach, showing that we can obtain an unweighted F1-score of 0.87 for the classification of typically developing and ASD children.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130026405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Huge Cohorts, Genomics, and Clinical Data to Personalize Medicine","authors":"J. Denny","doi":"10.1145/3233547.3233608","DOIUrl":"https://doi.org/10.1145/3233547.3233608","url":null,"abstract":"Precision medicine offers the promise of improved diagnosis and of more effective, patient-specific therapies. Typically, such studies have been pursued using research cohorts. At Vanderbilt, we have linked de-identified electronic health records (EHRs) to a DNA repository, called BioVU, which has nearly 250,000 samples. Through BioVU and an NHGRI-funded network using EHRs for discovery, the Electronic Medical Records and Genomics (eMERGE) network, we have studied the genomic basis of disease and drug response using real-world clinical data. The EHR also enables the inverse experiment - starting with a genotype and discovering all the phenotypes with which it is associated - a phenome-wide association study. By looking for clusters of diseases and symptoms through phenotype risk scores, we find unrecognized genetic variants associated with common disease. The era of huge international cohorts such as the UK Biobank, Million Veteran Program, and the newly started All of Us Research Program will make millions of individuals available with dense molecular and phenotypic data. 
All of Us launched May 6, 2018 and will engage one million diverse individuals across the US who will contribute data and also receive results back.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132190936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ACM-BCB 2018 Computational Advances in Molecular Epidemiology (CAME) Chairs' Welcome","authors":"Y. Khudyakov, I. Măndoiu, P. Skums, A. Zelikovsky","doi":"10.1145/3233547.3233670","DOIUrl":"https://doi.org/10.1145/3233547.3233670","url":null,"abstract":"","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127075944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genet-CNV: Boolean Implication Networks for Modelling Genome-Wide Co-occurrence of DNA Copy Number Variations","authors":"Salvi Singh, N. Guo","doi":"10.1145/3233547.3233652","DOIUrl":"https://doi.org/10.1145/3233547.3233652","url":null,"abstract":"Boolean implication networks (Genet) have been utilized to model gene co-expression networks in our previous research. In this study, they are constructed to model the co-occurrence of amplification/deletion events in DNA copy number variations (CNVs) at a genome-wide scale. The Boolean implication scheme extends the dichotomous nature of the variable under scrutiny such that it can have numerous discrete values corresponding to DNA CNVs, and pairwise co-occurrence of CNVs is computed. The implication network was implemented in a software package (Genet-CNV) and run on 271 patient samples afflicted with non-small cell lung cancer (NSCLC) [GSE31800].","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115991493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Macromolecular Structures and Motions: Computational Methods for Sampling and Analysis of Energy Landscapes","authors":"Kevin Molloy, N. Akhter, Amarda Shehu","doi":"10.1145/3233547.3233662","DOIUrl":"https://doi.org/10.1145/3233547.3233662","url":null,"abstract":"With biomolecular structure recognized as central to understanding mechanisms in the cell, dry laboratories have spent significant efforts on modeling and analyzing structure and dynamics. While significant advances have been made, particularly in the design of sophisticated energetic models and molecular representations, such efforts are experiencing diminishing returns. One of the culprits is the low exploration capability of Molecular Dynamics- and Monte Carlo-based exploration algorithms. The impasse has attracted AI researchers bringing complementary tools, such as randomized search and stochastic optimization. The tutorial introduces students and researchers to stochastic optimization treatments and methodologies for understanding and elucidating the role of biomolecular structure and dynamics in function. In addition, the tutorial allows attendees to connect between structures, motions, and function via analysis tools that take an energy landscape view of the relationship between biomolecular structure, dynamics, and function. 
The presentation is enhanced via open-source software that permits hands-on exercises, which benefits both students and senior researchers keen to make their own contributions.","PeriodicalId":131906,"journal":{"name":"Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","volume":"37 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116788869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}