{"title":"Large-Scale Analysis of Genetic and Clinical Patient Data","authors":"M. Ritchie","doi":"10.1146/ANNUREV-BIODATASCI-080917-013508","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013508","url":null,"abstract":"Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available on comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly, as our understanding of the questions and challenges continue to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis along with complex phenomics will be discussed. This field is changing and adapting to the novel data types made available, as well as technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. 
The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2018-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-080917-013508","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46186041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Tissues to Cell Types and Back: Single-Cell Gene Expression Analysis of Tissue Architecture","authors":"Xi Chen, S. Teichmann, K. Meyer","doi":"10.1146/ANNUREV-BIODATASCI-080917-013452","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013452","url":null,"abstract":"With the recent transformative developments in single-cell genomics and, in particular, single-cell gene expression analysis, it is now possible to study tissues at the single-cell level, rather than having to rely on data from bulk measurements. Here we review the rapid developments in single-cell RNA sequencing (scRNA-seq) protocols that have the potential for unbiased identification and profiling of all cell types within a tissue or organism. In addition, novel approaches for spatial profiling of gene expression allow us to map individual cells and cell types back into the three-dimensional context of organs. The combination of in-depth single-cell and spatial gene expression data will reveal tissue architecture in unprecedented detail, generating a wealth of biological knowledge and a better understanding of many diseases.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2018-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-080917-013452","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48410668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning in Biomedical Data Science","authors":"P. Baldi","doi":"10.1146/ANNUREV-BIODATASCI-080917-013343","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013343","url":null,"abstract":"Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and application of these methods to biomedical data have led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field, and surveys current applications of deep learning to biomedical data organized around five subareas, roughly of increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2018-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-080917-013343","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42925605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network Analysis as a Grand Unifier in Biomedical Data Science","authors":"Patrick D. McGillivray, Declan Clarke, W. Meyerson, Jing Zhang, Donghoon Lee, Mengting Gu, Sushant Kumar, Holly Zhou, M. Gerstein","doi":"10.1146/ANNUREV-BIODATASCI-080917-013444","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013444","url":null,"abstract":"Biomedical data scientists study many types of networks, ranging from those formed by neurons to those created by molecular interactions. People often criticize these networks as uninterpretable diagrams termed hairballs; however, here we show that molecular biological networks can be interpreted in several straightforward ways. First, we can break down a network into smaller components, focusing on individual pathways and modules. Second, we can compute global statistics describing the network as a whole. Third, we can compare networks. These comparisons can be within the same context (e.g., between two gene regulatory networks) or cross-disciplinary (e.g., between regulatory networks and governmental hierarchies). The latter comparisons can transfer a formalism, such as that for Markov chains, from one context to another or relate our intuitions in a familiar setting (e.g., social networks) to the relatively unfamiliar molecular context. Finally, key aspects of molecular networks are dynamics and evolution, i.e., how they evolve over time and how genetic variants affect them. 
By studying the relationships between variants in networks, we can begin to interpret many common diseases, such as cancer and heart disease.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2018-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-080917-013444","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49037025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Census of Disease Ontologies","authors":"M. Haendel, J. McMurry, R. Relevo, C. Mungall, P. Robinson, C. Chute","doi":"10.1146/ANNUREV-BIODATASCI-080917-013459","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013459","url":null,"abstract":"For centuries, humans have sought to classify diseases based on phenotypic presentation and available treatments. Today, a wide landscape of strategies, resources, and tools exist to classify patients and diseases. Ontologies can provide a robust foundation of logic for precise stratification and classification along diverse axes such as etiology, development, treatment, and genetics. Disease and phenotype ontologies are used in four primary ways: (a) search, retrieval, and annotation of knowledge; (b) data integration and analysis; (c) clinical decision support; and (d) knowledge discovery. Computational inference can connect existing knowledge and generate new insights and hypotheses about drug targets, prognosis prediction, or diagnosis. In this review, we examine the rise of disease and phenotype ontologies and the diverse ways they are represented and applied in biomedicine.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2018-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-080917-013459","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49330122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science Issues in Studying Protein-RNA Interactions with CLIP Technologies.","authors":"Anob M Chakrabarti, Nejc Haberman, Arne Praznik, Nicholas M Luscombe, Jernej Ule","doi":"10.1146/annurev-biodatasci-080917-013525","DOIUrl":"10.1146/annurev-biodatasci-080917-013525","url":null,"abstract":"An interplay of experimental and computational methods is required to achieve a comprehensive understanding of protein-RNA interactions. UV crosslinking and immunoprecipitation (CLIP) identifies endogenous interactions by sequencing RNA fragments that copurify with a selected RNA-binding protein under stringent conditions. Here we focus on approaches for the analysis of the resulting data and appraise the methods for peak calling, visualization, analysis, and computational modeling of protein-RNA binding sites. We advocate that the sensitivity and specificity of data be assessed in combination for computational quality control. Moreover, we demonstrate the value of analyzing sequence motif enrichment in peaks assigned from CLIP data and of visualizing RNA maps, which examine the positional distribution of peaks around regulated landmarks in transcripts. We use these to assess how variations in CLIP data quality and in different peak calling methods affect the insights into regulatory mechanisms. 
We conclude by discussing future opportunities for the computational analysis of protein-RNA interaction experiments.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"1 1","pages":"235-261"},"PeriodicalIF":6.0,"publicationDate":"2018-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614488/pdf/EMS174063.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9404672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Big Data Approaches for Modeling Response and Resistance to Cancer Drugs.","authors":"Peng Jiang, W. Sellers, X. S. Liu","doi":"10.1146/ANNUREV-BIODATASCI-080917-013350","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013350","url":null,"abstract":"Despite significant progress in cancer research, current standard-of-care drugs fail to cure many types of cancers. Hence, there is an urgent need to identify better predictive biomarkers and treatment regimes. Conventionally, insights from hypothesis-driven studies are the primary force for cancer biology and therapeutic discoveries. Recently, the rapid growth of big data resources, catalyzed by breakthroughs in high-throughput technologies, has resulted in a paradigm shift in cancer therapeutic research. The combination of computational methods and genomics data has led to several successful clinical applications. In this review, we focus on recent advances in data-driven methods to model anticancer drug efficacy, and we present the challenges and opportunities for data science in cancer therapeutic research.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"1 1","pages":"1-27"},"PeriodicalIF":6.0,"publicationDate":"2018-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-080917-013350","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46709281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What is Biomedical Data Science and Do We Need an Annual Review of It?","authors":"R. Altman, M. Levitt","doi":"10.1146/ANNUREV-BD-01-041718-100001","DOIUrl":"https://doi.org/10.1146/ANNUREV-BD-01-041718-100001","url":null,"abstract":"We are pleased to bring you the first volume of the Annual Review of Biomedical Data Science. It spans a range of biological and medical research challenges that are data intensive and focused on the creation of novel methodologies to advance biomedical science discovery. The term “data science” describes expertise associated with taking (usually large) data sets and annotating, cleaning, organizing, storing, and analyzing them for the purposes of extracting knowledge. It merges the disciplines of statistics, computer science, and computational engineering. Many are irritated by the term—all of science depends ultimately on data, and many of the activities listed above sound like engineering (which is about solving problems) and not science (which is about discovery of new knowledge). If “data science” is not about science and the adjective “data” has no particular meaning, why does this term exist? Indeed, the allied fields of informatics have existed for several decades in many forms—medical informatics, clinical informatics, health informatics, bioinformatics, and biomedical informatics—and variants all refer to the development of methods to analyze data, information, and knowledge within the space of biology and medicine. Practitioners of these fields are quick to point out that most if not all of data science falls within the purview of informatics. Informatics is a broad field that includes the social aspects of interacting with data, information, and knowledge; the challenges of human–computer interfaces; and the issues associated with introducing disruptive new computational interventions into systems (like hospitals and laboratories) with existing workflows. 
So why is the introduction of a new name for the field necessary? The term “data science” has gained recognition, and the widespread comfort with it suggests it serves a useful purpose. Here we offer some observations on the diverse use of the moniker for many activities:","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2018-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BD-01-041718-100001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41873602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Alignment-Free Sequence Analysis and Applications.","authors":"Jie Ren, Xin Bai, Yang Young Lu, Kujin Tang, Ying Wang, Gesine Reinert, Fengzhu Sun","doi":"10.1146/annurev-biodatasci-080917-013431","DOIUrl":"10.1146/annurev-biodatasci-080917-013431","url":null,"abstract":"Genome and metagenome comparisons based on large amounts of next generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been developed for the comparison of both long and shotgun sequences. These approaches have been applied to many problems including the comparison of gene regulatory regions, genome sequences, metagenomes, binning contigs in metagenomic data, identification of virus-host interactions, and detection of horizontal gene transfers. We provide an updated review of these applications and other related developments of word-count based approaches for alignment-free sequence analysis.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"1 ","pages":"93-114"},"PeriodicalIF":7.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6905628/pdf/nihms-1016592.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37450115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models.","authors":"Juan M Banda, Martin Seneviratne, Tina Hernandez-Boussard, Nigam H Shah","doi":"10.1146/annurev-biodatasci-080917-013315","DOIUrl":"10.1146/annurev-biodatasci-080917-013315","url":null,"abstract":"With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, future research directions are explored.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"1 ","pages":"53-68"},"PeriodicalIF":6.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/annurev-biodatasci-080917-013315","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37072036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}