{"title":"Knowledge-Based Biomedical Data Science.","authors":"Tiffany J Callahan, Ignacio J Tripodi, Harrison Pielke-Lombardo, Lawrence E Hunter","doi":"10.1146/annurev-biodatasci-010820-091627","DOIUrl":"10.1146/annurev-biodatasci-010820-091627","url":null,"abstract":"Knowledge-based biomedical data science involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey recent progress in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as progress on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing to construct knowledge graphs, and the expansion of novel knowledge-based approaches to clinical and biological domains.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8095730/pdf/nihms-1654586.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38954383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deciphering Cell Fate Decision by Integrated Single-Cell Sequencing Analysis.","authors":"Sagar, Dominic Grün","doi":"10.1146/annurev-biodatasci-111419-091750","DOIUrl":"10.1146/annurev-biodatasci-111419-091750","url":null,"abstract":"Cellular differentiation is a common underlying feature of all multicellular organisms through which naïve cells progressively become fate restricted and develop into mature cells with specialized functions. A comprehensive understanding of the regulatory mechanisms of cell fate choices during development, regeneration, homeostasis, and disease is a central goal of modern biology. Ongoing rapid advances in single-cell biology are enabling the exploration of cell fate specification at unprecedented resolution. Here, we review single-cell RNA sequencing and sequencing of other modalities as methods to elucidate the molecular underpinnings of lineage specification. We specifically discuss how the computational tools available to reconstruct lineage trajectories, quantify cell fate bias, and perform dimensionality reduction for data visualization are providing new mechanistic insights into the process of cell fate decision. Studying cellular differentiation using single-cell genomic tools is paving the way for a detailed understanding of cellular behavior in health and disease.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7115822/pdf/EMS86588.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38260582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Immunoinformatics: Predicting Peptide-MHC Binding.","authors":"Morten Nielsen, Massimo Andreatta, Bjoern Peters, Søren Buus","doi":"10.1146/annurev-biodatasci-021920-100259","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-021920-100259","url":null,"abstract":"Immunoinformatics is a discipline that applies methods of computer science to study and model the immune system. A fundamental question addressed by immunoinformatics is how to understand the rules of antigen presentation by MHC molecules to T cells, a process that is central to adaptive immune responses to infections and cancer. In the modern era of personalized medicine, the ability to model and predict which antigens can be presented by MHC is key to manipulating the immune system and designing strategies for therapeutic intervention. Since the MHC is both polygenic and extremely polymorphic, each individual possesses a personalized set of MHC molecules with different peptide-binding specificities, and collectively they present a unique individualized peptide imprint of the ongoing protein metabolism. Mapping all MHC allotypes is an enormous undertaking that cannot be achieved without a strong bioinformatics component. Computational tools for the prediction of peptide-MHC binding have thus become essential in most pipelines for T cell epitope discovery and an inescapable component of vaccine and cancer research. Here, we describe the development of several such tools, from pioneering efforts to the current state-of-the-art methods, that have allowed for accurate predictions of peptide binding of all MHC molecules, even including those that have not yet been characterized experimentally.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/annurev-biodatasci-021920-100259","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9809194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating Imaging and Omics: Computational Methods and Challenges","authors":"J. Hériché, S. Alexander, J. Ellenberg","doi":"10.1146/ANNUREV-BIODATASCI-080917-013328","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-080917-013328","url":null,"abstract":"Fluorescence microscopy imaging has long been complementary to DNA sequencing- and mass spectrometry–based omics in biomedical research, but these approaches are now converging. On the one hand, omics methods are moving from in vitro methods that average across large cell populations to in situ molecular characterization tools with single-cell sensitivity. On the other hand, fluorescence microscopy imaging has moved from a morphological description of tissues and cells to quantitative molecular profiling with single-molecule resolution. Recent technological developments underpinned by computational methods have started to blur the lines between imaging and omics and have made their direct correlation and seamless integration an exciting possibility. As this trend continues rapidly, it will allow us to create comprehensive molecular profiles of living systems with spatial and temporal context and subcellular resolution. Key to achieving this ambitious goal will be novel computational methods and successfully dealing with the challenges of data integration and sharing as well as cloud-enabled big data analysis.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2019-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-080917-013328","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43681862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biomolecular Data Resources: Bioinformatics Infrastructure for Biomedical Data Science","authors":"J. Vamathevan, R. Apweiler, E. Birney","doi":"10.1146/ANNUREV-BIODATASCI-072018-021321","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-072018-021321","url":null,"abstract":"Technological advances have continuously driven the generation of bio-molecular data and the development of bioinformatics infrastructure, which enables data reuse for scientific discovery. Several types of data management resources have arisen, such as data deposition databases, added-value databases or knowledgebases, and biology-driven portals. In this review, we provide a unique overview of the gradual evolution of these resources and discuss the goals and features that must be considered in their development. With the increasing application of genomics in the health care context and with 60 to 500 million whole genomes estimated to be sequenced by 2022, biomedical research infrastructure is transforming, too. Systems for federated access, portable tools, provision of reference data, and interpretation tools will enable researchers to derive maximal benefits from these data. Collaboration, coordination, and sustainability of data resources are key to ensure that biomedical knowledge management can scale with technology shifts and growing data volumes.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2019-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-072018-021321","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45228710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Connectivity Mapping: Methods and Applications","authors":"A. Keenan, Megan L. Wojciechowicz, Zichen Wang, Kathleen M. Jagodnik, S. L. Jenkins, Alexander Lachmann, Avi Ma’ayan","doi":"10.1146/ANNUREV-BIODATASCI-072018-021211","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-072018-021211","url":null,"abstract":"Connectivity mapping resources consist of signatures representing changes in cellular state following systematic small-molecule, disease, gene, or other form of perturbations. Such resources enable the characterization of signatures from novel perturbations based on similarity; provide a global view of the space of many themed perturbations; and allow the ability to predict cellular, tissue, and organismal phenotypes for perturbagens. A signature search engine enables hypothesis generation by finding connections between query signatures and the database of signatures. This framework has been used to identify connections between small molecules and their targets, to discover cell-specific responses to perturbations and ways to reverse disease expression states with small molecules, and to predict small-molecule mimickers for existing drugs. This review provides a historical perspective and the current state of connectivity mapping resources with a focus on both methodology and community implementations.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2019-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-072018-021211","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49485099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular Heterogeneity in Large-Scale Biological Data: Techniques and Applications","authors":"C. Deng, Timothy P. Daley, G. Brandine, Andrew D. Smith","doi":"10.1146/ANNUREV-BIODATASCI-072018-021339","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-072018-021339","url":null,"abstract":"High-throughput sequencing technologies have evolved at a stellar pace for almost a decade and have greatly advanced our understanding of genome biology. In these sampling-based technologies, there is an important detail that is often overlooked in the analysis of the data and the design of the experiments, specifically that the sampled observations often do not give a representative picture of the underlying population. This has long been recognized as a problem in statistical ecology and in the broader statistics literature. In this review, we discuss the connections between these fields, methodological advances that parallel both the needs and opportunities of large-scale data analysis, and specific applications in modern biology. In the process we describe unique aspects of applying these approaches to sequencing technologies, including sequencing error, population and individual heterogeneity, and the design of experiments.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2019-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-072018-021339","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44142841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering Pathway and Cell Type Signatures in Transcriptomic Compendia with Machine Learning","authors":"G. Way, C. Greene","doi":"10.1146/ANNUREV-BIODATASCI-072018-021348","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-072018-021348","url":null,"abstract":"Pathway and cell type signatures are patterns present in transcriptome data that are associated with biological processes or phenotypic consequences. These signatures result from specific cell type and pathway expression but can require large transcriptomic compendia to detect. Machine learning techniques can be powerful tools for signature discovery through their ability to provide accurate and interpretable results. In this review, we discuss various machine learning applications to extract pathway and cell type signatures from transcriptomic compendia. We focus on the biological motivations and interpretation for both supervised and unsupervised learning approaches in this setting. We consider recent advances, including deep learning, and their applications to expanding bulk and single-cell RNA data. As data and computational resources increase, there will be more opportunities for machine learning to aid in revealing biological signatures.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2019-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-072018-021348","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46673466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Imaging, Visualization, and Computation in Developmental Biology","authors":"F. Cutrale, S. Fraser, Le A. Trinh","doi":"10.1146/ANNUREV-BIODATASCI-072018-021305","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-072018-021305","url":null,"abstract":"Embryonic development is highly complex and dynamic, requiring the coordination of numerous molecular and cellular events at precise times and places. Advances in imaging technology have made it possible to follow developmental processes at cellular, tissue, and organ levels over time as they take place in the intact embryo. Parallel innovations of in vivo probes permit imaging to report on molecular, physiological, and anatomical events of embryogenesis, but the resulting multidimensional data sets pose significant challenges for extracting knowledge. In this review, we discuss recent and emerging advances in imaging technologies, in vivo labeling, and data processing that offer the greatest potential for jointly deciphering the intricate cellular dynamics and the underlying molecular mechanisms. Our discussion of the emerging area of “image-omics” highlights both the challenges of data analysis and the promise of more fully embracing computation and data science for rapidly advancing our understanding of biology.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2019-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-072018-021305","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47191858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genomic Data Compression","authors":"M. Hernaez, Dmitri S. Pavlichin, T. Weissman, Idoia Ochoa","doi":"10.1146/ANNUREV-BIODATASCI-072018-021229","DOIUrl":"https://doi.org/10.1146/ANNUREV-BIODATASCI-072018-021229","url":null,"abstract":"Recently, there has been growing interest in genome sequencing, driven by advances in sequencing technology, in terms of both efficiency and affordability. These developments have allowed many to envision whole-genome sequencing as an invaluable tool for both personalized medical care and public health. As a result, increasingly large and ubiquitous genomic data sets are being generated. This poses a significant challenge for the storage and transmission of these data. Already, it is more expensive to store genomic data for a decade than it is to obtain the data in the first place. This situation calls for efficient representations of genomic information. In this review, we emphasize the need for designing specialized compressors tailored to genomic data and describe the main solutions already proposed. We also give general guidelines for storing these data and conclude with our thoughts on the future of genomic formats and compressors.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":6.0,"publicationDate":"2019-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1146/ANNUREV-BIODATASCI-072018-021229","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46626764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}