Ruowang Li, Joseph D Romano, Yong Chen, Jason H Moore
{"title":"Centralized and Federated Models for the Analysis of Clinical Data.","authors":"Ruowang Li, Joseph D Romano, Yong Chen, Jason H Moore","doi":"10.1146/annurev-biodatasci-122220-115746","DOIUrl":"10.1146/annurev-biodatasci-122220-115746","url":null,"abstract":"<p><p>The progress of precision medicine research hinges on the gathering and analysis of extensive and diverse clinical datasets. With the continued expansion of modalities, scales, and sources of clinical datasets, it becomes imperative to devise methods for aggregating information from these varied sources to achieve a comprehensive understanding of diseases. In this review, we describe two important approaches for the analysis of diverse clinical datasets, namely the centralized model and federated model. We compare and contrast the strengths and weaknesses inherent in each model and present recent progress in methodologies and their associated challenges. Finally, we present an outlook on the opportunities that both models hold for the future analysis of clinical data.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140899793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Evolutionary Interplay of Somatic and Germline Mutation Rates.","authors":"Annabel C Beichman, Luke Zhu, Kelley Harris","doi":"10.1146/annurev-biodatasci-102523-104225","DOIUrl":"10.1146/annurev-biodatasci-102523-104225","url":null,"abstract":"<p><p>Novel sequencing technologies are making it increasingly possible to measure the mutation rates of somatic cell lineages. Accurate germline mutation rate measurement technologies have also been available for a decade, making it possible to assess how this fundamental evolutionary parameter varies across the tree of life. Here, we review some classical theories about germline and somatic mutation rate evolution that were formulated using principles of population genetics and the biology of aging and cancer. We find that somatic mutation rate measurements, while still limited in phylogenetic diversity, seem consistent with the theory that selection to preserve the soma is proportional to life span. However, germline and somatic theories make conflicting predictions regarding which species should have the most accurate DNA repair. Resolving this conflict will require carefully measuring how mutation rates scale with time and cell division and achieving a better understanding of mutation rate pleiotropy among cell types.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140872288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anthony Cesnik, Leah V Schaffer, Ishan Gaur, Mayank Jain, Trey Ideker, Emma Lundberg
{"title":"Mapping the Multiscale Proteomic Organization of Cellular and Disease Phenotypes.","authors":"Anthony Cesnik, Leah V Schaffer, Ishan Gaur, Mayank Jain, Trey Ideker, Emma Lundberg","doi":"10.1146/annurev-biodatasci-102423-113534","DOIUrl":"10.1146/annurev-biodatasci-102423-113534","url":null,"abstract":"<p><p>While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature. Here, we envision a future powered by data-driven mapping of protein assemblies. These maps can capture and decode cellular functions through the integration of protein expression, localization, and interaction data across length scales and timescales. In this review, we focus on progress toward constructing integrated cell maps that accelerate the life sciences and translational research.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11343683/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140946150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingxuan Bao, Brian N Lee, Junhao Wen, Mansu Kim, Shizhuo Mu, Shu Yang, Christos Davatzikos, Qi Long, Marylyn D Ritchie, Li Shen
{"title":"Employing Informatics Strategies in Alzheimer's Disease Research: A Review from Genetics, Multiomics, and Biomarkers to Clinical Outcomes.","authors":"Jingxuan Bao, Brian N Lee, Junhao Wen, Mansu Kim, Shizhuo Mu, Shu Yang, Christos Davatzikos, Qi Long, Marylyn D Ritchie, Li Shen","doi":"10.1146/annurev-biodatasci-102423-121021","DOIUrl":"10.1146/annurev-biodatasci-102423-121021","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is a critical national concern, affecting 5.8 million people and costing more than $250 billion annually. However, there is no available cure. Thus, effective strategies are in urgent need to discover AD biomarkers for disease early detection and drug development. In this review, we study AD from a biomedical data scientist perspective to discuss the four fundamental components in AD research: genetics (G), molecular multiomics (M), multimodal imaging biomarkers (B), and clinical outcomes (O) (collectively referred to as the GMBO framework). We provide a comprehensive review of common statistical and informatics methodologies for each component within the GMBO framework, accompanied by the major findings from landmark AD studies. Our review highlights the potential of multimodal biobank data in addressing key challenges in AD, such as early diagnosis, disease heterogeneity, and therapeutic development. We identify major hurdles in AD research, including data scarcity and complexity, and advocate for enhanced collaboration, data harmonization, and advanced modeling techniques. This review aims to be an essential guide for understanding current biomedical data science strategies in AD research, emphasizing the need for integrated, multidisciplinary approaches to advance our understanding and management of AD.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525791/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141288709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatially Resolved Single-Cell Omics: Methods, Challenges, and Future Perspectives.","authors":"Felipe Segato Dezem, Wani Arjumand, Hannah DuBose, Natalia Silva Morosini, Jasmine Plummer","doi":"10.1146/annurev-biodatasci-102523-103640","DOIUrl":"10.1146/annurev-biodatasci-102523-103640","url":null,"abstract":"<p><p>Overlaying omics data onto spatial biological dimensions has been a promising technology to provide high-resolution insights into the interactome and cellular heterogeneity relative to the organization of the molecular microenvironment of tissue samples in normal and disease states. Spatial omics can be categorized into three major modalities: (<i>a</i>) next-generation sequencing-based assays, (<i>b</i>) imaging-based spatially resolved transcriptomics approaches including in situ hybridization/in situ sequencing, and (<i>c</i>) imaging-based spatial proteomics. These modalities allow assessment of transcripts and proteins at a cellular level, generating large and computationally challenging datasets. The lack of standardized computational pipelines to analyze and integrate these nonuniform structured data has made it necessary to apply artificial intelligence and machine learning strategies to best visualize and translate their complexity. In this review, we summarize the currently available techniques and computational strategies, highlight their advantages and limitations, and discuss their future prospects in the scientific field.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141071246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yonghyun Nam, Jaesik Kim, Sang-Hyuk Jung, Jakob Woerner, Erica H Suh, Dong-Gi Lee, Manu Shivakumar, Matthew E Lee, Dokyoon Kim
{"title":"Harnessing Artificial Intelligence in Multimodal Omics Data Integration: Paving the Path for the Next Frontier in Precision Medicine.","authors":"Yonghyun Nam, Jaesik Kim, Sang-Hyuk Jung, Jakob Woerner, Erica H Suh, Dong-Gi Lee, Manu Shivakumar, Matthew E Lee, Dokyoon Kim","doi":"10.1146/annurev-biodatasci-102523-103801","DOIUrl":"10.1146/annurev-biodatasci-102523-103801","url":null,"abstract":"<p><p>The integration of multiomics data with detailed phenotypic insights from electronic health records marks a paradigm shift in biomedical research, offering unparalleled holistic views into health and disease pathways. This review delineates the current landscape of multimodal omics data integration, emphasizing its transformative potential in generating a comprehensive understanding of complex biological systems. We explore robust methodologies for data integration, ranging from concatenation-based to transformation-based and network-based strategies, designed to harness the intricate nuances of diverse data types. Our discussion extends from incorporating large-scale population biobanks to dissecting high-dimensional omics layers at the single-cell level. The review underscores the emerging role of large language models in artificial intelligence, anticipating their influence as a near-future pivot in data integration approaches. Highlighting both achievements and hurdles, we advocate for a concerted effort toward sophisticated integration models, fortifying the foundation for groundbreaking discoveries in precision medicine.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141071239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyunghoon Cho, David Froelicher, Natnatee Dokmai, Anupama Nandi, Shuvom Sadhuka, Matthew M Hong, Bonnie Berger
{"title":"Privacy-Enhancing Technologies in Biomedical Data Science.","authors":"Hyunghoon Cho, David Froelicher, Natnatee Dokmai, Anupama Nandi, Shuvom Sadhuka, Matthew M Hong, Bonnie Berger","doi":"10.1146/annurev-biodatasci-120423-120107","DOIUrl":"10.1146/annurev-biodatasci-120423-120107","url":null,"abstract":"<p><p>The rapidly growing scale and variety of biomedical data repositories raise important privacy concerns. Conventional frameworks for collecting and sharing human subject data offer limited privacy protection, often necessitating the creation of data silos. Privacy-enhancing technologies (PETs) promise to safeguard these data and broaden their usage by providing means to share and analyze sensitive data while protecting privacy. Here, we review prominent PETs and illustrate their role in advancing biomedicine. We describe key use cases of PETs and their latest technical advances and highlight recent applications of PETs in a range of biomedical domains. We conclude by discussing outstanding challenges and social considerations that need to be addressed to facilitate a broader adoption of PETs in biomedical data science.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11346580/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142044148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping the Human Cell Surface Interactome: A Key to Decode Cell-to-Cell Communication.","authors":"Jarrod Shilts, Gavin J Wright","doi":"10.1146/annurev-biodatasci-102523-103821","DOIUrl":"10.1146/annurev-biodatasci-102523-103821","url":null,"abstract":"<p><p>Proteins on the surfaces of cells serve as physical connection points to bridge one cell with another, enabling direct communication between cells and cohesive structure. As biomedical research makes the leap from characterizing individual cells toward understanding the multicellular organization of the human body, the binding interactions between molecules on the surfaces of cells are foundational both for computational models and for clinical efforts to exploit these influential receptor pathways. To achieve this grander vision, we must assemble the full interactome of ways surface proteins can link together. This review investigates how close we are to knowing the human cell surface protein interactome. We summarize the current state of databases and systematic technologies to assemble surface protein interactomes, while highlighting substantial gaps that remain. We aim for this to serve as a road map for eventually building a more robust picture of the human cell surface protein interactome.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140899795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disease Trajectories from Healthcare Data: Methodologies, Key Results, and Future Perspectives.","authors":"Isabella Friis Jørgensen, Amalie Dahl Haue, Davide Placido, Jessica Xin Hjaltelin, Søren Brunak","doi":"10.1146/annurev-biodatasci-110123-041001","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-110123-041001","url":null,"abstract":"<p><p>Disease trajectories, defined as sequential, directional disease associations, have become an intense research field driven by the availability of electronic population-wide healthcare data and sufficient computational power. Here, we provide an overview of disease trajectory studies with a focus on European work, including ontologies used as well as computational methodologies for the construction of disease trajectories. We also discuss different applications of disease trajectories from descriptive risk identification to disease progression, patient stratification, and personalized predictions using machine learning. We describe challenges and opportunities in the area that eventually will benefit from initiatives such as the European Health Data Space, which, with time, will make it possible to analyze data from cohorts comprising hundreds of millions of patients.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142044147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science Methods for Real-World Evidence Generation in Real-World Data.","authors":"Fang Liu","doi":"10.1146/annurev-biodatasci-102423-113220","DOIUrl":"10.1146/annurev-biodatasci-102423-113220","url":null,"abstract":"<p><p>In the healthcare landscape, data science (DS) methods have emerged as indispensable tools to harness real-world data (RWD) from various data sources such as electronic health records, claim and registry data, and data gathered from digital health technologies. Real-world evidence (RWE) generated from RWD empowers researchers, clinicians, and policymakers with a more comprehensive understanding of real-world patient outcomes. Nevertheless, persistent challenges in RWD (e.g., messiness, voluminousness, heterogeneity, multimodality) and a growing awareness of the need for trustworthy and reliable RWE demand innovative, robust, and valid DS methods for analyzing RWD. In this article, I review some common current DS methods for extracting RWE and valuable insights from complex and diverse RWD. This article encompasses the entire RWE-generation pipeline, from study design with RWD to data preprocessing, exploratory analysis, methods for analyzing RWD, and trustworthiness and reliability guarantees, along with data ethics considerations and open-source tools. This review, tailored for an audience that may not be experts in DS, aspires to offer a systematic review of DS methods and assists readers in selecting suitable DS methods and enhancing the process of RWE generation for addressing their specific challenges.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":null,"pages":null},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140946147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}