{"title":"Strategies for Creating Robust Patient Groups to Study Diverse Conditions with Electronic Health Records.","authors":"Grace D Ramey, Hannah Takasuka, John A Capra","doi":"10.1146/annurev-biodatasci-020722-114525","DOIUrl":"10.1146/annurev-biodatasci-020722-114525","url":null,"abstract":"<p><p>The growth of electronic health record (EHR) databases in size and availability has created an unprecedented opportunity to better understand human health and disease. However, conducting robust EHR studies requires careful filtering criteria and study design, as EHRs pose several challenges that can confound analyses and lead to inaccurate results. Here we review these challenges and make suggestions about how to avoid or adjust for major confounders and biases in common EHR study designs. We further highlight qualities of EHR data that make different diseases more or less feasible for study. These recommendations for conducting research using EHRs will help inform database selection, improve reproducibility of results across the field, and enhance the validity of study results.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"317-340"},"PeriodicalIF":6.0,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143812613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping, Modeling, and Reprogramming Cell-Fate Decision-Making Systems.","authors":"Lucy Ham, Taylor E Woodward, Megan A Coomer, Michael P H Stumpf","doi":"10.1146/annurev-biodatasci-101424-121439","DOIUrl":"10.1146/annurev-biodatasci-101424-121439","url":null,"abstract":"<p><p>Many cellular processes involve information processing and decision-making. We can probe these processes at increasing molecular detail. The analysis of heterogeneous data remains a challenge that requires new ways of thinking about cells in quantitative, predictive, and mechanistic ways. We discuss the role of mathematical models in the context of cell-fate decision-making systems across the tree of life. Complex multicellular organisms have been a particular focus, but single-celled organisms also have to sense and respond to their environment. We center our discussion around the idea of design principles that we can learn from observations and modeling and exploit in order to (re)-design or guide cellular behavior.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"537-562"},"PeriodicalIF":6.0,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143984534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The TITAN-X Platform Integrates Big Data, Artificial Intelligence, Bioinformatics, and Advanced Computational Modeling to Understand Immune Responses and Develop the Next Wave of Precision Medicines.","authors":"Ryan Baker, Josep Bassaganya-Riera, Nuria Tubau-Juni, Andrew J Leber, Raquel Hontecillas","doi":"10.1146/annurev-biodatasci-103123-094804","DOIUrl":"10.1146/annurev-biodatasci-103123-094804","url":null,"abstract":"<p><p>The TITAN-X Precision Medicine Platform was engineered to rapidly, fully, and efficiently utilize large-scale immunology datasets, including public data, in drug discovery and development. TITAN-X integrates big data with artificial intelligence (AI), bioinformatics, and advanced computational modeling to seamlessly transition from early target discovery to clinical testing of new therapeutics, developing biomarker-driven precision medicines tailored to specific patient populations. We illustrate the capabilities of TITAN-X through four case studies, demonstrating its use in computationally driven target discovery; characterization of novel immunometabolic mechanisms in infectious, inflammatory, and autoimmune diseases; and identification of biomarker signatures for patient stratification in clinical trials designed to maximize therapeutic efficacy and safety. Data-driven and AI-powered approaches like TITAN-X are enhancing the pace of drug development, reducing costs, tailoring treatments, and increasing the probability of success in clinical trials.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"447-469"},"PeriodicalIF":6.0,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144039901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Meta-Analysis and Federated Learning over Decentralized Distributed Research Networks.","authors":"Yiwen Lu, Bingyu Zhang, Jiayi Tong, Yong Chen","doi":"10.1146/annurev-biodatasci-103123-094441","DOIUrl":"10.1146/annurev-biodatasci-103123-094441","url":null,"abstract":"<p><p>Distributed research networks have transformed modern clinical research by enabling large-scale, multi-institutional collaborations while maintaining patient privacy. Two prominent methodologies within these frameworks, meta-analysis and federated learning, address the challenges of synthesizing evidence from decentralized data. Meta-analysis aggregates study-level results to provide robust, interpretable estimates, making it a cornerstone of evidence synthesis for association studies. Federated learning complements this by enabling complex downstream tasks, such as predictive modeling and counterfactual inference, while preserving data privacy through privacy-preserving distributed algorithms. Federated learning facilitates communication-efficient computation and adapts seamlessly to heterogeneous datasets across diverse institutions. This review emphasizes the complementary strengths of federated learning's scalability, flexibility, and readiness for implementation alongside meta-analysis's robust frameworks for evidence synthesis and aggregation in clinical research. Integrations of synthetic data, artificial intelligence (AI)-enhanced harmonization, and hybrid human-AI frameworks are proposed as future directions, promising to further advance both methodologies and enhance their combined impact on privacy-conscious, data-driven healthcare research.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"8 1","pages":"405-421"},"PeriodicalIF":6.0,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144822752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foundation Models for Translational Cancer Biology.","authors":"Kevin K Tsang, Sophia Kivelson, Jose M Acitores Cortina, Aditi Kuchi, Jacob S Berkowitz, Hongyu Liu, Apoorva Srinivasan, Nadine A Friedrich, Yasaman Fatapour, Nicholas P Tatonetti","doi":"10.1146/annurev-biodatasci-103123-095633","DOIUrl":"10.1146/annurev-biodatasci-103123-095633","url":null,"abstract":"<p><p>Cancer remains a leading cause of death globally. The complexity and diversity of cancer-related datasets across different specialties pose challenges in refining precision medicine for oncology. Foundation models offer a promising solution. Trained on vast amounts of data, these models develop a broad understanding across a wide range of tasks. We examine the role of foundation models in domains relevant to cancer research, including natural language processing, computer vision, molecular biology, and cheminformatics. Through a review of state-of-the-art methods, we explore how these models have already advanced translational cancer research goals such as precision tumor classification and artificial intelligence-assisted surgery. We also discuss prospective advances in areas like early tumor detection, personalized cancer treatment, and drug discovery. This review provides researchers with a curated set of resources and methodologies, offers practitioners a deeper understanding of how these models enhance cancer care, and points to opportunities for future applications of foundation models in cancer research.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"51-80"},"PeriodicalIF":6.0,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143068152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinical Text Generation: Are We There Yet?","authors":"Nicolas Hiebel, Olivier Ferret, Karën Fort, Aurélie Névéol","doi":"10.1146/annurev-biodatasci-103123-095202","DOIUrl":"10.1146/annurev-biodatasci-103123-095202","url":null,"abstract":"<p><p>Generative artificial intelligence (AI), operationalized as large language models, is increasingly used in the biomedical field to assist with a range of text processing tasks including text classification, information extraction, and decision support. In this article, we focus on the primary purpose of generative language models, namely the production of unstructured text. We review past and current methods used to generate text as well as methods for evaluating open text generation, i.e., in contexts where no reference text is available for comparison. We discuss clinical applications that can benefit from high quality, ethically designed text generation, such as clinical note generation and synthetic text generation in support of secondary use of health data. We also raise awareness of the risks involved with generative AI such as overconfidence in outputs due to anthropomorphism and the risk of representational and allocation harms due to biases.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"173-198"},"PeriodicalIF":6.0,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143658875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional Generative Models for Synthetic Tabular Data: Applications for Precision Medicine and Diverse Representations.","authors":"Kara Liu, Russ B Altman","doi":"10.1146/annurev-biodatasci-103123-094844","DOIUrl":"10.1146/annurev-biodatasci-103123-094844","url":null,"abstract":"<p><p>Tabular medical datasets, like electronic health records (EHRs), biobanks, and structured clinical trial data, are rich sources of information with the potential to advance precision medicine and optimize patient care. However, real-world medical datasets have limited patient diversity and cannot simulate hypothetical outcomes, both of which are necessary for equitable and effective medical research. Fueled by recent advancements in machine learning, generative models offer a promising solution to these data limitations by generating enhanced synthetic data. This review highlights the potential of conditional generative models (CGMs) to create patient-specific synthetic data for a variety of precision medicine applications. We survey CGM approaches that tackle two medical applications: correcting for data representation biases and simulating digital health twins. We additionally explore how the surveyed methods handle modeling tabular medical data and briefly discuss evaluation criteria. Finally, we summarize the technical, medical, and ethical challenges that must be addressed before CGMs can be effectively and safely deployed in the medical field.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"21-49"},"PeriodicalIF":6.0,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142984817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrative Data Science in Drug Safety Research: Experiences, Challenges, and Perspectives.","authors":"Ferran Sanz","doi":"10.1146/annurev-biodatasci-103123-095506","DOIUrl":"10.1146/annurev-biodatasci-103123-095506","url":null,"abstract":"<p><p>Pharmaceutical research and development largely depend on the quantity and quality of data that are available to support projects. The secondary use of data by means of collaborative and integrative approaches is yielding promising results in drug safety research. However, there are challenges that must be overcome in these integrative approaches, such as interoperability issues, intellectual property protection, and, in the case of clinical information, personal data safeguards. The OMOP common data model and the EHDEN and DARWIN EU platforms constitute successful examples of data sharing initiatives in the clinical domain, while the eTOX, eTRANSAFE, and VICT3R international projects are examples of corporate data sharing in toxicology research. The VICT3R project is using these shared data for generating virtual control groups to be applied in nonclinical drug safety assessment. Drug-related knowledge bases that integrate information from different sources also constitute useful tools in the drug safety domain.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"275-285"},"PeriodicalIF":6.0,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143765289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating Artificial Intelligence in Dermatological Cancer Screening and Diagnosis: Efficacy, Challenges, and Future Directions.","authors":"Vivian Utti, Vasiliki Bikia, Ank A Agarwal, Roxana Daneshjou","doi":"10.1146/annurev-biodatasci-103123-094521","DOIUrl":"10.1146/annurev-biodatasci-103123-094521","url":null,"abstract":"<p><p>Skin cancer is the most common cancer in the United States, with incidence rates continuing to rise both nationally and globally, posing significant health and economic burdens. These challenges are compounded by shortages in dermatological care and barriers to insurance access. To address these gaps, artificial intelligence (AI) and deep learning technologies offer promising solutions, enhancing skin cancer screening and diagnosis. AI has the potential to improve diagnostic accuracy and expand access to care, but significant challenges restrict its deployment. These challenges include clinical validation, algorithmic bias, regulatory oversight, and patient acceptance. Ethical concerns, such as disparities in access and fairness of AI algorithms, also require attention. In this review, we explore these limitations and outline future directions, including advancements in teledermatology and vision-language models (VLMs). Future research should focus on improving VLM reliability and interpretability and developing systems capable of integrating clinical context with dermatological images in a way that assists, rather than replaces, clinicians in making more accurate, timely diagnoses.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"591-603"},"PeriodicalIF":6.0,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144001083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network-Based Approaches for Drug Target Identification.","authors":"Thodoris Koutsandreas, Kalliopi Tsafou, Heiko Horn, Ian Barrett, Evangelia Petsalaki","doi":"10.1146/annurev-biodatasci-101424-120950","DOIUrl":"10.1146/annurev-biodatasci-101424-120950","url":null,"abstract":"<p><p>Drug target identification is the first step in drug development, and its importance is underscored by the fact that, even when using genetic evidence to improve success rates, only a small fraction of lead targets end up approved for use in the clinic. One of the reasons for this is the lack of in-depth understanding of the complexity of human diseases. In this review we argue that network-based approaches, which are able to capture relationships between relevant genes and proteins, and diverse data modalities have high potential for improving drug target identification and drug repurposing. We present the evolution of network-based methods that have been developed for this purpose and discuss the limitations of these approaches that are holding them back from making an impact in the clinic. We finish by presenting our recommendations for overcoming these limitations, for example, by leveraging emerging technologies such as artificial intelligence and knowledge graphs.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"423-446"},"PeriodicalIF":6.0,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144050632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}