{"title":"Mapping the Human Cell Surface Interactome: A Key to Decode Cell-to-Cell Communication.","authors":"Jarrod Shilts, Gavin J Wright","doi":"10.1146/annurev-biodatasci-102523-103821","DOIUrl":"10.1146/annurev-biodatasci-102523-103821","url":null,"abstract":"<p><p>Proteins on the surfaces of cells serve as physical connection points to bridge one cell with another, enabling direct communication between cells and cohesive structure. As biomedical research makes the leap from characterizing individual cells toward understanding the multicellular organization of the human body, the binding interactions between molecules on the surfaces of cells are foundational both for computational models and for clinical efforts to exploit these influential receptor pathways. To achieve this grander vision, we must assemble the full interactome of ways surface proteins can link together. This review investigates how close we are to knowing the human cell surface protein interactome. We summarize the current state of databases and systematic technologies to assemble surface protein interactomes, while highlighting substantial gaps that remain. We aim for this to serve as a road map for eventually building a more robust picture of the human cell surface protein interactome.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"155-177"},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140899795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disease Trajectories from Healthcare Data: Methodologies, Key Results, and Future Perspectives.","authors":"Isabella Friis Jørgensen, Amalie Dahl Haue, Davide Placido, Jessica Xin Hjaltelin, Søren Brunak","doi":"10.1146/annurev-biodatasci-110123-041001","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-110123-041001","url":null,"abstract":"<p><p>Disease trajectories, defined as sequential, directional disease associations, have become an intense research field driven by the availability of electronic population-wide healthcare data and sufficient computational power. Here, we provide an overview of disease trajectory studies with a focus on European work, including ontologies used as well as computational methodologies for the construction of disease trajectories. We also discuss different applications of disease trajectories from descriptive risk identification to disease progression, patient stratification, and personalized predictions using machine learning. We describe challenges and opportunities in the area that eventually will benefit from initiatives such as the European Health Data Space, which, with time, will make it possible to analyze data from cohorts comprising hundreds of millions of patients.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"7 1","pages":"251-276"},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142044147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science Methods for Real-World Evidence Generation in Real-World Data.","authors":"Fang Liu","doi":"10.1146/annurev-biodatasci-102423-113220","DOIUrl":"10.1146/annurev-biodatasci-102423-113220","url":null,"abstract":"<p><p>In the healthcare landscape, data science (DS) methods have emerged as indispensable tools to harness real-world data (RWD) from various data sources such as electronic health records, claim and registry data, and data gathered from digital health technologies. Real-world evidence (RWE) generated from RWD empowers researchers, clinicians, and policymakers with a more comprehensive understanding of real-world patient outcomes. Nevertheless, persistent challenges in RWD (e.g., messiness, voluminousness, heterogeneity, multimodality) and a growing awareness of the need for trustworthy and reliable RWE demand innovative, robust, and valid DS methods for analyzing RWD. In this article, I review some common current DS methods for extracting RWE and valuable insights from complex and diverse RWD. This article encompasses the entire RWE-generation pipeline, from study design with RWD to data preprocessing, exploratory analysis, methods for analyzing RWD, and trustworthiness and reliability guarantees, along with data ethics considerations and open-source tools. This review, tailored for an audience that may not be experts in DS, aspires to offer a systematic review of DS methods and assists readers in selecting suitable DS methods and enhancing the process of RWE generation for addressing their specific challenges.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"201-224"},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140946147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Value Proposition of Coordinated Population Cohorts Across Africa.","authors":"Michèle Ramsay, Amelia C Crampin, Ayaga A Bawah, Evelyn Gitau, Kobus Herbst","doi":"10.1146/annurev-biodatasci-020722-015026","DOIUrl":"10.1146/annurev-biodatasci-020722-015026","url":null,"abstract":"<p><p>Building longitudinal population cohorts in Africa for coordinated research and surveillance can influence the setting of national health priorities, lead to the introduction of appropriate interventions, and provide evidence for targeted treatment, leading to better health across the continent. However, compared to cohorts from the global north, longitudinal continental African population cohorts remain scarce, are relatively small in size, and lack data complexity. As infections and noncommunicable diseases disproportionately affect Africa's approximately 1.4 billion inhabitants, African cohorts present a unique opportunity for research and surveillance. High genetic diversity in African populations and multiomic research studies, together with detailed phenotyping and clinical profiling, will be a treasure trove for discovery. The outcomes, including novel drug targets, biological pathways for disease, and gene-environment interactions, will boost precision medicine approaches, not only in Africa but across the globe.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"7 1","pages":"277-294"},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142044149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Artificial Intelligence in Medicine.","authors":"Ruth Johnson, Michelle M Li, Ayush Noori, Owen Queen, Marinka Zitnik","doi":"10.1146/annurev-biodatasci-110723-024625","DOIUrl":"10.1146/annurev-biodatasci-110723-024625","url":null,"abstract":"<p><p>In clinical artificial intelligence (AI), graph representation learning, mainly through graph neural networks and graph transformer architectures, stands out for its capability to capture intricate relationships and structures within clinical datasets. With diverse data-from patient records to imaging-graph AI models process data holistically by viewing modalities and entities within them as nodes interconnected by their relationships. Graph AI facilitates model transfer across clinical tasks, enabling models to generalize across patient populations without additional parameters and with minimal to no retraining. However, the importance of human-centered design and model interpretability in clinical decision-making cannot be overstated. Since graph AI models capture information through localized neural transformations defined on relational datasets, they offer both an opportunity and a challenge in elucidating model rationale. Knowledge graphs can enhance interpretability by aligning model-driven insights with medical knowledge. Emerging graph AI models integrate diverse data modalities through pretraining, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way toward clinically meaningful predictions.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"345-368"},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11344018/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140946148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational Methods for Predicting Key Interactions in T Cell-Mediated Adaptive Immunity.","authors":"Ryan Ehrlich, Eric Glynn, Mona Singh, Dario Ghersi","doi":"10.1146/annurev-biodatasci-102423-122741","DOIUrl":"10.1146/annurev-biodatasci-102423-122741","url":null,"abstract":"<p><p>The adaptive immune system recognizes pathogen- and cancer-specific features and is endowed with memory, enabling it to respond quickly and efficiently to repeated encounters with the same antigens. T cells play a central role in the adaptive immune system by directly targeting intracellular pathogens and helping to activate B cells to secrete antibodies. Several fundamental protein interactions-including those between major histocompatibility complex (MHC) proteins and antigen-derived peptides as well as between T cell receptors and peptide-MHC complexes-underlie the ability of T cells to recognize antigens with great precision. Computational approaches to predict these interactions are increasingly being used for medically relevant applications, including vaccine design and prediction of patient response to cancer immunotherapies. We provide computational researchers with an accessible introduction to the adaptive immune system, review computational approaches to predict the key protein interactions underlying T cell-mediated adaptive immunity, and highlight remaining challenges.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":"295-316"},"PeriodicalIF":7.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140946171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing the Challenge of Biomedical Data Inequality: An Artificial Intelligence Perspective.","authors":"Yan Gao, Teena Sharma, Yan Cui","doi":"10.1146/annurev-biodatasci-020722-020704","DOIUrl":"10.1146/annurev-biodatasci-020722-020704","url":null,"abstract":"<p><p>Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, the existing biomedical data, which are a vital resource and foundation for developing medical AI models, do not reflect the diversity of the human population. The low representation in biomedical data has become a significant health risk for non-European populations, and the growing application of AI opens a new pathway for this health risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss the recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"153-171"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10529864/pdf/nihms-1913459.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9960491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virus-Derived Small RNAs and microRNAs in Health and Disease.","authors":"Vasileios Gouzouasis, Spyros Tastsoglou, Antonis Giannakakis, Artemis G Hatzigeorgiou","doi":"10.1146/annurev-biodatasci-122220-111429","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-122220-111429","url":null,"abstract":"<p><p>MicroRNAs (miRNAs) are short noncoding RNAs that can regulate all steps of gene expression (induction, transcription, and translation). Several virus families, primarily double-stranded DNA viruses, encode small RNAs (sRNAs), including miRNAs. These virus-derived miRNAs (v-miRNAs) help the virus evade the host's innate and adaptive immune system and maintain an environment of chronic latent infection. In this review, the functions of the sRNA-mediated virus-host interactions are highlighted, delineating their implication in chronic stress, inflammation, immunopathology, and disease. We provide insights into the latest viral RNA-based research-in silico approaches for functional characterization of v-miRNAs and other RNA types. The latest research can assist toward the identification of therapeutic targets to combat viral infections.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"275-298"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9960509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational Methods for Single-Cell Proteomics.","authors":"Sophia M Guldberg, Trine Line Hauge Okholm, Elizabeth E McCarthy, Matthew H Spitzer","doi":"10.1146/annurev-biodatasci-020422-050255","DOIUrl":"10.1146/annurev-biodatasci-020422-050255","url":null,"abstract":"<p><p>Advances in single-cell proteomics technologies have resulted in high-dimensional datasets comprising millions of cells that are capable of answering key questions about biology and disease. The advent of these technologies has prompted the development of computational tools to process and visualize the complex data. In this review, we outline the steps of single-cell and spatial proteomics analysis pipelines. In addition to describing available methods, we highlight benchmarking studies that have identified advantages and pitfalls of the currently available computational toolkits. As these technologies continue to advance, robust analysis tools should be developed in tandem to take full advantage of the potential biological insights provided by these data.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"47-71"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10621466/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10023948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gene Interactions in Human Disease Studies-Evidence Is Mounting.","authors":"Pankhuri Singhal, Shefali Setia Verma, Marylyn D Ritchie","doi":"10.1146/annurev-biodatasci-102022-120818","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-102022-120818","url":null,"abstract":"<p><p>Despite monumental advances in molecular technology to generate genome sequence data at scale, there is still a considerable proportion of heritability in most complex diseases that remains unexplained. Because many of the discoveries have been single-nucleotide variants with small to moderate effects on disease, the functional implication of many of the variants is still unknown and, thus, we have limited new drug targets and therapeutics. We, and many others, posit that one primary factor that has limited our ability to identify novel drug targets from genome-wide association studies may be due to gene interactions (epistasis), gene-environment interactions, network/pathway effects, or multiomic relationships. We propose that many of these complex models explain much of the underlying genetic architecture of complex disease. In this review, we discuss the evidence from multiple research avenues, ranging from pairs of alleles to multiomic integration studies and pharmacogenomics, that supports the need for further investigation of gene interactions (or epistasis) in genetic and genomic studies of human disease. Our goal is to catalog the mounting evidence for epistasis in genetic studies and the connections between genetic interactions and human health and disease that could enable precision medicine of the future.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"377-395"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9960535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}