{"title":"Human Genomics of COVID-19 Pneumonia: Contributions of Rare and Common Variants.","authors":"Aurélie Cobat, Qian Zhang, Laurent Abel, Jean-Laurent Casanova, Jacques Fellay","doi":"10.1146/annurev-biodatasci-020222-021705","DOIUrl":"10.1146/annurev-biodatasci-020222-021705","url":null,"abstract":"<p><p>SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) infection is silent or benign in most infected individuals, but causes hypoxemic COVID-19 pneumonia in about 10% of cases. We review studies of the human genetics of life-threatening COVID-19 pneumonia, focusing on both rare and common variants. Large-scale genome-wide association studies have identified more than 20 common loci robustly associated with COVID-19 pneumonia with modest effect sizes, some implicating genes expressed in the lungs or leukocytes. The most robust association, on chromosome 3, concerns a haplotype inherited from Neanderthals. Sequencing studies focusing on rare variants with a strong effect have been particularly successful, identifying inborn errors of type I interferon (IFN) immunity in 1-5% of unvaccinated patients with critical pneumonia, and their autoimmune phenocopy, autoantibodies against type I IFN, in another 15-20% of cases. Our growing understanding of the impact of human genetic variation on immunity to SARS-CoV-2 is enabling health systems to improve protection for individuals and populations.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"465-486"},"PeriodicalIF":7.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10879986/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9960534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single-Cell RNA Sequencing for Studying Human Cancers.","authors":"Dvir Aran","doi":"10.1146/annurev-biodatasci-020722-091857","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-020722-091857","url":null,"abstract":"<p><p>Since the first publication a decade ago describing the use of single-cell RNA sequencing (scRNA-seq) in the context of cancer, over 200 datasets and thousands of scRNA-seq studies have been published in cancer biology. scRNA-seq technologies have been applied across dozens of cancer types and a diverse array of study designs to improve our understanding of tumor biology, the tumor microenvironment, and therapeutic responses, and scRNA-seq is on the verge of being used to improve decision-making in the clinic. Computational methodologies and analytical pipelines are key in facilitating scRNA-seq research. Numerous computational methods utilizing the most advanced tools in data science have been developed to extract meaningful insights. Here, we review the advancements in cancer biology gained by scRNA-seq and discuss the computational challenges of the technology that are specific to cancer research.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"1-22"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9967040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges and Opportunities for Data Science in Women's Health.","authors":"Todd L Edwards, Catherine A Greene, Jacqueline A Piekos, Jacklyn N Hellwege, Gabrielle Hampton, Elizabeth A Jasper, Digna R Velez Edwards","doi":"10.1146/annurev-biodatasci-020722-105958","DOIUrl":"10.1146/annurev-biodatasci-020722-105958","url":null,"abstract":"<p><p>The intersection of women's health and data science is a field of research that has historically trailed other fields, but more recently it has gained momentum. This growth is being driven not only by new investigators who are moving into this area but also by the significant opportunities that have emerged in new methodologies, resources, and technologies in data science. Here, we describe some of the resources and methods being used by women's health researchers today to meet challenges in biomedical data science. We also describe the opportunities and limitations of applying these approaches to advance women's health outcomes and the future of the field, with emphasis on repurposing existing methodologies for women's health.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"23-45"},"PeriodicalIF":7.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10877578/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9967041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Overview of Deep Generative Models in Functional and Evolutionary Genomics.","authors":"Burak Yelmen, Flora Jay","doi":"10.1146/annurev-biodatasci-020722-115651","DOIUrl":"10.1146/annurev-biodatasci-020722-115651","url":null,"abstract":"<p><p>Following the widespread use of deep learning for genomics, deep generative modeling is also becoming a viable methodology for the broad field. Deep generative models (DGMs) can learn the complex structure of genomic data and allow researchers to generate novel genomic instances that retain the real characteristics of the original dataset. Aside from data generation, DGMs can also be used for dimensionality reduction by mapping the data space to a latent space, as well as for prediction tasks via exploitation of this learned mapping or supervised/semi-supervised DGM designs. In this review, we briefly introduce generative modeling and two currently prevailing architectures, we present conceptual applications along with notable examples in functional and evolutionary genomics, and we provide our perspective on potential challenges and future directions.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"173-189"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9967062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Molecular and Radiomic Features for Risk Assessment in Breast Cancer.","authors":"Alex A Nguyen, Anne Marie McCarthy, Despina Kontos","doi":"10.1146/annurev-biodatasci-020722-092748","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-020722-092748","url":null,"abstract":"<p><p>Breast cancer risk is highly variable within the population and current research is leading the shift toward personalized medicine. By accurately assessing an individual woman's risk, we can reduce the risk of over/undertreatment by preventing unnecessary procedures or by elevating screening procedures. Breast density measured from conventional mammography has been established as one of the most dominant risk factors for breast cancer; however, it is currently limited by its ability to characterize more complex breast parenchymal patterns that have been shown to provide additional information to strengthen cancer risk models. Molecular factors ranging from high penetrance, or high likelihood that a mutation will show signs and symptoms of the disease, to combinations of gene mutations with low penetrance have shown promise for augmenting risk assessment. Although imaging biomarkers and molecular biomarkers have both individually demonstrated improved performance in risk assessment, few studies have evaluated them together. This review aims to highlight the current state of the art in breast cancer risk assessment using imaging and genetic biomarkers.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"299-311"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9967073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges and Progress in Designing Broad-Spectrum Vaccines Against Rapidly Mutating Viruses.","authors":"Rishi Bedi, Nicholas L Bayless, Jacob Glanville","doi":"10.1146/annurev-biodatasci-020722-041304","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-020722-041304","url":null,"abstract":"<p><p>Viruses evolve to evade prior immunity, causing significant disease burden. Vaccine effectiveness deteriorates as pathogens mutate, requiring redesign. This is a problem that has grown worse due to population increase, global travel, and farming practices. Thus, there is significant interest in developing broad-spectrum vaccines that mitigate disease severity and ideally inhibit disease transmission without requiring frequent updates. Even in cases where vaccines against rapidly mutating pathogens have been somewhat effective, such as seasonal influenza and SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), designing vaccines that provide broad-spectrum immunity against routinely observed viral variation remains a desirable but not yet achieved goal. This review highlights the key theoretical advances in understanding the interplay between polymorphism and vaccine efficacy, challenges in designing broad-spectrum vaccines, and technology advances and possible avenues forward. We also discuss data-driven approaches for monitoring vaccine efficacy and predicting viral escape from vaccine-induced protection. In each case, we consider illustrative examples in vaccine development from influenza, SARS-CoV-2, and HIV (human immunodeficiency virus), three examples of highly prevalent rapidly mutating viruses with distinct phylogenetics and unique histories of vaccine technology development.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"419-441"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9960533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Identification of Functional Sequences and Variants in Noncoding DNA.","authors":"Remo Monti, Uwe Ohler","doi":"10.1146/annurev-biodatasci-122120-110102","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-122120-110102","url":null,"abstract":"<p><p>Understanding the noncoding part of the genome, which encodes gene regulation, is necessary to identify genetic mechanisms of disease and translate findings from genome-wide association studies into actionable results for treatments and personalized care. Here we provide an overview of the computational analysis of noncoding regions, starting from gene-regulatory mechanisms and their representation in data. Deep learning methods, when applied to these data, highlight important regulatory sequence elements and predict the functional effects of genetic variants. These and other algorithms are used to predict damaging sequence variants. Finally, we introduce rare-variant association tests that incorporate functional annotations and predictions in order to increase interpretability and statistical power.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"191-210"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10024242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of and Roadmap for Data Science and Machine Learning for the Neuropsychiatric Phenotype of Autism.","authors":"Peter Washington, Dennis P Wall","doi":"10.1146/annurev-biodatasci-020722-125454","DOIUrl":"10.1146/annurev-biodatasci-020722-125454","url":null,"abstract":"<p><p>Autism spectrum disorder (autism) is a neurodevelopmental delay that affects at least 1 in 44 children. Like many neurological disorder phenotypes, the diagnostic features are observable, can be tracked over time, and can be managed or even eliminated through proper therapy and treatments. However, there are major bottlenecks in the diagnostic, therapeutic, and longitudinal tracking pipelines for autism and related neurodevelopmental delays, creating an opportunity for novel data science solutions to augment and transform existing workflows and provide increased access to services for affected families. Several efforts previously conducted by a multitude of research labs have spawned great progress toward improved digital diagnostics and digital therapies for children with autism. We review the literature on digital health methods for autism behavior quantification and beneficial therapies using data science. We describe both case-control studies and classification systems for digital phenotyping. We then discuss digital diagnostics and therapeutics that integrate machine learning models of autism-related behaviors, including the factors that must be addressed for translational use. Finally, we describe ongoing challenges and potential opportunities for the field of autism data science. Given the heterogeneous nature of autism and the complexities of the relevant behaviors, this review contains insights that are relevant to neurological behavior analysis and digital psychiatry more broadly.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"211-228"},"PeriodicalIF":7.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11093217/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9960498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strategies for the Genomic Analysis of Admixed Populations.","authors":"Taotao Tan, Elizabeth G Atkinson","doi":"10.1146/annurev-biodatasci-020722-014310","DOIUrl":"10.1146/annurev-biodatasci-020722-014310","url":null,"abstract":"<p><p>Admixed populations constitute a large portion of global human genetic diversity, yet they are often left out of genomics analyses. This exclusion is problematic, as it leads to disparities in the understanding of the genetic structure and history of diverse cohorts and the performance of genomic medicine across populations. Admixed populations have particular statistical challenges, as they inherit genomic segments from multiple source populations, which is the primary reason they have historically been excluded from genetic studies. In recent years, however, an increasing number of statistical methods and software tools have been developed to account for and leverage admixture in the context of genomics analyses. Here, we provide a survey of such computational strategies for the informed consideration of admixture to allow for the well-calibrated inclusion of mixed ancestry populations in large-scale genomics studies, and we detail persisting gaps in existing tools.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"105-127"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10871708/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10023273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Learning Methods for Neuroimaging Data Analysis with Applications.","authors":"Hongtu Zhu, Tengfei Li, Bingxin Zhao","doi":"10.1146/annurev-biodatasci-020722-100353","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-020722-100353","url":null,"abstract":"<p><p>The aim of this review is to provide a comprehensive survey of statistical challenges in neuroimaging data analysis, from neuroimaging techniques to large-scale neuroimaging studies and statistical learning methods. We briefly review eight popular neuroimaging techniques and their potential applications in neuroscience research and clinical translation. We delineate four themes of neuroimaging data and review major image processing analysis methods for processing neuroimaging data at the individual level. We briefly review four large-scale neuroimaging-related studies and a consortium on imaging genomics and discuss four themes of neuroimaging data analysis at the population level. We review nine major population-based statistical analysis methods and their associated statistical challenges and present recent progress in statistical methodology to address these challenges.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"73-104"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10023733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}