{"title":"Strategies for the Genomic Analysis of Admixed Populations.","authors":"Taotao Tan, Elizabeth G Atkinson","doi":"10.1146/annurev-biodatasci-020722-014310","DOIUrl":"10.1146/annurev-biodatasci-020722-014310","url":null,"abstract":"<p><p>Admixed populations constitute a large portion of global human genetic diversity, yet they are often left out of genomics analyses. This exclusion is problematic, as it leads to disparities in the understanding of the genetic structure and history of diverse cohorts and the performance of genomic medicine across populations. Admixed populations have particular statistical challenges, as they inherit genomic segments from multiple source populations-the primary reason they have historically been excluded from genetic studies. In recent years, however, an increasing number of statistical methods and software tools have been developed to account for and leverage admixture in the context of genomics analyses. Here, we provide a survey of such computational strategies for the informed consideration of admixture to allow for the well-calibrated inclusion of mixed ancestry populations in large-scale genomics studies, and we detail persisting gaps in existing tools.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"105-127"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10871708/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10023273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Learning Methods for Neuroimaging Data Analysis with Applications.","authors":"Hongtu Zhu, Tengfei Li, Bingxin Zhao","doi":"10.1146/annurev-biodatasci-020722-100353","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-020722-100353","url":null,"abstract":"<p><p>The aim of this review is to provide a comprehensive survey of statistical challenges in neuroimaging data analysis, from neuroimaging techniques to large-scale neuroimaging studies and statistical learning methods. We briefly review eight popular neuroimaging techniques and their potential applications in neuroscience research and clinical translation. We delineate four themes of neuroimaging data and review major image processing analysis methods for processing neuroimaging data at the individual level. We briefly review four large-scale neuroimaging-related studies and a consortium on imaging genomics and discuss four themes of neuroimaging data analysis at the population level. We review nine major population-based statistical analysis methods and their associated statistical challenges and present recent progress in statistical methodology to address these challenges.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"73-104"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10023733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single-Cell Multiomics.","authors":"Emily Flynn, Ana Almonte-Loya, Gabriela K Fragiadakis","doi":"10.1146/annurev-biodatasci-020422-050645","DOIUrl":"10.1146/annurev-biodatasci-020422-050645","url":null,"abstract":"<p><p>Single-cell RNA sequencing methods have led to improved understanding of the heterogeneity and transcriptomic states present in complex biological systems. Recently, the development of novel single-cell technologies for assaying additional modalities, specifically genomic, epigenomic, proteomic, and spatial data, allows for unprecedented insight into cellular biology. While certain technologies collect multiple measurements from the same cells simultaneously, even when modalities are separately assayed in different cells, we can apply novel computational methods to integrate these data. The application of computational integration methods to multimodal paired and unpaired data results in rich information about the identities of the cells present and the interactions between different levels of biology, such as between genetic variation and transcription. 
In this review, we both discuss the single-cell technologies for measuring these modalities and describe and characterize a variety of computational integration methods for combining the resulting data to leverage multimodal information toward greater biological insight.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"313-337"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11146013/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9960510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent Developments in Ultralarge and Structure-Based Virtual Screening Approaches.","authors":"Christoph Gorgulla","doi":"10.1146/annurev-biodatasci-020222-025013","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-020222-025013","url":null,"abstract":"<p><p>Drug development is a wide scientific field that faces many challenges these days. Among them are extremely high development costs, long development times, and a small number of new drugs that are approved each year. New and innovative technologies are needed to solve these problems that make the drug discovery process of small molecules more time and cost efficient, and that allow previously undruggable receptor classes to be targeted, such as protein-protein interactions. Structure-based virtual screenings (SBVSs) have become a leading contender in this context. In this review, we give an introduction to the foundations of SBVSs and survey their progress in the past few years with a focus on ultralarge virtual screenings (ULVSs). We outline key principles of SBVSs, recent success stories, new screening techniques, available deep learning-based docking methods, and promising future research directions. ULVSs have an enormous potential for the development of new small-molecule drugs and are already starting to transform early-stage drug discovery.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"229-258"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9960543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noninvasive Prenatal Testing Using Circulating DNA and RNA: Advances, Challenges, and Possibilities.","authors":"Mira N Moufarrej, Diana W Bianchi, Gary M Shaw, David K Stevenson, Stephen R Quake","doi":"10.1146/annurev-biodatasci-020722-094144","DOIUrl":"10.1146/annurev-biodatasci-020722-094144","url":null,"abstract":"<p><p>Prenatal screening using sequencing of circulating cell-free DNA has transformed obstetric care over the past decade and significantly reduced the number of invasive diagnostic procedures like amniocentesis for genetic disorders. Nonetheless, emergency care remains the only option for complications like preeclampsia and preterm birth, two of the most prevalent obstetrical syndromes. Advances in noninvasive prenatal testing expand the scope of precision medicine in obstetric care. In this review, we discuss advances, challenges, and possibilities toward the goal of providing proactive, personalized prenatal care. The highlighted advances focus mainly on cell-free nucleic acids; however, we also review research that uses signals from metabolomics, proteomics, intact cells, and the microbiome. We discuss ethical challenges in providing care. Finally, we look to future possibilities, including redefining disease taxonomy and moving from biomarker correlation to biological causation.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"397-418"},"PeriodicalIF":7.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10528197/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9969611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoding Aging Hallmarks at the Single-Cell Level.","authors":"Shuai Ma, Xu Chi, Yusheng Cai, Zhejun Ji, Si Wang, Jie Ren, Guang-Hui Liu","doi":"10.1146/annurev-biodatasci-020722-120642","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-020722-120642","url":null,"abstract":"<p><p>Organismal aging exhibits wide-ranging hallmarks in divergent cell types across tissues, organs, and systems. The advancement of single-cell technologies and generation of rich datasets have afforded the scientific community the opportunity to decode these hallmarks of aging at an unprecedented scope and resolution. In this review, we describe the technological advancements and bioinformatic methodologies enabling data interpretation at the cellular level. Then, we outline the application of such technologies for decoding aging hallmarks and potential intervention targets and summarize common themes and context-specific molecular features in representative organ systems across the body. Finally, we provide a brief summary of available databases relevant for aging research and present an outlook on the opportunities in this emerging field.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"129-152"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10023274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of Splice Variants and Isoforms in Transcriptomics and Proteomics.","authors":"Taojunfeng Su, Michael A R Hollas, Ryan T Fellers, Neil L Kelleher","doi":"10.1146/annurev-biodatasci-020722-044021","DOIUrl":"10.1146/annurev-biodatasci-020722-044021","url":null,"abstract":"<p><p>Alternative splicing is pivotal to the regulation of gene expression and protein diversity in eukaryotic cells. The detection of alternative splicing events requires specific omics technologies. Although short-read RNA sequencing has successfully supported a plethora of investigations on alternative splicing, the emerging technologies of long-read RNA sequencing and top-down mass spectrometry open new opportunities to identify alternative splicing and protein isoforms with less ambiguity. Here, we summarize improvements in short-read RNA sequencing for alternative splicing analysis, including percent splicing index estimation and differential analysis. We also review the computational methods used in top-down proteomics analysis regarding proteoform identification, including the construction of databases of protein isoforms and statistical analyses of search results. 
While many improvements in sequencing and computational methods will result from emerging technologies, there should be future endeavors to increase the effectiveness, integration, and proteome coverage of alternative splicing events.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"357-376"},"PeriodicalIF":7.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10840079/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10339608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The <i>All of Us</i> Data and Research Center: Creating a Secure, Scalable, and Sustainable Ecosystem for Biomedical Research.","authors":"Kelsey R Mayo, Melissa A Basford, Robert J Carroll, Moira Dillon, Heather Fullen, Jesse Leung, Hiral Master, Shimon Rura, Lina Sulieman, Nan Kennedy, Eric Banks, David Bernick, Asmita Gauchan, Lee Lichtenstein, Brandy M Mapes, Kayla Marginean, Steve L Nyemba, Andrea Ramirez, Charissa Rotundo, Keri Wolfe, Weiyi Xia, Romuladus E Azuine, Robert M Cronin, Joshua C Denny, Abel Kho, Christopher Lunt, Bradley Malin, Karthik Natarajan, Consuelo H Wilkins, Hua Xu, George Hripcsak, Dan M Roden, Anthony A Philippakis, David Glazer, Paul A Harris","doi":"10.1146/annurev-biodatasci-122120-104825","DOIUrl":"10.1146/annurev-biodatasci-122120-104825","url":null,"abstract":"<p><p>The <i>All of Us</i> Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in <i>All of Us</i>, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers. In this review, we first describe how the DRC is organized to meet the needs of this broad group of stakeholders. We then outline guiding principles, common challenges, and innovative approaches used to build the <i>All of Us</i> data ecosystem. 
Finally, we share lessons learned to help others navigate important decisions and trade-offs in building a modern biomedical data platform.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"443-464"},"PeriodicalIF":7.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11157478/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10040579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of and Roadmap for Data Science and Machine Learning for the Neuropsychiatric Phenotype of Autism","authors":"Peter Washington, D. Wall","doi":"10.48550/arXiv.2303.03577","DOIUrl":"https://doi.org/10.48550/arXiv.2303.03577","url":null,"abstract":"Autism spectrum disorder (autism) is a neurodevelopmental delay that affects at least 1 in 44 children. Like many neurological disorder phenotypes, the diagnostic features are observable, can be tracked over time, and can be managed or even eliminated through proper therapy and treatments. However, there are major bottlenecks in the diagnostic, therapeutic, and longitudinal tracking pipelines for autism and related neurodevelopmental delays, creating an opportunity for novel data science solutions to augment and transform existing workflows and provide increased access to services for affected families. Several efforts previously conducted by a multitude of research labs have spawned great progress toward improved digital diagnostics and digital therapies for children with autism. We review the literature on digital health methods for autism behavior quantification and beneficial therapies using data science. We describe both case-control studies and classification systems for digital phenotyping. We then discuss digital diagnostics and therapeutics that integrate machine learning models of autism-related behaviors, including the factors that must be addressed for translational use. Finally, we describe ongoing challenges and potential opportunities for the field of autism data science. Given the heterogeneous nature of autism and the complexities of the relevant behaviors, this review contains insights that are relevant to neurological behavior analysis and digital psychiatry more broadly. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 6 is August 2023. 
Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"1 1","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47897781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges and Opportunities for Developing More Generalizable Polygenic Risk Scores.","authors":"Ying Wang, Kristin Tsuo, Masahiro Kanai, Benjamin M Neale, Alicia R Martin","doi":"10.1146/annurev-biodatasci-111721-074830","DOIUrl":"10.1146/annurev-biodatasci-111721-074830","url":null,"abstract":"<p><p>Polygenic risk scores (PRS) estimate an individual's genetic likelihood of complex traits and diseases by aggregating information across multiple genetic variants identified from genome-wide association studies. PRS can predict a broad spectrum of diseases and have therefore been widely used in research settings. Some work has investigated their potential applications as biomarkers in preventative medicine, but significant work is still needed to definitively establish and communicate absolute risk to patients for genetic and modifiable risk factors across demographic groups. However, the biggest limitation of PRS currently is that they show poor generalizability across diverse ancestries and cohorts. Major efforts are underway through methodological development and data generation initiatives to improve their generalizability. 
This review aims to comprehensively discuss current progress on the development of PRS, the factors that affect their generalizability, and promising areas for improving their accuracy, portability, and implementation.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"5 ","pages":"293-320"},"PeriodicalIF":7.0,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9828290/pdf/nihms-1857872.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10555201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}