Emily Flynn, Ana Almonte-Loya, Gabriela K Fragiadakis
{"title":"Single-Cell Multiomics.","authors":"Emily Flynn, Ana Almonte-Loya, Gabriela K Fragiadakis","doi":"10.1146/annurev-biodatasci-020422-050645","DOIUrl":"10.1146/annurev-biodatasci-020422-050645","url":null,"abstract":"<p><p>Single-cell RNA sequencing methods have led to improved understanding of the heterogeneity and transcriptomic states present in complex biological systems. Recently, the development of novel single-cell technologies for assaying additional modalities, specifically genomic, epigenomic, proteomic, and spatial data, allows for unprecedented insight into cellular biology. While certain technologies collect multiple measurements from the same cells simultaneously, even when modalities are separately assayed in different cells, we can apply novel computational methods to integrate these data. The application of computational integration methods to multimodal paired and unpaired data results in rich information about the identities of the cells present and the interactions between different levels of biology, such as between genetic variation and transcription. 
In this review, we both discuss the single-cell technologies for measuring these modalities and describe and characterize a variety of computational integration methods for combining the resulting data to leverage multimodal information toward greater biological insight.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"313-337"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11146013/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9960510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent Developments in Ultralarge and Structure-Based Virtual Screening Approaches.","authors":"Christoph Gorgulla","doi":"10.1146/annurev-biodatasci-020222-025013","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-020222-025013","url":null,"abstract":"<p><p>Drug development is a wide scientific field that faces many challenges these days. Among them are extremely high development costs, long development times, and a small number of new drugs that are approved each year. New and innovative technologies are needed to solve these problems that make the drug discovery process of small molecules more time and cost efficient, and that allow previously undruggable receptor classes to be targeted, such as protein-protein interactions. Structure-based virtual screenings (SBVSs) have become a leading contender in this context. In this review, we give an introduction to the foundations of SBVSs and survey their progress in the past few years with a focus on ultralarge virtual screenings (ULVSs). We outline key principles of SBVSs, recent success stories, new screening techniques, available deep learning-based docking methods, and promising future research directions. ULVSs have an enormous potential for the development of new small-molecule drugs and are already starting to transform early-stage drug discovery.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"229-258"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9960543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mira N Moufarrej, Diana W Bianchi, Gary M Shaw, David K Stevenson, Stephen R Quake
{"title":"Noninvasive Prenatal Testing Using Circulating DNA and RNA: Advances, Challenges, and Possibilities.","authors":"Mira N Moufarrej, Diana W Bianchi, Gary M Shaw, David K Stevenson, Stephen R Quake","doi":"10.1146/annurev-biodatasci-020722-094144","DOIUrl":"10.1146/annurev-biodatasci-020722-094144","url":null,"abstract":"<p><p>Prenatal screening using sequencing of circulating cell-free DNA has transformed obstetric care over the past decade and significantly reduced the number of invasive diagnostic procedures like amniocentesis for genetic disorders. Nonetheless, emergency care remains the only option for complications like preeclampsia and preterm birth, two of the most prevalent obstetrical syndromes. Advances in noninvasive prenatal testing expand the scope of precision medicine in obstetric care. In this review, we discuss advances, challenges, and possibilities toward the goal of providing proactive, personalized prenatal care. The highlighted advances focus mainly on cell-free nucleic acids; however, we also review research that uses signals from metabolomics, proteomics, intact cells, and the microbiome. We discuss ethical challenges in providing care. Finally, we look to future possibilities, including redefining disease taxonomy and moving from biomarker correlation to biological causation.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"397-418"},"PeriodicalIF":7.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10528197/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9969611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuai Ma, Xu Chi, Yusheng Cai, Zhejun Ji, Si Wang, Jie Ren, Guang-Hui Liu
{"title":"Decoding Aging Hallmarks at the Single-Cell Level.","authors":"Shuai Ma, Xu Chi, Yusheng Cai, Zhejun Ji, Si Wang, Jie Ren, Guang-Hui Liu","doi":"10.1146/annurev-biodatasci-020722-120642","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-020722-120642","url":null,"abstract":"<p><p>Organismal aging exhibits wide-ranging hallmarks in divergent cell types across tissues, organs, and systems. The advancement of single-cell technologies and generation of rich datasets have afforded the scientific community the opportunity to decode these hallmarks of aging at an unprecedented scope and resolution. In this review, we describe the technological advancements and bioinformatic methodologies enabling data interpretation at the cellular level. Then, we outline the application of such technologies for decoding aging hallmarks and potential intervention targets and summarize common themes and context-specific molecular features in representative organ systems across the body. Finally, we provide a brief summary of available databases relevant for aging research and present an outlook on the opportunities in this emerging field.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"129-152"},"PeriodicalIF":6.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10023274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taojunfeng Su, Michael A R Hollas, Ryan T Fellers, Neil L Kelleher
{"title":"Identification of Splice Variants and Isoforms in Transcriptomics and Proteomics.","authors":"Taojunfeng Su, Michael A R Hollas, Ryan T Fellers, Neil L Kelleher","doi":"10.1146/annurev-biodatasci-020722-044021","DOIUrl":"10.1146/annurev-biodatasci-020722-044021","url":null,"abstract":"<p><p>Alternative splicing is pivotal to the regulation of gene expression and protein diversity in eukaryotic cells. The detection of alternative splicing events requires specific omics technologies. Although short-read RNA sequencing has successfully supported a plethora of investigations on alternative splicing, the emerging technologies of long-read RNA sequencing and top-down mass spectrometry open new opportunities to identify alternative splicing and protein isoforms with less ambiguity. Here, we summarize improvements in short-read RNA sequencing for alternative splicing analysis, including percent splicing index estimation and differential analysis. We also review the computational methods used in top-down proteomics analysis regarding proteoform identification, including the construction of databases of protein isoforms and statistical analyses of search results. 
While many improvements in sequencing and computational methods will result from emerging technologies, there should be future endeavors to increase the effectiveness, integration, and proteome coverage of alternative splicing events.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"357-376"},"PeriodicalIF":7.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10840079/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10339608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kelsey R Mayo, Melissa A Basford, Robert J Carroll, Moira Dillon, Heather Fullen, Jesse Leung, Hiral Master, Shimon Rura, Lina Sulieman, Nan Kennedy, Eric Banks, David Bernick, Asmita Gauchan, Lee Lichtenstein, Brandy M Mapes, Kayla Marginean, Steve L Nyemba, Andrea Ramirez, Charissa Rotundo, Keri Wolfe, Weiyi Xia, Romuladus E Azuine, Robert M Cronin, Joshua C Denny, Abel Kho, Christopher Lunt, Bradley Malin, Karthik Natarajan, Consuelo H Wilkins, Hua Xu, George Hripcsak, Dan M Roden, Anthony A Philippakis, David Glazer, Paul A Harris
{"title":"The <i>All of Us</i> Data and Research Center: Creating a Secure, Scalable, and Sustainable Ecosystem for Biomedical Research.","authors":"Kelsey R Mayo, Melissa A Basford, Robert J Carroll, Moira Dillon, Heather Fullen, Jesse Leung, Hiral Master, Shimon Rura, Lina Sulieman, Nan Kennedy, Eric Banks, David Bernick, Asmita Gauchan, Lee Lichtenstein, Brandy M Mapes, Kayla Marginean, Steve L Nyemba, Andrea Ramirez, Charissa Rotundo, Keri Wolfe, Weiyi Xia, Romuladus E Azuine, Robert M Cronin, Joshua C Denny, Abel Kho, Christopher Lunt, Bradley Malin, Karthik Natarajan, Consuelo H Wilkins, Hua Xu, George Hripcsak, Dan M Roden, Anthony A Philippakis, David Glazer, Paul A Harris","doi":"10.1146/annurev-biodatasci-122120-104825","DOIUrl":"10.1146/annurev-biodatasci-122120-104825","url":null,"abstract":"<p><p>The <i>All of Us</i> Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in <i>All of Us</i>, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers. In this review, we first describe how the DRC is organized to meet the needs of this broad group of stakeholders. We then outline guiding principles, common challenges, and innovative approaches used to build the <i>All of Us</i> data ecosystem. 
Finally, we share lessons learned to help others navigate important decisions and trade-offs in building a modern biomedical data platform.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"6 ","pages":"443-464"},"PeriodicalIF":7.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11157478/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10040579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of and Roadmap for Data Science and Machine Learning for the Neuropsychiatric Phenotype of Autism","authors":"Peter Washington, D. Wall","doi":"10.48550/arXiv.2303.03577","DOIUrl":"https://doi.org/10.48550/arXiv.2303.03577","url":null,"abstract":"Autism spectrum disorder (autism) is a neurodevelopmental delay that affects at least 1 in 44 children. Like many neurological disorder phenotypes, the diagnostic features are observable, can be tracked over time, and can be managed or even eliminated through proper therapy and treatments. However, there are major bottlenecks in the diagnostic, therapeutic, and longitudinal tracking pipelines for autism and related neurodevelopmental delays, creating an opportunity for novel data science solutions to augment and transform existing workflows and provide increased access to services for affected families. Several efforts previously conducted by a multitude of research labs have spawned great progress toward improved digital diagnostics and digital therapies for children with autism. We review the literature on digital health methods for autism behavior quantification and beneficial therapies using data science. We describe both case-control studies and classification systems for digital phenotyping. We then discuss digital diagnostics and therapeutics that integrate machine learning models of autism-related behaviors, including the factors that must be addressed for translational use. Finally, we describe ongoing challenges and potential opportunities for the field of autism data science. Given the heterogeneous nature of autism and the complexities of the relevant behaviors, this review contains insights that are relevant to neurological behavior analysis and digital psychiatry more broadly. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 6 is August 2023. 
Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"1 1","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47897781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ying Wang, Kristin Tsuo, Masahiro Kanai, Benjamin M Neale, Alicia R Martin
{"title":"Challenges and Opportunities for Developing More Generalizable Polygenic Risk Scores.","authors":"Ying Wang, Kristin Tsuo, Masahiro Kanai, Benjamin M Neale, Alicia R Martin","doi":"10.1146/annurev-biodatasci-111721-074830","DOIUrl":"10.1146/annurev-biodatasci-111721-074830","url":null,"abstract":"<p><p>Polygenic risk scores (PRS) estimate an individual's genetic likelihood of complex traits and diseases by aggregating information across multiple genetic variants identified from genome-wide association studies. PRS can predict a broad spectrum of diseases and have therefore been widely used in research settings. Some work has investigated their potential applications as biomarkers in preventative medicine, but significant work is still needed to definitively establish and communicate absolute risk to patients for genetic and modifiable risk factors across demographic groups. However, the biggest limitation of PRS currently is that they show poor generalizability across diverse ancestries and cohorts. Major efforts are underway through methodological development and data generation initiatives to improve their generalizability. 
This review aims to comprehensively discuss current progress on the development of PRS, the factors that affect their generalizability, and promising areas for improving their accuracy, portability, and implementation.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"5 ","pages":"293-320"},"PeriodicalIF":7.0,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9828290/pdf/nihms-1857872.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10555201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dan Ju, Daniel Hui, Dorothy A Hammond, Ambroise Wonkam, Sarah A Tishkoff
{"title":"Importance of Including Non-European Populations in Large Human Genetic Studies to Enhance Precision Medicine.","authors":"Dan Ju, Daniel Hui, Dorothy A Hammond, Ambroise Wonkam, Sarah A Tishkoff","doi":"10.1146/annurev-biodatasci-122220-112550","DOIUrl":"10.1146/annurev-biodatasci-122220-112550","url":null,"abstract":"<p><p>One goal of genomic medicine is to uncover an individual's genetic risk for disease, which generally requires data connecting genotype to phenotype, as done in genome-wide association studies (GWAS). While there may be clinical promise to employing prediction tools such as polygenic risk scores (PRS), it currently stands that individuals of non-European ancestry may not reap the benefits of genomic medicine because of underrepresentation in large-scale genetics studies. Here, we discuss why this inequity poses a problem for genomic medicine and the reasons for the low transferability of PRS across populations. We also survey the ancestry representation of published GWAS and investigate how estimates of ancestry diversity in GWAS participants might be biased. 
We highlight the importance of expanding genetic research in Africa, one of the most underrepresented regions in human genomics research, and discuss issues of ethics, resources, and technology for equitable advancement of genomic medicine.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"5 ","pages":"321-339"},"PeriodicalIF":6.0,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9904154/pdf/nihms-1864817.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9545868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome Privacy and Trust.","authors":"Gamze Gürsoy","doi":"10.1146/annurev-biodatasci-122120-021311","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-122120-021311","url":null,"abstract":"<p><p>Genomics data are important for advancing biomedical research, improving clinical care, and informing other disciplines such as forensics and genealogy. However, privacy concerns arise when genomic data are shared. In particular, the identifying nature of genetic information, its direct relationship to health status, and the potential financial harm and stigmatization posed to individuals and their blood relatives call for a survey of the privacy issues related to sharing genetic and related data and potential solutions to overcome these issues. In this work, we provide an overview of the importance of genomic privacy, the information gleaned from genomics data, the sources of potential private information leakages in genomics, and ways to preserve privacy while utilizing the genetic information in research. We discuss the relationship between trust in the scientific community and protecting privacy, illuminating a future roadmap for data sharing and study participation.</p>","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"5 ","pages":"163-181"},"PeriodicalIF":6.0,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9116494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}