{"title":"Genome Privacy and Trust.","authors":"Gamze Gürsoy","doi":"10.1146/annurev-biodatasci-122120-021311","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-122120-021311","url":null,"abstract":"Genomics data are important for advancing biomedical research, improving clinical care, and informing other disciplines such as forensics and genealogy. However, privacy concerns arise when genomic data are shared. In particular, the identifying nature of genetic information, its direct relationship to health status, and the potential financial harm and stigmatization posed to individuals and their blood relatives call for a survey of the privacy issues related to sharing genetic and related data and potential solutions to overcome these issues. In this work, we provide an overview of the importance of genomic privacy, the information gleaned from genomics data, the sources of potential private information leakages in genomics, and ways to preserve privacy while utilizing the genetic information in research. We discuss the relationship between trust in the scientific community and protecting privacy, illuminating a future roadmap for data sharing and study participation.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"5 ","pages":"163-181"},"PeriodicalIF":6.0,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9116494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cotranslational Mechanisms of Protein Biogenesis and Complex Assembly in Eukaryotes.","authors":"Fabián Morales-Polanco, Jae Ho Lee, Natália M Barbosa, Judith Frydman","doi":"10.1146/annurev-biodatasci-121721-095858","DOIUrl":"10.1146/annurev-biodatasci-121721-095858","url":null,"abstract":"The formation of protein complexes is crucial to most biological functions. The cellular mechanisms governing protein complex biogenesis are not yet well understood, but some principles of cotranslational and posttranslational assembly are beginning to emerge. In bacteria, this process is favored by operons encoding subunits of protein complexes. Eukaryotic cells do not have polycistronic mRNAs, raising the question of how they orchestrate the encounter of unassembled subunits. Here we review the constraints and mechanisms governing eukaryotic co- and posttranslational protein folding and assembly, including the influence of elongation rate on nascent chain targeting, folding, and chaperone interactions. Recent evidence shows that mRNAs encoding subunits of oligomeric assemblies can undergo localized translation and form cytoplasmic condensates that might facilitate the assembly of protein complexes. Understanding the interplay between localized mRNA translation and cotranslational proteostasis will be critical to defining protein complex assembly in vivo.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"5 ","pages":"67-94"},"PeriodicalIF":6.0,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11040709/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9769322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Phenotypic Causal Inference Using Genome-Wide Association Study Data: Mendelian Randomization and Beyond.","authors":"Venexia M Walker, Jie Zheng, Tom R Gaunt, George Davey Smith","doi":"10.1146/annurev-biodatasci-122120-024910","DOIUrl":"10.1146/annurev-biodatasci-122120-024910","url":null,"abstract":"Summary statistics for genome-wide association studies (GWAS) are increasingly available for downstream analyses. Meanwhile, the popularity of causal inference methods has grown as we look to gather robust evidence for novel medical and public health interventions. This has led to the development of methods that use GWAS summary statistics for causal inference. Here, we describe these methods in order of their escalating complexity, from genetic associations to extensions of Mendelian randomization that consider thousands of phenotypes simultaneously. We also cover the assumptions and limitations of these approaches before considering the challenges faced by researchers performing causal inference using GWAS data. GWAS summary statistics constitute an important data source for causal inference research that offers a counterpoint to nongenetic methods when triangulating evidence. Continued efforts to address the challenges in using GWAS data for causal inference will allow the full impact of these approaches to be realized.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"5 ","pages":"1-17"},"PeriodicalIF":7.0,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614231/pdf/EMS167448.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10780371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Best Practices on Big Data Analytics to Address Sex-Specific Biases in Our Understanding of the Etiology, Diagnosis, and Prognosis of Diseases.","authors":"Su Golder, Karen O'Connor, Yunwen Wang, Robin Stevens, Graciela Gonzalez-Hernandez","doi":"10.1146/annurev-biodatasci-122120-025806","DOIUrl":"10.1146/annurev-biodatasci-122120-025806","url":null,"abstract":"A bias in health research to favor understanding diseases as they present in men can have a grave impact on the health of women. This paper reports on a conceptual review of the literature on machine learning or natural language processing (NLP) techniques to interrogate big data for identifying sex-specific health disparities. We searched Ovid MEDLINE, Embase, and PsycINFO in October 2021 using synonyms and indexing terms for (a) \"women,\" \"men,\" or \"sex\"; (b) \"big data,\" \"artificial intelligence,\" or \"NLP\"; and (c) \"disparities\" or \"differences.\" From 902 records, 22 studies met the inclusion criteria and were analyzed. Results demonstrate that the inclusion by sex is inconsistent and often unreported, although the inclusion of men in these studies is disproportionately less than women. Even though artificial intelligence and NLP techniques are widely applied in health research, few studies use them to take advantage of unstructured text to investigate sex-related differences or disparities. Researchers are increasingly aware of sex-based data bias, but the process toward correction is slow. We reflect on best practices on using big data analytics to address sex-specific biases in understanding the etiology, diagnosis, and prognosis of diseases.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":"5 ","pages":"251-267"},"PeriodicalIF":7.0,"publicationDate":"2022-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11524028/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142366765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering Biological Conflict Systems Through Genome Analysis: Evolutionary Principles and Biochemical Novelty.","authors":"L. Aravind, L. Iyer, A. M. Burroughs","doi":"10.1146/annurev-biodatasci-122220-101119","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-122220-101119","url":null,"abstract":"Biological replicators, from genes within a genome to whole organisms, are locked in conflicts. Comparative genomics has revealed a staggering diversity of molecular armaments and mechanisms regulating their deployment, collectively termed biological conflict systems. These encompass toxins used in inter- and intraspecific interactions, self/nonself discrimination, antiviral immune mechanisms, and counter-host effectors deployed by viruses and intragenomic selfish elements. These systems possess shared syntactical features in their organizational logic and a set of effectors targeting genetic information flow through the Central Dogma, certain membranes, and key molecules like NAD+. These principles can be exploited to discover new conflict systems through sensitive computational analyses. This has led to significant advances in our understanding of the biology of these systems and furnished new biotechnological reagents for genome editing, sequencing, and beyond. We discuss these advances using specific examples of toxins, restriction-modification, apoptosis, CRISPR/second messenger-regulated systems, and other enigmatic nucleic acid-targeting systems. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42599953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing and Implementing Predictive Models in a Learning Healthcare System: Traditional and Artificial Intelligence Approaches in the Veterans Health Administration.","authors":"D. Atkins, C. A. Makridis, G. Alterovitz, R. Ramoni, C. Clancy","doi":"10.1146/annurev-biodatasci-122220-110053","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-122220-110053","url":null,"abstract":"Predicting clinical risk is an important part of healthcare and can inform decisions about treatments, preventive interventions, and provision of extra services. The field of predictive models has been revolutionized over the past two decades by electronic health record data; the ability to link such data with other demographic, socioeconomic, and geographic information; the availability of high-capacity computing; and new machine learning and artificial intelligence methods for extracting insights from complex datasets. These advances have produced a new generation of computerized predictive models, but debate continues about their development, reporting, validation, evaluation, and implementation. In this review we reflect on more than 10 years of experience at the Veterans Health Administration, the largest integrated healthcare system in the United States, in developing, testing, and implementing such models at scale. We report lessons from the implementation of national risk prediction models and suggest an agenda for research.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47617284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracellular Vesicle-Based Multianalyte Liquid Biopsy as a Diagnostic for Cancer.","authors":"Andrew A. Lin, V. Nimgaonkar, D. Issadore, E. Carpenter","doi":"10.1146/annurev-biodatasci-122120-113218","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-122120-113218","url":null,"abstract":"Liquid biopsy is the analysis of materials shed by tumors into circulation, such as circulating tumor cells, nucleic acids, and extracellular vesicles (EVs), for the diagnosis and management of cancer. These assays have rapidly evolved with recent FDA approvals of single biomarkers in patients with advanced metastatic disease. However, they have lacked sensitivity or specificity as a diagnostic in early-stage cancer, primarily due to low concentrations in circulating plasma. EVs, membrane-enclosed nanoscale vesicles shed by tumor and other cells into circulation, are a promising liquid biopsy analyte owing to their protein and nucleic acid cargoes carried from their mother cells, their surface proteins specific to their cells of origin, and their higher concentrations over other noninvasive biomarkers across disease stages. Recently, the combination of EVs with non-EV biomarkers has driven improvements in sensitivity and accuracy; this has been fueled by the use of machine learning (ML) to algorithmically identify and combine multiple biomarkers into a composite biomarker for clinical prediction. This review presents an analysis of EV isolation methods, surveys approaches for and issues with using ML in multianalyte EV datasets, and describes best practices for bringing multianalyte liquid biopsy to clinical implementation.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49480054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exchange of Human Data Across International Boundaries.","authors":"H. Bentzen","doi":"10.1146/annurev-biodatasci-122220-110811","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-122220-110811","url":null,"abstract":"There is a need to share personal data across jurisdictional boundaries. However, the laws regulating such transfers are not harmonized, and sometimes even conflict, causing challenges and occasional data stalls. This review describes the legal landscape for transfer of human data across international boundaries. The European Union's data protection legislation is used as the starting point for illustrating the legislation of countries across the world, how these diverge, and one's options for exchanging human data internationally in a legally compliant manner.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45740463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational Approaches for Understanding Sequence Variation Effects on the 3D Genome Architecture.","authors":"P. Avdeyev, Jian Zhou","doi":"10.1146/annurev-biodatasci-102521-012018","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-102521-012018","url":null,"abstract":"Decoding how genomic sequence and its variations affect 3D genome architecture is indispensable for understanding the genetic architecture of various traits and diseases. The 3D genome organization can be significantly altered by genome variations and in turn impact the function of the genomic sequence. Techniques for measuring the 3D genome architecture across spatial scales have opened up new possibilities for understanding how the 3D genome depends upon the genomic sequence and how it can be altered by sequence variations. Computational methods have become instrumental in analyzing and modeling the sequence effects on 3D genome architecture, and recent developments in deep learning sequence models have opened up new opportunities for studying the interplay between sequence variations and the 3D genome. In this review, we focus on computational approaches for both the detection and modeling of sequence variation effects on the 3D genome, and we discuss the opportunities presented by these approaches.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48686416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bioinformatics of Corals: Investigating Heterogeneous Omics Data from Coral Holobionts for Insight into Reef Health and Resilience.","authors":"L. Cowen, H. Putnam","doi":"10.1146/annurev-biodatasci-122120-030732","DOIUrl":"https://doi.org/10.1146/annurev-biodatasci-122120-030732","url":null,"abstract":"Coral reefs are home to over two million species and provide habitat for roughly 25% of all marine animals, but they are being severely threatened by pollution and climate change. A large amount of genomic, transcriptomic, and other omics data is becoming increasingly available from different species of reef-building corals, the unicellular dinoflagellates, and the coral microbiome (bacteria, archaea, viruses, fungi, etc.). Such new data present an opportunity for bioinformatics researchers and computational biologists to contribute to a timely, compelling, and urgent investigation of critical factors that influence reef health and resilience.","PeriodicalId":29775,"journal":{"name":"Annual Review of Biomedical Data Science","volume":" ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43372901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}