{"title":"How Can Data Science Revolutionize Humanitarian Crises?","authors":"L. Vittert, Rita Ko","doi":"10.1162/99608f92.57898732","DOIUrl":"https://doi.org/10.1162/99608f92.57898732","url":null,"abstract":"","PeriodicalId":73195,"journal":{"name":"Harvard data science review","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49496841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science and Energy: Some Lessons from Europe on Higher Education Course Design and Delivery","authors":"H. Kazmi, Ingrid Munné-Collado, Khaoula Tidriri, L. Nordström, F. Gielen, J. Driesen","doi":"10.1162/99608f92.fd504fc4","DOIUrl":"https://doi.org/10.1162/99608f92.fd504fc4","url":null,"abstract":"Data science is seen as a key enabler for technologies that help decarbonize global energy use. However, the energy sector continues to struggle to attract and train enough data scientists. The primary reasons for this are the lack of emphasis on data science in most graduate programs in energy engineering and the high barriers to entry for data scientists from other sectors. In this article, we present a snapshot of the data science–related curriculum being taught in graduate energy programs at four different European universities, along with feedback we received from students and alumni of these programs. While knowledge of data science remains low across the board, students in these programs already recognize data science as an important element of their future professional careers. We also present findings from running three separate iterations of an energy data science course we developed in light of this feedback: one of these iterations was offered only at KU Leuven (Belgium), while the other two were accessible to students at all four universities. We also discuss challenges and opportunities arising from designing and delivering courses in a cross-university context. This foundational course and others like it are seen as a necessary means to enable students to take more specialized courses in data science, and eventually contribute toward realizing a sustainable energy transition and meeting climate change mitigation objectives.","PeriodicalId":73195,"journal":{"name":"Harvard data science review","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42542270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When Lions Write: An American Immigrant Story","authors":"T. Olubunmi","doi":"10.1162/99608f92.08b08a97","DOIUrl":"https://doi.org/10.1162/99608f92.08b08a97","url":null,"abstract":"","PeriodicalId":73195,"journal":{"name":"Harvard data science review","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42833070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When Quantum Computation Meets Data Science: Making Data Science Quantum","authors":"Yazhen Wang","doi":"10.1162/99608f92.ef5d8928","DOIUrl":"https://doi.org/10.1162/99608f92.ef5d8928","url":null,"abstract":"Quantum computation and quantum information have attracted considerable attention on multiple frontiers of scientific fields ranging from physics to chemistry and engineering, as well as from computer science to mathematics and statistics. Data science combines statistical methods, computational algorithms, and domain science information to extract knowledge and insights from big data, and to solve complex real world problems. While it is well-known that quantum computation has the potential to revolutionize data science, much less has been said about the potential of data science to advance quantum computation. Yet because the stochasticity of quantum physics renders quantum computation random, data science can play an important role in the development of quantum computation and quantum information. This article gives an overview of quantum computation and promotes interplay between quantum science and data science. Overall, it advocates for the development of quantum data science for advancing quantum computation and quantum information.","PeriodicalId":73195,"journal":{"name":"Harvard data science review","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45445481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science on a Future Quantum Internet","authors":"Nana Liu","doi":"10.1162/99608f92.32fa682f","DOIUrl":"https://doi.org/10.1162/99608f92.32fa682f","url":null,"abstract":"","PeriodicalId":73195,"journal":{"name":"Harvard data science review","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44723060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Development of Quantum Machine Learning","authors":"K. Najafi, S. Yelin, Xun Gao","doi":"10.1162/99608f92.5a9fd72c","DOIUrl":"https://doi.org/10.1162/99608f92.5a9fd72c","url":null,"abstract":"","PeriodicalId":73195,"journal":{"name":"Harvard data science review","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49655009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Flush.","authors":"Xiaotong Shen, Xuan Bi, Rex Shen","doi":"10.1162/99608f92.681fe3bd","DOIUrl":"https://doi.org/10.1162/99608f92.681fe3bd","url":null,"abstract":"Data perturbation is a technique for generating synthetic data by adding \"noise\" to raw data, which has an array of applications in science and engineering, primarily in data security and privacy. One challenge for data perturbation is that it usually produces synthetic data with information loss as the price of privacy protection. The information loss, in turn, causes accuracy loss for any statistical or machine learning method based on the synthetic data, weakening downstream analysis and degrading machine learning performance. In this article, we introduce and advocate a fundamental principle of data perturbation, which requires the preservation of the distribution of raw data. To achieve this, we propose a new scheme, named <i>data flush</i>, which ascertains the validity of the downstream analysis and maintains the predictive accuracy of a learning task. It perturbs data nonlinearly while accommodating the requirement of strict privacy protection, for instance, differential privacy. We highlight multiple facets of data flush through examples.","PeriodicalId":73195,"journal":{"name":"Harvard data science review","volume":"4 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9997048/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10297631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Family of Single-Case Experimental Designs.","authors":"Leonard H Epstein, Jesse Dallery","doi":"10.1162/99608f92.ff9300a8","DOIUrl":"10.1162/99608f92.ff9300a8","url":null,"abstract":"Single-case experimental designs (SCEDs) represent a family of research designs that use experimental methods to study the effects of treatments on outcomes. The fundamental unit of analysis is the single case, which can be an individual, clinic, or community, ideally with replications of effects within and/or between cases. These designs are flexible and cost-effective and can be used for treatment development, translational research, personalized interventions, and the study of rare diseases and disorders. This article provides a broad overview of the family of single-case experimental designs with corresponding examples, including reversal designs, multiple baseline designs, combined multiple baseline/reversal designs, and integration of single-case designs to identify optimal treatments for individuals into larger randomized controlled trials (RCTs). Personalized N-of-1 trials can be considered a subcategory of SCEDs that overlaps with reversal designs. Relevant issues for each type of design (including comparisons of treatments, design issues such as randomization and blinding, standards for designs, and statistical approaches to complement visual inspection of single-case experimental designs) are also discussed.","PeriodicalId":73195,"journal":{"name":"Harvard data science review","volume":"4 SI3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10016625/pdf/nihms-1842588.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10536718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personalized (N-of-1) Trials for Patient-Centered Treatments of Multimorbidity.","authors":"Jerry M Suls, Catherine Alfano, Christina Yap","doi":"10.1162/99608f92.d99e6ff5","DOIUrl":"10.1162/99608f92.d99e6ff5","url":null,"abstract":"Treatment of patients who suffer from concurrent health conditions is not well served by (1) evidence-based clinical guidelines that mainly specify treatment of single conditions and (2) conventional randomized controlled trials (RCTs) that identify treatments as safe and effective on <i>average</i>. Clinical decision-making based on the average patient effect may be inappropriate for treatment of those with multimorbidity who experience burdens and obstacles that may be unique to their personal situation. We describe how personalized (N-of-1) trials can be integrated with an automatic platform and virtual/remote technologies to improve patient-centered care for those living with multimorbidity. To illustrate, we present a hypothetical clinical scenario: survivors of both coronavirus disease 2019 (COVID-19) and cancer who chronically suffer from sleeplessness and fatigue. We then describe how the four standard phases of conventional RCT development can be modified for personalized trials and applied to the multimorbidity clinical scenario, outline how personalized trials can be adapted and extended to compare the benefits of personalized trials versus between-subject trial design, and explain how personalized trials can address special problems associated with multimorbidity for which conventional trials are poorly suited.","PeriodicalId":73195,"journal":{"name":"Harvard data science review","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10673634/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47706300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sharing Begins at Home: How Continuous and Ubiquitous FAIRness Can Enhance Research Productivity and Data Reuse.","authors":"William Dempsey, Ian Foster, Scott Fraser, Carl Kesselman","doi":"10.1162/99608f92.44d21b86","DOIUrl":"10.1162/99608f92.44d21b86","url":null,"abstract":"The broad sharing of research data is widely viewed as critical for the speed, quality, accessibility, and integrity of science. Despite increasing efforts to encourage data sharing, both the quality of shared data and the frequency of data reuse remain stubbornly low. We argue here that a significant reason for this unfortunate state of affairs is that the organization of research results in the findable, accessible, interoperable, and reusable (FAIR) form required for reuse is too often deferred to the end of a research project when preparing publications, by which time essential details are no longer accessible. Thus, we propose an approach to research informatics in which FAIR principles are applied <i>continuously</i>, from the inception of a research project, and <i>ubiquitously</i>, to every data asset produced by experiment or computation. We suggest that this seemingly challenging task can be made feasible by the adoption of simple tools, such as lightweight identifiers (to ensure that every data asset is findable), packaging methods (to facilitate understanding of data contents), data access methods, and metadata organization and structuring tools (to support schema development and evolution). We use an example from experimental neuroscience to illustrate how these methods can work in practice.","PeriodicalId":73195,"journal":{"name":"Harvard data science review","volume":"4 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9410569/pdf/nihms-1829357.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33444431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}