The how of data

H. MacGillivray
Published: 2022-12-27

Abstract

For many decades, professional statisticians and statistics educators have emphasized the central importance of identifying, taking account of, and reporting the 5 W's of data: What, Why, When, Where, and by Whom. If data are to be collected or accessed, we can add How: how can we obtain the data we need or want? The word “How”, used broadly, can also encompass much of the 5 W's, as the What and Why are needed to understand How the necessary or desired data can be obtained, or were obtained. That these are all integral to statistics and statistics investigations has also been emphasized, but it can never be highlighted enough that they should be at the heart of teaching statistics, no matter to whom or at what level. It can be a delight for teachers to discover this; I will always remember the excitement of senior school teachers learning this 30 years ago in hands-on professional development workshops: “You mean this is all part of statistics, not just preliminaries to statistics? Wow!” Unfortunately, learning from discipline and/or teaching frontlines does not necessarily penetrate the citadel of educational authority. The question of the Who, the What, the How, and the How much of teaching statistics in education faculties, whether for future teachers or future research (where the multiple t-test tyranny appears to continue unchecked), is open for a different discussion. As the eras of big data and data science gradually grew and then exploded, the 5 W's and the How of data in teaching have “of course” become even more important and have received renewed attention, as many authors have commented, including in the 2021 special issue of Teaching Statistics. But as Shatz [6] reminds us in this issue, we should avoid saying “of course” and be ever mindful of the perpetual need to both explain and illuminate what statistics is, including that the central roles of the 5 W's and the How of data are of critical importance in real data science.
In this issue, Lasater et al. [2] highlight that “two critical learning elements now are working with complex publically-available datasets and choice and use of appropriate visualization in investigating multivariable data.” In [2], “These are the focus of the lab activity described here, set in an important social context.” Expansion to complex, large, publicly available datasets and technologically intensive procedures does not mean relegation of other types of datasets or data collections. It just means the big tent of statistics and statistics teaching got even bigger. Collecting data, observing data, experimental design, and surveys still have major roles to play across all of statistics and its applications, and in teaching. But no matter what the type or size of dataset, and no matter what the teaching context, without knowing, taking account of, and reporting on the 5 W's and the How of the data, analysis and interpretation may be compromised. Three articles in this issue provide excellent illustrations of this in different teaching and/or statistical contexts. All three focus on aspects of measurement and design, and all three demonstrate the critical importance of full knowledge of the source, nature, and context of data. Whichever directions statistics, data science, and their teaching take, instructors will continue to seek, as they always have, interesting real datasets and rich contexts to introduce, lead into, or illustrate statistical concepts, models, visualizations, technologies, methods, or analyses. Because of the nature of statistics, a variety of datasets for student experiential learning is invaluable. Since combining the use of subsets of larger and/or multivariable datasets with the use of smaller, more specific datasets provides good pedagogical balance, instructors are always appreciative of resources of real datasets in real contexts with a specified number of variables of a specified type.
In “Bare bones, or a rich feast?” [1], Sue Finch and Ian Gordon discuss the source information provided for datasets in the R “datasets” package, finding that for “69% there were obvious questions about units, factor levels, and/or design or measurement”, and carry out an extensive investigation into four datasets that are potentially useful for teaching linear models with one or two categorical explanatory variables. Their findings are that impoverished data landscapes, sometimes even with potentially misleading or wrong contexts, can lead to “Sanitized versions of the reality behind the data fail(ing) to reflect the complexity and messiness that arise in practice” and “missed opportunities in teaching and learning”, as well as credibility issues in analyses and interpretations. The authors conclude their investigations with some guidelines on the curation and documentation of datasets for teaching resources, with particular emphasis on measurement and design.

DOI: 10.1111/test.12329