Is there a right way round?

IF 1.2 Q2 EDUCATION & EDUCATIONAL RESEARCH

Teaching Statistics Pub Date : 2021-08-11 DOI:10.1111/test.12288

H. MacGillivray


Citations: 0

Abstract



At the recent OZCOTS (Australian Conference on Teaching Statistics), https://anzsc2021.com.au/ozcots-conference/, Rob Gould's keynote, titled “Data Education in pre-College: promises and challenges”, attracted a question from Matthew Parry, University of Otago, as to whether the scenario of “.. here's a bunch of data, come up with questions..” is a type of reversal of much previous advocacy to source or collect data to investigate identified issues. Rob's reply, and the discussion in his 2021 paper “Towards data-scientific thinking” [1], include comments that, whatever codification is used for the statistical investigation cycle (now often called the data cycle or the learning-from-data cycle), “...it is expected that investigators will ‘skip around’ to some extent” and that the order is not strict. This can be seen by examining a variety of statistical and data investigations in real and complex contexts, whether in research or applications. In References [1,3], both Rob Gould and Andee Rubin emphasize that “consider data” includes all aspects of the assembly of data, whether the data are assembled through sourcing, searching, collating or collecting, or are already available. They, and other authors, comment that the deluge of data means that students, and indeed investigators, increasingly consider or access data already collected. Technological advances also enable greater and more ready access to collected data, and the wrangling necessary to handle such data. These in turn open up many possibilities for students to explore civic issues, including the critiquing of data, with the associated vital learning about data quality and the inherent dangers in uncritical algorithmic approaches. Rob also commented that students seem to find it difficult to identify what statistical questions can be posed for an existing dataset.
It is interesting to consider that today's data deluges require a return to more emphasis on the questions of “what, when, how, why, who?” In previous eras, when instructors had no choice but to provide data and their context to students, these questions were of paramount importance in authentic statistical learning. For those in workplaces, not being able to find answers to such data-querying questions prevented the critiquing of reports, the building on previous data investigations and the redoing of analyses. As access to technology increased, enabling students to explore and analyse data beyond the restrictions of simple pocket calculators, students were able to design, collect, observe or source their own data to investigate issues of interest to them involving a number of variables. This could also introduce another question of great practical importance in many disciplines and workplaces, namely, can we measure what we want to measure? Including the “what, when, why, how” information in their reporting of data investigations was, and is, excellent grounding for their future work, whether in industry, business or research. Hence we see that greater technological capabilities open greater possibilities in authentic student learning of data investigations, whether in accessing and using data collected by others or in collecting data themselves. The order of identifying issues and sourcing data may be reversed or, as is often the case, reiterated, but the core questions and reporting of “what, when, how, why, who?” and “can we measure what we want?” are as important as ever, along with critiquing and understanding issues of data quality. These apply to both statistics and data science, illustrating again their common crux in data investigations. However, there is one order in teaching statistics and data science which is not appropriate, namely: here is a tool, now find or obtain some data to use it on.
This leads to forcing data into tools, neglecting assumptions and their subsequent evaluation, and the over-emphasis on a single question and a single answer which so dominates and inhibits early statistics teaching and contradicts statistical thinking. Many years ago, as an undergraduate, I was both bemused and amused by how my medical student friends could force every dataset and every question into a chi-square test, because their program had included at that stage only a few weeks' introduction to statistics in which they had seen only this tool and how to use it, but not where, when, on what or why. It is also the approach which bedevils users in other disciplines, with misuse of multiple simplistic procedures, especially the ubiquitous t, or treatment of numerical codes as having numerical meaning, or misuse of assumptions or diagnostics. This type of approach arises from the mindset of theory-then-example, which can appear in any science, including mathematics and computer science. This is not to negate the importance of theory, which underpins, unifies, supports and validates methods, procedures and tools, and indeed provides the assumptions which are essential to emphasize in teaching statistics and data science. But I can hear the question: surely one has to introduce simple tools first and illustrate with simple ...


Source journal

Teaching Statistics (EDUCATION & EDUCATIONAL RESEARCH)

CiteScore

2.10

Self-citation rate

25.00%

Articles published