The how of data
H. MacGillivray
DOI: 10.1111/test.12329
Published: 2022-12-27
Abstract
For many decades, professional statisticians and statistics educators have emphasized the central importance of identifying, taking account of, and reporting the 5 W's of data: What, Why, When, Where, and by Whom. If data are to be collected or accessed, we can add How: how can we obtain the data we need or want? Used broadly, the word “How” can also encompass much of the 5 W's, as the What and Why are needed to understand How the necessary or desired data can be, or were, obtained. That these are all integral to statistics and statistical investigations has also been emphasized, but it can never be highlighted enough that they should be at the heart of teaching statistics, no matter to whom or at what level. It can be a delight for teachers to discover this. I will always remember the excitement of senior school teachers learning this 30 years ago in hands-on professional development workshops: “You mean this is all part of statistics, not just preliminaries to statistics? Wow!” Unfortunately, lessons learned on the disciplinary and/or teaching frontlines do not necessarily penetrate the citadel of educational authority. The Who, the What, the How, and the How much of teaching statistics in education faculties, whether for future teachers or for future research (where the tyranny of multiple t-tests appears to continue unchecked), are a question for a different discussion. As the eras of big data and data science gradually grew and then exploded, the 5 W's and the How of data in teaching have “of course” become even more important and have received renewed attention. Many authors have commented on this, including in the 2021 special issue of Teaching Statistics. But as Shatz [6] reminds us in this issue, we should avoid saying “of course” and be ever mindful of the perpetual need both to explain and to illuminate what statistics is. This includes the fact that the 5 W's and the How of data play central, critical roles in real data science.
In this issue, Lasater et al. [2] highlight that “two critical learning elements now are working with complex publically-available datasets and choice and use of appropriate visualization in investigating multivariable data.” As they note, “These are the focus of the lab activity described here, set in an important social context.” Expansion to complex, large, publicly available datasets and technologically intensive procedures does not mean relegating other types of datasets or data collection. It just means the big tent of statistics and statistics teaching got even bigger. Collecting data, observing data, experimental design, and surveys still have major roles to play across all of statistics, its applications, and its teaching. But no matter the type or size of dataset, and no matter the teaching context, analysis and interpretation may be compromised without knowing, taking account of, and reporting on the 5 W's and the How of the data. Three articles in this issue provide excellent illustrations of this in different teaching and/or statistical contexts. All three focus on aspects of measurement and design. All three also demonstrate the critical importance of full knowledge of the source, nature, and context of data. Whichever directions statistics, data science, and their teaching take, instructors will continue to seek interesting real datasets and rich contexts, as they always have. They use these to introduce, lead into, or illustrate statistical concepts, models, visualizations, technologies, methods, and analyses. Because of the nature of statistics, a variety of datasets for students' experiential learning is invaluable. Combining subsets of larger and/or multivariable datasets with smaller, more specific datasets provides good pedagogical balance. For this reason, instructors always appreciate resources offering real datasets in real contexts with a specified number of variables of a specified type.
In “Bare bones, or a rich feast?” [1], Sue Finch and Ian Gordon examine the source information provided for datasets in the R “datasets” package. They find that for “69% there were obvious questions about units, factor levels, and/or design or measurement”. They then carry out an extensive investigation of four of these datasets that are potentially useful for teaching linear models with one or two categorical explanatory variables. They find that impoverished data landscapes, sometimes even with potentially misleading or wrong contexts, can lead to “Sanitized versions of the reality behind the data fail(ing) to reflect the complexity and messiness that arise in practice” and “missed opportunities in teaching and learning”. Such landscapes can also create credibility issues in analyses and interpretations. The authors conclude with guidelines on the curation and documentation of datasets as teaching resources, with particular emphasis on measurement and design.
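One way to make the 5 W's and the How concrete when preparing a teaching dataset is to record them alongside the data and check for gaps. The sketch below is a minimal illustration in Python and makes several assumptions. It is not the set of guidelines proposed in [1]. The record layout, the class `DatasetRecord`, the function `documentation_gaps`, and the example dataset are all hypothetical. It checks only the kinds of gaps the editorial mentions: missing context, missing units, and undescribed factor levels.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The 5 W's of data plus How, stored as free-text fields (hypothetical layout).
CONTEXT_FIELDS = ("what", "why", "when", "where", "who", "how")


@dataclass
class DatasetRecord:
    """Hypothetical documentation record for a teaching dataset."""
    name: str
    what: str = ""   # what was measured or recorded
    why: str = ""    # purpose for which the data were collected
    when: str = ""   # when the data were collected
    where: str = ""  # where the data were collected
    who: str = ""    # by whom the data were collected
    how: str = ""    # design and measurement (e.g. experiment, survey, instrument)
    units: dict[str, str] = field(default_factory=dict)  # variable -> unit
    factor_levels: dict[str, list[str]] = field(default_factory=dict)  # variable -> level labels


def documentation_gaps(record: DatasetRecord,
                       numeric_vars: list[str],
                       categorical_vars: list[str]) -> list[str]:
    """Return readable gaps: blank context fields, numeric variables
    without units, and categorical variables without described levels."""
    gaps = [f"missing '{name}'" for name in CONTEXT_FIELDS
            if not getattr(record, name).strip()]
    gaps += [f"no unit for '{v}'" for v in numeric_vars if v not in record.units]
    gaps += [f"no levels described for '{v}'" for v in categorical_vars
             if v not in record.factor_levels]
    return gaps


# Usage with an invented example: only What, How, and one unit are documented.
record = DatasetRecord(
    name="example_trial",
    what="crop yield per plot under three treatments",
    how="randomized block experiment",
    units={"yield": "kg per plot"},
)
for gap in documentation_gaps(record, ["yield"], ["treatment", "block"]):
    print(gap)
```

For this invented record, the check reports four blank context fields (why, when, where, who) and two categorical variables whose levels are not described. A check like this can only flag documentation that is absent. It cannot judge whether the context supplied is adequate or correct, which is the deeper concern raised in [1].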