A Journey from Wild to Textbook Data to Reproducibly Refresh the Wages Data from the National Longitudinal Survey of Youth Database

Impact factor: 1.6; JCR quartile: Q2; Category: Education, Scientific Disciplines

Journal of Statistics and Data Science Education; Published: 2022-05-13; DOI: 10.1080/26939169.2022.2094300

Dewi Amaliah, D. Cook, Emi Tanaka, Kate Hyde, Nicholas J. Tierney

Volume and issue: 30 1; Pages: 289 - 303; Publication type: Journal Article; Open access: no

Citations: 0

Abstract

Textbook data is essential for teaching statistics and data science methods because it is clean, allowing the instructor to focus on methodology. Ideally textbook datasets are refreshed regularly, especially when they are subsets taken from an ongoing data collection. It is also important to use contemporary data for teaching, to imbue the sense that the methodology is relevant today. This article describes the trials and tribulations of refreshing a textbook dataset on wages, extracted from the National Longitudinal Survey of Youth (NLSY79) in the early 1990s. The data is useful for teaching modeling and exploratory analysis of longitudinal data. Subsets of NLSY79, including the wages data, can be found in supplementary materials from numerous textbooks and research articles. The NLSY79 database has been continually updated through to 2018, so new records are available. Here we describe our journey to refresh the wages data, and document the process so that the data can be regularly updated into the future. Our journey was difficult because the steps and decisions taken to get from the raw data to the wages textbook subset have not been clearly articulated. We have been diligent to provide a reproducible workflow for others to follow, which also hopefully inspires more attempts at refreshing data for teaching. Three new datasets and the code to produce them are provided in the open source R package called yowie. Supplementary materials for this article are available online.

Source journal

Journal of Statistics and Data Science Education (Education, Scientific Disciplines)

CiteScore: 3.90

Self-citation rate: 35.30%

Articles published: (not listed)

Review time: 12 weeks