Designing a Replicable Data Infrastructure for Education Research

2021 Systems and Information Engineering Design Symposium (SIEDS). Pub Date: 2021-04-30. DOI: 10.1109/SIEDS52267.2021.9483729

Jordan Machita, Taylor Rohrich, Yusheng Jiang, Yiran Zheng


Citations: 0

Abstract

Education research suffers from a lack of replication of existing studies. SERA (the Special Education Research Accelerator) is a proposed crowdsourcing platform, under development by a research team at the University of Virginia’s School of Education, that aims to address this problem by enabling large-scale replication of research studies in special education. In this paper, we present the design and implementation of a cloud-based data pipeline for a research study that could serve as a model for SERA. Cloud-based design considerations include financial cost, technical feasibility, security concerns, automation capabilities, reproducibility, and scalability [1] [17]. We have designed an architectural framework that practitioners in education research can use to host their studies in the cloud and take advantage of automation, reproducibility, transparency, and accessibility. Our implementation automates data extraction and cleaning, populates the database, and performs analytics and tracking. The project also includes a web-facing API that lets researchers query the database without any SQL knowledge, and a web-facing dashboard that presents select information and metrics to the applied research team. The data pipeline is hosted on Amazon Web Services (AWS), which provides functionality for automation, storage, database hosting, and APIs. We present this architecture to demonstrate how data could flow through the SERA pipeline to achieve the goals of large-scale replication research.
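The stages named in the abstract (extraction, cleaning, database population, and a query interface that needs no SQL) can be illustrated with a minimal local sketch. This is not the authors' implementation: the abstract does not describe the AWS services or schema used, so the sketch assumes an in-memory SQLite database from Python's standard library as a stand-in, and the `observations` table, its column names, the sample data, and all function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; columns and values are illustrative only.
RAW_CSV = """site_id,student_id,score
A, s1 ,12
A,s2,
B,s3,17
"""

# Columns a caller may filter on (assumed schema, not from the paper).
ALLOWED_COLUMNS = {"site_id", "student_id", "score"}


def extract(raw_text):
    """Parse a raw CSV export into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Trim whitespace, drop rows with a missing score, cast scores to int."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if not row["score"]:
            continue  # skip incomplete observations
        row["score"] = int(row["score"])
        cleaned.append(row)
    return cleaned


def load(conn, rows):
    """Create the table if needed and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations "
        "(site_id TEXT, student_id TEXT, score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO observations VALUES (:site_id, :student_id, :score)",
        rows,
    )
    conn.commit()


def query(conn, **filters):
    """Return rows matching column=value filters; the caller writes no SQL."""
    unknown = set(filters) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown filter columns: {sorted(unknown)}")
    sql = "SELECT site_id, student_id, score FROM observations"
    if filters:
        # Column names are checked against the allow-list above;
        # values are passed as bound parameters.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    sql += " ORDER BY student_id"
    return conn.execute(sql, list(filters.values())).fetchall()


def run_pipeline(raw_text):
    """Run extract -> clean -> load and return the populated connection."""
    conn = sqlite3.connect(":memory:")
    load(conn, clean(extract(raw_text)))
    return conn
```

In this sketch a researcher would call `query(conn, site_id="A")` rather than writing a `SELECT` statement. In a hosted deployment, a function like `query` would presumably sit behind a web endpoint, but that wiring is outside what the abstract specifies.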
