Jordan Machita, Taylor Rohrich, Yusheng Jiang, Yiran Zheng
2021 Systems and Information Engineering Design Symposium (SIEDS), published 2021-04-30. DOI: 10.1109/SIEDS52267.2021.9483729 (https://doi.org/10.1109/SIEDS52267.2021.9483729)
Designing a Replicable Data Infrastructure for Education Research
Education research suffers from a lack of replication of existing studies. The Special Education Research Accelerator (SERA) is a proposed crowdsourcing platform, under development by a research team at the University of Virginia’s School of Education, that aims to address this problem by enabling large-scale replication of research studies in special education. In this paper, we present the design and implementation of a cloud-based data pipeline for a research study that could serve as a model for SERA. Our cloud design considerations include financial cost, technical feasibility, security, automation capabilities, reproducibility, and scalability [1], [17]. We designed an architectural framework that education research practitioners can use to host their studies in the cloud and benefit from automation, reproducibility, transparency, and accessibility. Our implementation automates data extraction and cleaning, populates the database, and performs analytics and tracking. The project also includes a web-facing API that lets researchers query the database without any SQL knowledge, and a web-facing dashboard that presents selected information and metrics to the applied research team. The pipeline is hosted on Amazon Web Services (AWS), which provides functionality for automation, storage, database hosting, and APIs. We present this architecture to demonstrate how data could flow through the SERA pipeline to achieve the goals of large-scale replication research.
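To make the idea of querying the database "with no SQL knowledge necessary" concrete, the following is a minimal sketch of how such an API layer might turn simple filter parameters into a parameterized query. It is not the authors' implementation: the abstract does not describe the schema, the endpoints, or the database engine, so the table name `observations`, the columns `site`, `grade`, and `score`, the helper names, and the use of an in-memory SQLite database as a stand-in for the AWS-hosted database are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical column allowlist: the abstract does not describe the real schema.
ALLOWED_COLUMNS = {"site", "grade", "score"}


def build_query(table, filters):
    """Translate a dict of column -> value filters into parameterized SQL.

    Column names are checked against an allowlist and values are passed as
    bound parameters, so callers never write (or inject) SQL themselves.
    """
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    unknown = set(filters) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown filter columns: {sorted(unknown)}")

    sql = f"SELECT * FROM {table}"
    params = []
    if filters:
        clauses = []
        for column in sorted(filters):  # sorted for a stable clause order
            clauses.append(f"{column} = ?")
            params.append(filters[column])
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY rowid"  # deterministic row order for the demo
    return sql, params


def run_query(conn, table, filters):
    """Execute the generated query and return all matching rows."""
    sql, params = build_query(table, filters)
    return conn.execute(sql, params).fetchall()


def demo_connection():
    """Create an in-memory SQLite database with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observations (site TEXT, grade INTEGER, score REAL)"
    )
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?, ?)",
        [("A", 3, 71.5), ("A", 4, 80.0), ("B", 3, 65.0)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    # A researcher supplies only field names and values, never SQL text.
    print(run_query(conn, "observations", {"site": "A"}))
```

In a deployed version, a web endpoint would presumably receive the filters as request parameters and pass them to a function like `run_query`. The allowlist and the bound parameters are the design choice that matters in this sketch: they keep user input out of the SQL text itself.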