How to automate the extraction and analysis of information for educational purposes

IF 5.1 · Zone 1 (Literature) · Q1 Communication

Comunicar · Pub Date: 2023-01-01 · DOI: 10.3916/c74-2023-02

Miriam Calvera-Isabal, Patrícia Santos, H. Hoppe, Cleo Schulten


Citations: 1

Abstract

Interest in and practice of Citizen Science (CS) are growing, along with the use of websites for communication and for capturing and processing data and materials. From an educational perspective, integrating information about CS into formal educational settings is expected to inspire teachers to create learning activities. This makes CS an interesting case for using bots to automate data extraction from online CS platforms, in order to better understand how CS is used in educational contexts. Although this information is publicly available, its collection has to comply with GDPR rules. This paper aims to explain (1) how CS is communicated and promoted on websites, (2) how web scraping methods and anonymization techniques have been designed, developed, and applied to collect information from online sources, and (3) how these data could be used for educational purposes. An analysis of 72 websites shows, among other results, that only 24.8% include detailed information about the CS project and 48.61% include information about educational purposes or materials.
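The abstract names web scraping and anonymization without giving details, so the following is only a minimal sketch of the general idea, not the authors' pipeline. It assumes the page HTML has already been downloaded, uses only the Python standard library, and treats e-mail addresses as the sole personal data to mask. The sample page, the keyword list, the salt, and all function names are hypothetical. Note that salted hashing is closer to pseudonymization than to full anonymization, so a real GDPR-compliant workflow would need more than this.

```python
import hashlib
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script and style."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


# Simplified e-mail pattern; an assumption for illustration, not a full RFC 5322 matcher.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize_emails(text, salt="demo-salt"):
    """Replace each e-mail address with a salted, truncated SHA-256 token."""

    def replace(match):
        raw = (salt + match.group(0).lower()).encode("utf-8")
        return f"<email:{hashlib.sha256(raw).hexdigest()[:8]}>"

    return EMAIL_PATTERN.sub(replace, text)


def extract_and_anonymize(html, keywords=("school", "teacher", "education")):
    """Extract visible text, mask e-mails, and flag education-related keywords."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = pseudonymize_emails(" ".join(parser.chunks))
    lowered = text.lower()
    return {
        "text": text,
        "mentions_education": any(word in lowered for word in keywords),
    }


# Hypothetical CS project page used only to exercise the sketch.
sample_html = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>River Watch</h1>"
    "<p>Materials for teachers. Contact: ana@example.org</p>"
    "<script>var x = 1;</script></body></html>"
)
result = extract_and_anonymize(sample_html)
print(result["text"])
print(result["mentions_education"])
```

Hashing the lower-cased address with a fixed salt keeps the token stable across pages, so repeated contacts can still be counted without storing the address itself; a keyword flag like `mentions_education` is one simple way a per-site indicator such as "includes educational information" could be derived.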


Source journal

Comunicar Multiple-

CiteScore: 10.10

Self-citation rate: 5.40%

Review time: 20 weeks

Journal description: Comunicar specializes in educommunication: communication and education, ICT, audiences, new languages, and related topics, with monographs on current issues. It is published in two formats, print and online; the digital edition is accessible in full text, free of charge, to the entire scientific community and to researchers around the world. Co-editions are printed in Spanish and English. It is published by Oxbridge Publishing House, which collaborates with many international centres and universities.