A New Handwritten Essay Dataset for Automatic Essay Scoring with A New Benchmark

Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence
Pub Date: 2022-12-23
DOI: 10.1145/3579654.3579684

Shiyu Hu, Qichuan Yang, Yibing Yang



Abstract

Research on algorithms for Automatic Essay Scoring (AES) is currently driven by textual essay-scoring datasets constructed by anonymous school teachers. We propose VisEssay, the first essay-scoring dataset containing handwriting images. VisEssay consists of over 13,000 visual essays originating from 25+ professional in-service teachers, whose individual scoring accuracy is recorded from their scoring histories, together with a crowdsourced OCR result for each handwriting image. VisEssay differs from the many existing AES datasets in three ways: 1) the handwriting images are captured from non-native speakers and cover essay types complementary to those in existing datasets, 2) the teachers who score the essays have personal profiles and recorded scoring accuracy, and 3) the corresponding text is checked for consistency. An evaluation of modern algorithms for AES and text classification shows that VisEssay is a challenging dataset. To encourage a larger community to develop more generalized educational algorithms, we introduce three novel AES systems together with VisEssay and analyze the results as a new benchmark.
