Hilário Oliveira, Rafael Ferreira Mello, Péricles Miranda, Hyan Batista, Moésio Wenceslau da Silva Filho, Thiago Cordeiro, Ig Ibert Bittencourt, Seiji Isotani
Data in Brief, volume 60, Article 111526. Published 2025-03-27. DOI: 10.1016/j.dib.2025.111526. Source: https://www.sciencedirect.com/science/article/pii/S2352340925002586
A benchmark dataset of narrative student essays with multi-competency grades for automatic essay scoring in Brazilian Portuguese
This paper describes the development of a new dataset comprising 1235 narrative essays written in Portuguese by 5th-grade students in Brazil. The corpus construction process involved three main steps: acquiring and transcribing photos of the essays; having experts annotate them against a real, pre-defined correction rubric covering four key writing competencies (formal language use, textual typology, thematic coherence, and textual cohesion); and resolving disagreements between the annotators. Two human experts manually evaluated each essay on a five-point scale (from Level I, complete lack of mastery, to Level V, excellent mastery) aligned with the correction rubric. When the two initial evaluators disagreed, a third expert helped resolve the divergence. To the best of our knowledge, this is the first publicly available dataset of elementary school essays in Brazilian Portuguese that features narrative writing samples with corresponding grades across multiple competencies commonly used in writing assessment. We believe this resource can contribute to developing automatic essay scoring systems tailored for evaluating narrative texts written in Brazilian Portuguese.
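The disagreement-resolution step can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the levels I to V are encoded as integers 1 to 5, the competency identifiers and the function name `resolve_grades` are hypothetical, and the third expert's grade is simply taken as final wherever the first two evaluators differ. The actual adjudication procedure may have worked differently.

```python
from typing import Dict, Optional

# Competencies named in the abstract; the identifiers are illustrative only.
COMPETENCIES = (
    "formal_language_use",
    "textual_typology",
    "thematic_coherence",
    "textual_cohesion",
)

# Assumed encoding: Level I..V stored as the integers 1..5.
VALID_LEVELS = range(1, 6)


def resolve_grades(
    first: Dict[str, int],
    second: Dict[str, int],
    third: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Return one final level per competency for a single essay.

    Assumption: when the two initial evaluators agree, their shared level
    is kept; when they differ, the third expert's level is used as final.
    """
    final: Dict[str, int] = {}
    for competency in COMPETENCIES:
        a, b = first[competency], second[competency]
        if a not in VALID_LEVELS or b not in VALID_LEVELS:
            raise ValueError(f"level out of range for {competency}")
        if a == b:
            final[competency] = a
            continue
        # Disagreement: a third grade is required for this competency.
        if third is None or competency not in third:
            raise ValueError(f"unresolved disagreement on {competency}")
        if third[competency] not in VALID_LEVELS:
            raise ValueError(f"level out of range for {competency}")
        final[competency] = third[competency]
    return final
```

For example, if both evaluators assign level 3 everywhere except textual cohesion, where they give 3 and 4, calling `resolve_grades(first, second, {"textual_cohesion": 4})` keeps level 3 for the agreed competencies and takes level 4 for textual cohesion.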
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.