An n-gram based approach to the automatic classification of schoolchildren’s writing
Jordi Cicres, Sheila Queralt
Vial-Vigo International Journal of Applied Linguistics, published 2019-05-03
DOI: 10.35869/VIAL.V0I16.93 (https://doi.org/10.35869/VIAL.V0I16.93)
Citations: 3
Abstract
This article analyzes schoolchildren’s writing across the whole primary school period using n-grams of morphological labels. We extracted the bigrams and trigrams from a set of literary texts written by Catalan schoolchildren in order to identify which of them help discriminate between texts from the three cycles into which the Spanish primary education system is divided: lower cycle (6- and 7-year-olds), middle cycle (8- and 9-year-olds) and upper cycle (10- and 11-year-olds). The rate of correct classifications is around 70% (77.5% with bigrams and 68.6% with trigrams), which makes this technique useful for automatic document classification by age.
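
To make the idea of n-grams over morphological labels concrete, the following is a minimal sketch of how bigrams and trigrams could be extracted from a tagged text and turned into relative frequencies. It is not the authors' implementation: the tag names (`DET`, `NOUN`, `VERB`), the sample sequence, and the function names `tag_ngrams` and `relative_frequencies` are all assumptions made for illustration, and the abstract does not state which tagger or classifier was used.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tag_ngrams(tags: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-gram in a sequence of morphological labels."""
    # If the sequence is shorter than n, the range is empty and so is the result.
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def relative_frequencies(tags: List[str], n: int) -> Dict[Tuple[str, ...], float]:
    """Map each n-gram to its share of all n-grams in the sequence."""
    grams = tag_ngrams(tags, n)
    if not grams:
        return {}
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


# Hypothetical tag sequence for a short sentence (illustrative labels only).
sample_tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]

bigrams = tag_ngrams(sample_tags, 2)
trigrams = tag_ngrams(sample_tags, 3)
bigram_freqs = relative_frequencies(sample_tags, 2)
```

Frequency vectors of this kind, computed per text, could then serve as input features for whatever classifier separates the three school cycles.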