Towards explainable prediction of essay cohesion in Portuguese and English
Hilário Oliveira, Rafael Ferreira Mello, Bruno Alexandre Barreiros Rosa, Mladen Raković, Pericles Miranda, T. Cordeiro, Seiji Isotani, I. Bittencourt, D. Gašević
LAK23: 13th International Learning Analytics and Knowledge Conference, published 2023-03-13
DOI: 10.1145/3576050.3576152
Abstract
Textual cohesion is an essential aspect of formal writing; it concerns the linguistic mechanisms that connect elements such as words, sentences, and paragraphs. Several studies have proposed approaches for estimating the cohesion of essays automatically, but little research has examined how well machine learning approaches can predict the cohesion of essays written in languages beyond English. This paper reports on a study that proposed and evaluated approaches for automatically estimating the cohesion of essays in Portuguese and English. The study developed regression models based on both conventional feature-based machine learning methods and deep learning-based pre-trained language models, and it examined the explainability of these automated approaches in order to scrutinize their predictions. We analyzed two datasets comprising 4,570 Portuguese essays and 7,101 English essays. The results show that a deep learning-based model achieved the best performance on both datasets, reaching a moderate Pearson correlation with human-rated cohesion scores. However, the cohesion estimates produced by the conventional machine learning models showed stronger potential for explanation than those of the deep learning model.
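The evaluation idea summarized above (a regression model predicts a cohesion score, and its predictions are compared with human ratings via Pearson correlation) can be illustrated with a minimal sketch. This is not the authors' pipeline: the single feature `adjacent_overlap` (Jaccard word overlap between consecutive sentences), the one-variable least-squares model, the helper `explain`, and the toy essays and scores are all hypothetical stand-ins chosen for illustration. A linear model is used because each prediction splits exactly into a baseline plus a feature contribution, which is the kind of transparency that feature-based models can offer.

```python
import math
import re
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def adjacent_overlap(essay):
    """Hypothetical cohesion feature: mean Jaccard word overlap of consecutive sentences."""
    sentences = [s for s in re.split(r"[.!?]+", essay.lower()) if s.strip()]
    word_sets = [set(re.findall(r"\w+", s)) for s in sentences]
    # Skip pairs where both sentences contain no word characters (empty union).
    pairs = [(a, b) for a, b in zip(word_sets, word_sets[1:]) if a | b]
    if not pairs:
        return 0.0  # fewer than two sentences: no adjacent pairs to compare
    return mean(len(a & b) / len(a | b) for a, b in pairs)


def fit_simple_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def explain(x, slope, intercept, x_mean):
    """Split a linear prediction into a baseline (prediction at the mean
    feature value) and the feature's contribution relative to that baseline."""
    baseline = intercept + slope * x_mean
    contribution = slope * (x - x_mean)
    return baseline, contribution


if __name__ == "__main__":
    # Toy essays and made-up human cohesion scores, for illustration only.
    essays = [
        "The city grew. The city needed roads. Roads connected the city.",
        "Rain fell. Markets opened. A cat slept.",
        "Schools matter. Schools teach reading. Reading helps students.",
        "Music is loud. Bread is cheap. Trains are late.",
    ]
    human_scores = [4.0, 1.5, 3.5, 1.0]

    features = [adjacent_overlap(e) for e in essays]
    slope, intercept = fit_simple_regression(features, human_scores)
    predictions = [slope * x + intercept for x in features]
    print(f"Pearson r (predicted vs. human): {pearson(predictions, human_scores):.3f}")

    # Per-essay explanation: prediction = baseline + feature contribution.
    x_mean = mean(features)
    for x, pred in zip(features, predictions):
        base, contrib = explain(x, slope, intercept, x_mean)
        print(f"overlap={x:.2f} prediction={pred:.2f} = baseline {base:.2f} + feature {contrib:+.2f}")
```

The per-essay decomposition printed at the end is exact only because the model is linear; for a fine-tuned language model, a comparable breakdown would need a separate post-hoc attribution method, which is one reason explanations are harder to obtain there.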