Julian F Lohmann, Fynn Junge, Jens Möller, Johanna Fleckenstein, Ruth Trüb, Stefan Keller, Thorben Jansen, Andrea Horbach
{"title":"神经网络还是语言特征?-比较不同机器学习方法对L1和l2学习者议论文文本质量特征的自动评估。","authors":"Julian F Lohmann, Fynn Junge, Jens Möller, Johanna Fleckenstein, Ruth Trüb, Stefan Keller, Thorben Jansen, Andrea Horbach","doi":"10.1007/s40593-024-00426-w","DOIUrl":null,"url":null,"abstract":"<p><p>Recent investigations in automated essay scoring research imply that hybrid models, which combine feature engineering and the powerful tools of deep neural networks (DNNs), reach state-of-the-art performance. However, most of these findings are from holistic scoring tasks. In the present study, we use a total of four prompts from two different corpora consisting of both L1 and L2 learner essays annotated with trait scores (e.g., content, organization, and language quality). In our main experiments, we compare three variants of trait-specific models using different inputs: (1) models based on 220 linguistic features, (2) models using essay-level contextual embeddings from the distilled version of the pre-trained transformer BERT (DistilBERT), and (3) a hybrid model using both types of features. Results imply that when trait-specific models are trained based on a single resource, the feature-based models slightly outperform the embedding-based models. These differences are most prominent for the organization traits. The hybrid models outperform the single-resource models, indicating that linguistic features and embeddings indeed capture partially different aspects relevant for the assessment of essay traits. To gain more insights into the interplay between both feature types, we run addition and ablation tests for individual feature groups. Trait-specific addition tests across prompts indicate that the embedding-based models can most consistently be enhanced in content assessment when combined with morphological complexity features. 
Most consistent performance gains in the organization traits are achieved when embeddings are combined with length features, and most consistent performance gains in the assessment of the language traits when combined with lexical complexity, error, and occurrence features. Cross-prompt scoring again reveals slight advantages for the feature-based models.</p>","PeriodicalId":46637,"journal":{"name":"International Journal of Artificial Intelligence in Education","volume":"35 3","pages":"1178-1217"},"PeriodicalIF":8.5000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12450813/pdf/","citationCount":"0","resultStr":"{\"title\":\"Neural Networks or Linguistic Features? - Comparing Different Machine-Learning Approaches for Automated Assessment of Text Quality Traits Among L1- and L2-Learners' Argumentative Essays.\",\"authors\":\"Julian F Lohmann, Fynn Junge, Jens Möller, Johanna Fleckenstein, Ruth Trüb, Stefan Keller, Thorben Jansen, Andrea Horbach\",\"doi\":\"10.1007/s40593-024-00426-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Recent investigations in automated essay scoring research imply that hybrid models, which combine feature engineering and the powerful tools of deep neural networks (DNNs), reach state-of-the-art performance. However, most of these findings are from holistic scoring tasks. In the present study, we use a total of four prompts from two different corpora consisting of both L1 and L2 learner essays annotated with trait scores (e.g., content, organization, and language quality). In our main experiments, we compare three variants of trait-specific models using different inputs: (1) models based on 220 linguistic features, (2) models using essay-level contextual embeddings from the distilled version of the pre-trained transformer BERT (DistilBERT), and (3) a hybrid model using both types of features. 
Results imply that when trait-specific models are trained based on a single resource, the feature-based models slightly outperform the embedding-based models. These differences are most prominent for the organization traits. The hybrid models outperform the single-resource models, indicating that linguistic features and embeddings indeed capture partially different aspects relevant for the assessment of essay traits. To gain more insights into the interplay between both feature types, we run addition and ablation tests for individual feature groups. Trait-specific addition tests across prompts indicate that the embedding-based models can most consistently be enhanced in content assessment when combined with morphological complexity features. Most consistent performance gains in the organization traits are achieved when embeddings are combined with length features, and most consistent performance gains in the assessment of the language traits when combined with lexical complexity, error, and occurrence features. 
Cross-prompt scoring again reveals slight advantages for the feature-based models.</p>\",\"PeriodicalId\":46637,\"journal\":{\"name\":\"International Journal of Artificial Intelligence in Education\",\"volume\":\"35 3\",\"pages\":\"1178-1217\"},\"PeriodicalIF\":8.5000,\"publicationDate\":\"2025-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12450813/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Artificial Intelligence in Education\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s40593-024-00426-w\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/9/13 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Artificial Intelligence in Education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s40593-024-00426-w","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/9/13 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Neural Networks or Linguistic Features? - Comparing Different Machine-Learning Approaches for Automated Assessment of Text Quality Traits Among L1- and L2-Learners' Argumentative Essays.
Recent investigations in automated essay scoring research suggest that hybrid models, which combine feature engineering with the powerful tools of deep neural networks (DNNs), reach state-of-the-art performance. However, most of these findings come from holistic scoring tasks. In the present study, we use a total of four prompts from two different corpora consisting of both L1 and L2 learner essays annotated with trait scores (e.g., content, organization, and language quality). In our main experiments, we compare three variants of trait-specific models using different inputs: (1) models based on 220 linguistic features, (2) models using essay-level contextual embeddings from the distilled version of the pre-trained transformer BERT (DistilBERT), and (3) a hybrid model using both types of features. Results indicate that when trait-specific models are trained on a single resource, the feature-based models slightly outperform the embedding-based models; these differences are most prominent for the organization traits. The hybrid models outperform the single-resource models, indicating that linguistic features and embeddings capture partially different aspects relevant to the assessment of essay traits. To gain more insight into the interplay between the two feature types, we run addition and ablation tests for individual feature groups. Trait-specific addition tests across prompts indicate that the embedding-based models are most consistently enhanced in content assessment when combined with morphological complexity features. The most consistent gains for the organization traits are achieved when embeddings are combined with length features, and for the language traits when embeddings are combined with lexical complexity, error, and occurrence features. Cross-prompt scoring again reveals slight advantages for the feature-based models.
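The hybrid setup described in the abstract — concatenating essay-level transformer embeddings with engineered linguistic features before fitting a trait-specific regressor — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the random arrays stand in for DistilBERT essay vectors and the paper's 220 linguistic features, the synthetic trait scores and Ridge regressor are assumptions, and quadratic weighted kappa is used only because it is the customary agreement metric in essay-scoring work.

```python
# Hypothetical sketch of a hybrid trait-scoring model: concatenate
# essay-level embeddings with engineered linguistic features and fit
# a regressor. All data here is synthetic placeholder data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
n_essays = 200
embeddings = rng.normal(size=(n_essays, 768))  # stand-in for DistilBERT essay vectors
ling_feats = rng.normal(size=(n_essays, 220))  # stand-in for 220 linguistic features

# Synthetic trait scores on a 1-6 scale, loosely driven by both inputs
signal = embeddings[:, 0] + ling_feats[:, 0]
scores = np.clip(np.round(3 + signal), 1, 6)

# The "hybrid" input: both feature types side by side
X_hybrid = np.hstack([embeddings, ling_feats])
train, test = slice(0, 150), slice(150, 200)

model = Ridge(alpha=1.0).fit(X_hybrid[train], scores[train])
pred = np.clip(np.round(model.predict(X_hybrid[test])), 1, 6)

# Quadratic weighted kappa, the usual agreement metric in essay scoring
qwk = cohen_kappa_score(scores[test].astype(int), pred.astype(int),
                        weights="quadratic")
print(f"QWK on held-out essays: {qwk:.2f}")
```

The single-resource baselines in the study correspond to fitting the same regressor on `embeddings` or `ling_feats` alone; the paper's addition/ablation tests amount to adding or removing individual feature groups from the concatenated input.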
Journal overview:
IJAIED publishes papers concerned with the application of AI to education. It aims to help the development of principles for the design of computer-based learning systems. Its premise is that such principles involve the modelling and representation of relevant aspects of knowledge, before implementation or during execution, and hence require the application of AI techniques and concepts. IJAIED has a very broad notion of the scope of AI and of a ''computer-based learning system'', as indicated by the following list of topics considered to be within the scope of IJAIED:
- adaptive and intelligent multimedia and hypermedia systems
- agent-based learning environments
- AIED and teacher education
- architectures for AIED systems
- assessment and testing of learning outcomes
- authoring systems and shells for AIED systems
- Bayesian and statistical methods
- case-based systems
- cognitive development
- cognitive models of problem-solving
- cognitive tools for learning
- computer-assisted language learning
- computer-supported collaborative learning
- dialogue (argumentation, explanation, negotiation, etc.)
- discovery environments and microworlds
- distributed learning environments
- educational robotics
- embedded training systems
- empirical studies to inform the design of learning environments
- environments to support the learning of programming
- evaluation of AIED systems
- formal models of components of AIED systems
- help and advice systems
- human factors and interface design
- instructional design principles
- instructional planning
- intelligent agents on the internet
- intelligent courseware for computer-based training
- intelligent tutoring systems
- knowledge and skill acquisition
- knowledge representation for instruction
- modelling metacognitive skills
- modelling pedagogical interactions
- motivation
- natural language interfaces for instructional systems
- networked learning and teaching systems
- neural models applied to AIED systems
- performance support systems
- practical, real-world applications of AIED systems
- qualitative reasoning in simulations
- situated learning and cognitive apprenticeship
- social and cultural aspects of learning
- student modelling and cognitive diagnosis
- support for knowledge building communities
- support for networked communication
- theories of learning and conceptual change
- tools for administration and curriculum integration
- tools for the guided exploration of information resources