A comparative analysis of AI-generated texts, corpus data, and speaker judgments: Subject honorification patterns in Korean

Impact Factor: 2.1

Applied Corpus Linguistics; Pub Date: 2026-04-01; Epub Date: 2025-11-19; DOI: 10.1016/j.acorp.2025.100171

Yejin Jung, Kathy MinHye Kim

Volume 6, Issue 1, Article 100171; Journal Article; https://www.sciencedirect.com/science/article/pii/S266679912500053X

Citations: 0

Abstract

Technological innovations can greatly enhance second language (L2) pragmatics instruction by providing learners with more natural and authentic communication opportunities. As Generative Artificial Intelligence (GenAI) tools become increasingly integrated into L2 teaching, questions arise as to whether they provide pedagogically appropriate input and how they can be used for inductive instruction (e.g., Data-driven Learning). To advance meaningful instructional approaches to Korean honorifics, understanding the nature of input is key; particularly, what exemplars of honorifics are available through GenAI and spoken corpora and how L2 learners perceive and evaluate different honorific forms. In response to these inquiries, we analyzed patterns of subject-verb honorific agreement in outputs from ChatGPT 4.0 and the NIKL Korean Dialogue Summarization Corpus (Study 1), and conducted an acceptability judgment test of four subject-verb honorific (mis)match forms (Study 2). We found that ChatGPT predominantly favored a subject-verb matched form, whereas corpus data reflected the highly complex, context-dependent use and variations of honorifics. L1 judgments aligned more closely with the corpus results, reflecting sensitivity to nuanced (mis)match forms, whereas L2 judgments closely mirrored ChatGPT’s patterns, lacking sensitivity beyond the matched forms. These results underscore the challenges associated with Korean honorification for both learners and educators, highlighting the need for more refined inductive teaching.
