Evaluating NLP models with written and spoken L2 samples

Research Methods in Applied Linguistics. Pub Date: 2024-05-17. DOI: 10.1016/j.rmal.2024.100120

Kristopher Kyle, Masaki Eguchi

Citations: 0

Abstract

Natural language processing tools such as part-of-speech taggers and syntactic parsers are increasingly being used in studies of second language (L2) proficiency and development. However, relatively little work has focused on reporting the accuracy of these tools or optimizing their performance in L2 contexts. While some studies reference the published overall accuracy of a particular tool or include a small-scale accuracy analysis, very few (if any) provide a comprehensive account of the performance of taggers and parsers across a range of written and spoken registers. In this study, we provide a large-scale accuracy analysis of popular taggers and parsers across L1 and L2 written and spoken texts, using both default and L2-optimized models. Accuracy is examined both at the feature level (e.g., identifying adjective-noun relationships) and at the text level (e.g., mean mutual information scores). The results highlight the strengths and weaknesses of these tools.
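To make the idea of feature-level accuracy concrete, the sketch below scores a parser on a single dependency relation by comparing its output with gold annotations. It is an illustration only: the abstract does not describe the authors' actual procedure, and the `Dep` record, the `feature_prf` function, the `amod` label choice, and the toy data are all assumptions introduced here.

```python
from typing import Iterable, NamedTuple, Tuple


class Dep(NamedTuple):
    """One dependency arc (hypothetical record layout, not from the paper)."""
    sent_id: int     # sentence index within the text
    dependent: int   # token index of the dependent
    head: int        # token index of the head
    relation: str    # dependency label, e.g. "amod" for adjective-noun


def feature_prf(gold: Iterable[Dep], pred: Iterable[Dep],
                relation: str) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one relation type.

    An arc counts as correct only if sentence, dependent, head, and
    label all match the gold annotation.
    """
    gold_set = {d for d in gold if d.relation == relation}
    pred_set = {d for d in pred if d.relation == relation}
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (invented for illustration): two gold adjective-noun arcs,
# of which the parser recovers one and attaches the other to the wrong head.
gold_arcs = [Dep(0, 1, 2, "amod"), Dep(0, 4, 5, "amod"), Dep(0, 2, 3, "nsubj")]
pred_arcs = [Dep(0, 1, 2, "amod"), Dep(0, 4, 6, "amod"), Dep(0, 2, 3, "nsubj")]

precision, recall, f1 = feature_prf(gold_arcs, pred_arcs, "amod")
print(f"amod precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Scoring per relation rather than over all arcs is what distinguishes a feature-level analysis from a single overall accuracy figure: a parser can look strong in aggregate while performing poorly on the specific structure a study depends on.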

