Automatic language ability assessment method based on natural language processing

Nonso Nnamoko, Themis Karaminis, Jack Procter, Joseph Barrowclough, Ioannis Korkontzelos
Natural Language Processing Journal, Volume 8, Article 100094
DOI: 10.1016/j.nlp.2024.100094
Published: 2024-08-06
Article URL: https://www.sciencedirect.com/science/article/pii/S2949719124000426

Abstract

Background and Objectives:

The Wechsler Abbreviated Scales of Intelligence, second edition (WASI-II) is a standardised assessment tool that is widely used to assess cognitive ability in clinical, research, and educational settings. In one component of this assessment, referred to as the Vocabulary task, the assessed individuals are presented with words (called stimulus items) and asked to explain what each word means. Their responses are hand-scored against a list of pre-rated sample responses [0-Point (poor), 1-Point (moderate), or 2-Point (excellent)] provided in the accompanying WASI-II manual. This scoring method is time-consuming, and the scoring of responses that do not fully match the pre-rated ones may vary between individual scorers. In this study, we aim to use natural language processing techniques to automate the scoring procedure and make it more time-efficient and objective.

Methods:

Utilising five different word embeddings (Word2vec, Global Vectors, Bidirectional Encoder Representations from Transformers, Generative Pre-trained Transformer 2, and Embeddings from Language Model), we transformed stimulus items and pre-rated responses from the WASI-II Vocabulary task into machine-readable vectors. We measured distance with cosine similarity, evaluating each model against a rational-expectations hypothesis that vector representations of stimulus items should align closely with 2-Point responses and diverge from 0-Point responses. Model performance was assessed by the frequency of consistent representation and by the Pearson correlation coefficient, which measured overall consistency with the manual's ranking across all items and sample responses.
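As an illustrative sketch of the consistency check described above: a stimulus item is treated as "consistently represented" when its cosine similarity to the sample responses decreases with the manual's rating (2-Point closest, 0-Point furthest). The toy 4-dimensional vectors below stand in for real embeddings; the `cosine` helper and all values are hypothetical, not taken from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy vectors standing in for real Word2vec/GloVe/BERT embeddings.
stimulus = [0.9, 0.1, 0.3, 0.5]
responses = {
    2: [0.8, 0.2, 0.4, 0.5],   # sample "excellent" (2-Point) response
    1: [0.5, 0.5, 0.1, 0.2],   # sample "moderate"  (1-Point) response
    0: [0.1, 0.9, 0.0, 0.1],   # sample "poor"      (0-Point) response
}

sims = {score: cosine(stimulus, vec) for score, vec in responses.items()}

# Consistent representation: similarity ranks match the manual's ranking.
consistent = sims[2] > sims[1] > sims[0]
print(sims, consistent)
```

Repeating this check over all 27 stimulus items would yield the "frequency of consistent representation" reported in the Results.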

Results:

The Word2vec model showed the highest consistency with the WASI-II manual (frequency = 20 out of 27; Pearson correlation coefficient = 0.61), while Bidirectional Encoder Representations from Transformers was the worst-performing model (frequency = 5; Pearson correlation coefficient = 0.05). The consistency of these two models with the WASI-II manual differed significantly, Z = 2.282, p = 0.022.
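The Pearson correlation coefficients above quantify how well a model's similarity scores track the manual's 0/1/2-Point ratings. A minimal sketch of that computation, using fabricated illustrative values (the `manual_scores` and `model_sims` lists are hypothetical, not the study's data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical manual ratings and model cosine similarities
# for six sample responses; higher ratings should get higher similarity.
manual_scores = [2, 2, 1, 1, 0, 0]
model_sims = [0.91, 0.78, 0.66, 0.70, 0.35, 0.41]

r = pearson(manual_scores, model_sims)
print(round(r, 3))
```

A coefficient near 1 would indicate that the model's similarity ordering closely mirrors the manual's ranking, as Word2vec (r = 0.61) approximated far better than BERT (r = 0.05) in the study.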

Conclusions:

Our results showed that the scoring of the WASI-II Vocabulary task can be automated with moderate accuracy using off-the-shelf embedding models. These results are promising and could be improved further by exploring alternatives to the vector dimensions, similarity metrics, and data preprocessing techniques used in this study.

