Validation of online L2 vocabulary tests: Test performance across laboratory, virtual meeting, and crowdsourcing contexts

Ayako Aizawa
{"title":"Validation of online L2 vocabulary tests: Test performance across laboratory, virtual meeting, and crowdsourcing contexts","authors":"Ayako Aizawa","doi":"10.1016/j.rmal.2025.100246","DOIUrl":null,"url":null,"abstract":"<div><div>Online data collection has become increasingly common in diverse fields, including marketing and psychology, and is gaining ground in applied linguistics. Although concerns have been raised about the validity and reliability of online assessments, previous research on online data collection suggests that, with appropriate precautions, data quality can be comparable to that obtained using in-person methods. However, the validity and reliability of online vocabulary tests have not been thoroughly investigated. To fill this gap, the present study compared the results of online vocabulary tests with those of face-to-face administration. In this study, 159 Japanese university students took the Vocabulary Size Test and Phrasal Vocabulary Size Test in three environments: (a) in-person (laboratory), (b) online with supervision (virtual meeting), and (c) online without supervision (crowdsourcing). Reliability and validity were analysed, and results showed that test performance was largely comparable: test environment and presence or absence of supervision had minimal effects on three out of the four tests, with only the meaning recall format of the Vocabulary Size Test showing significantly inflated scores in the crowdsourcing condition. While the findings suggest that pooling data online and aggregating data from different environments are feasible for vocabulary testing research, they also highlight the need for careful planning in research design to achieve a desirable environment for the participants to take the tests.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100246"},"PeriodicalIF":0.0000,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Research Methods in Applied Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772766125000679","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Online data collection has become increasingly common in diverse fields, including marketing and psychology, and is gaining ground in applied linguistics. Although concerns have been raised about the validity and reliability of online assessments, previous research on online data collection suggests that, with appropriate precautions, data quality can be comparable to that obtained with in-person methods. However, the validity and reliability of online vocabulary tests have not been thoroughly investigated. To fill this gap, the present study compared the results of online vocabulary tests with those of face-to-face administration. In this study, 159 Japanese university students took the Vocabulary Size Test and the Phrasal Vocabulary Size Test (each in two formats, yielding four tests) in three environments: (a) in-person (laboratory), (b) online with supervision (virtual meeting), and (c) online without supervision (crowdsourcing). Reliability and validity were analysed, and the results showed that test performance was largely comparable: the test environment and the presence or absence of supervision had minimal effects on three of the four tests, with only the meaning recall format of the Vocabulary Size Test showing significantly inflated scores in the crowdsourcing condition. While the findings suggest that pooling data online and aggregating data from different environments are feasible for vocabulary testing research, they also highlight the need for careful planning in research design to ensure a suitable test-taking environment for participants.
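For readers who want a concrete picture of what such an analysis might involve, the sketch below simulates item-level responses and computes an internal-consistency reliability estimate (Cronbach's alpha) per condition, followed by a one-way ANOVA on total scores across the three environments. Everything here is an assumption for illustration: the data are simulated, the group sizes and item counts are hypothetical, and the abstract does not specify whether the design was between- or within-participants or which statistical procedures the study actually used.

```python
# Illustrative sketch only: simulated data, hypothetical group sizes (53 per
# condition) and item counts (100), and an assumed between-groups design.
# The study's actual datasets and analysis choices are not given in the abstract.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_participants, n_items) score matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

conditions = ["laboratory", "virtual meeting", "crowdsourcing"]
scores_by_condition = {}
for cond in conditions:
    # Each simulated participant has a fixed probability of answering an item
    # correctly; items are scored dichotomously (1 = correct, 0 = incorrect).
    ability = rng.normal(0.6, 0.15, size=(53, 1)).clip(0.05, 0.95)
    items = (rng.random((53, 100)) < ability).astype(float)
    scores_by_condition[cond] = items
    print(f"{cond}: alpha = {cronbach_alpha(items):.3f}")

# Compare mean total scores across the three administration environments.
totals = [m.sum(axis=1) for m in scores_by_condition.values()]
f_stat, p_val = stats.f_oneway(*totals)
print(f"ANOVA across conditions: F = {f_stat:.2f}, p = {p_val:.3f}")
```

In this toy setup, a non-significant ANOVA would correspond to the paper's "largely comparable" finding, while an inflated crowdsourcing mean (as reported for the meaning recall format of the Vocabulary Size Test) would surface as a significant condition effect.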