Ayako Aizawa. Research Methods in Applied Linguistics, 4(3), Article 100246. Published 2025-07-24. DOI: 10.1016/j.rmal.2025.100246. https://www.sciencedirect.com/science/article/pii/S2772766125000679
Validation of online L2 vocabulary tests: Test performance across laboratory, virtual meeting, and crowdsourcing contexts
Online data collection has become increasingly common in diverse fields, including marketing and psychology, and is gaining ground in applied linguistics. Although concerns have been raised about the validity and reliability of online assessments, previous research on online data collection suggests that, with appropriate precautions, data quality can be comparable to that obtained using in-person methods. However, the validity and reliability of online vocabulary tests have not been thoroughly investigated. To fill this gap, the present study compared the results of online vocabulary tests with those of face-to-face administration. In this study, 159 Japanese university students took the Vocabulary Size Test and Phrasal Vocabulary Size Test in three environments: (a) in-person (laboratory), (b) online with supervision (virtual meeting), and (c) online without supervision (crowdsourcing). Reliability and validity were analysed, and results showed that test performance was largely comparable: test environment and presence or absence of supervision had minimal effects on three out of the four tests, with only the meaning recall format of the Vocabulary Size Test showing significantly inflated scores in the crowdsourcing condition. While the findings suggest that pooling data online and aggregating data from different environments are feasible for vocabulary testing research, they also highlight the need for careful planning in research design to achieve a desirable environment for the participants to take the tests.