{"title":"Frequentist vs. Bayesian methods: Choosing appropriate statistical methods in second language research","authors":"Shotaro Ueno , Osamu Takeuchi","doi":"10.1016/j.rmal.2025.100256","DOIUrl":"10.1016/j.rmal.2025.100256","url":null,"abstract":"<div><div>Null hypothesis significance testing (NHST) with <em>p</em>-values is one of the most commonly used statistical procedures in second language research. This statistical approach follows the principles of the frequentist method, and although it has various advantages, some researchers have noted its limitations and proposed Bayesian methods as an alternative. To contribute to this debate, this article introduces the basic principles of Bayesian statistics, specifically Bayesian hypothesis testing (BHT), and explores its advantages and limitations compared to the frequentist approach, particularly NHST. The article first outlines the foundational concepts of NHST and reviews the main criticisms associated with its use. It then presents the core ideas of Bayesian methods, with a primary focus on the Bayes factor, followed by a description of general procedures for conducting BHT and an overview of its potential benefits in applied research contexts. Additionally, several challenges and criticisms of Bayesian methods are discussed, emphasizing that they are not always a superior alternative. Based on these discussions, the article argues that both frequentist and Bayesian methods have strengths and limitations, and that specific research goals, questions, and contexts should guide the choice of statistical framework.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100256"},"PeriodicalIF":0.0,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144931891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward research inclusivity in applied linguistics: Methodological considerations for inclusive online experimentation","authors":"Kathy Kim, Erning Henry Chen","doi":"10.1016/j.rmal.2025.100255","DOIUrl":"10.1016/j.rmal.2025.100255","url":null,"abstract":"<div><div>Building on the growing shift toward remote research and the need for inclusive, participant-centered methodologies, this commentary explores the potential of online experimentation in diversification—both at the stages of recruitment and experimental implementation. We first discuss recruitment approaches, including crowdsourcing and social media, and their potential—along with their limitations—for broadening participation among underrepresented learner populations. We then draw on a framework grounded in three interconnected principles—Value, Trust, and Agency—to explore ways inclusivity can be meaningfully incorporated into experimental implementation. Informed by research across disciplines, we offer practical suggestions for designing online experiments that aim to be both methodologically robust and responsive to the varied realities of participants.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100255"},"PeriodicalIF":0.0,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144931892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the scoring system of an AI-integrated app to assess foreign language phonological decoding","authors":"James Turner , Alison Porter , Suzanne Graham , Travis Ralph-Donaldson , Heike Krüsemann , Pengchong Zhang , Kate Borthwick","doi":"10.1016/j.rmal.2025.100257","DOIUrl":"10.1016/j.rmal.2025.100257","url":null,"abstract":"<div><div>Phonological decoding in a foreign language (FL)—a two-part process involving first the ability to map written symbols to their corresponding sounds and second to pronounce them intelligibly—is foundational for reading and vocabulary acquisition. Yet assessing this skill efficiently and at scale in young learners remains a persistent challenge. Here, we introduce and evaluate the accuracy and effectiveness of a novel method for assessing FL phonological decoding using an AI-driven app that automatically scores children's pronunciation of symbol-sound correspondences. In a study involving 254 learners of French and Spanish (aged 10–11) across five UK primary schools, pupils completed a read-aloud task (14 symbol-sound correspondences) that was scored by the app’s automatic speech recognition (ASR) technology. The validity of these automated scores was tested by fitting them as independent variables in regression models predicting human auditory coding. The multiple significant relationships between automated and human scores that were established indicate that there is great potential for ASR-based tools to reliably assess phonological decoding in this population. These findings provide the first large-scale empirical validation of an AI-based assessment of FL decoding in children, opening new possibilities, applicable to a range of languages being learnt, for scalable and efficient assessment.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100257"},"PeriodicalIF":0.0,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144925646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automate the ‘boring bits’: An assessment of AI-assisted systematic review (AIASR)","authors":"Timothy Hampson , Kelly Cargos , Jim McKinley","doi":"10.1016/j.rmal.2025.100258","DOIUrl":"10.1016/j.rmal.2025.100258","url":null,"abstract":"<div><div>Systematic review is a powerful tool for disseminating the findings of research, particularly in applied linguistics where we hope to provide insights for practising language teachers. Yet, systematic review is also often prohibitively time-consuming, particularly for small, underfunded teams or solo researchers. In this study, we explore the use of generative artificial intelligence to ease the burden of screening and organising papers. Our findings suggest that AI excels in some tasks, particularly when those tasks involve explicitly stated information, and struggles in others, particularly when information is more implicit. A comparison of generative artificial intelligence for filtering papers with ASReview, a popular non-generative tool, reveals trade-offs, with Generative AI being replicable and more efficient, but with concerns about accuracy. We conclude that generative artificial intelligence can be a useful tool for systematic review but requires rigorous validation before use. We conclude by emphasising the importance of testing AI for systematic review tasks and exploring how this can practically be achieved.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100258"},"PeriodicalIF":0.0,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144922554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating synthetic data for CALL research with GenAI: A proof-of-concept study","authors":"Dennis Foung , Lucas Kohnke","doi":"10.1016/j.rmal.2025.100248","DOIUrl":"10.1016/j.rmal.2025.100248","url":null,"abstract":"<div><div>Popular tools like ChatGPT have placed generative artificial intelligence (GenAI) in the spotlight in recent years. One use of GenAI tools is to generate simulated data—or synthetic data—when the full scope of the required microdata is unavailable. Despite suggestions for educational researchers to use synthetic data, little (if any) computer-assisted language learning (CALL) research has used synthetic data thus far. This study addresses this research gap by exploring the possibility of using synthetic datasets in CALL. The publicly available dataset resembles a typical study with a small sample size (<em>n</em> = 55) performed using a CALL platform. Two synthetic datasets are generated from the original datasets using the <em>synthpop</em> package and generative adversarial networks (GAN) in <em>R</em> (via the <em>RGAN</em> package), which are both common synthetic data generation methods. This study evaluates the synthetic datasets by (a) comparing the distribution between the synthetic and original datasets, (b) examining the model parameters of the rebuilt linear models using the synthetic and original datasets, and (c) examining the privacy disclosure metrics. The results suggest that <em>synthpop</em> better represents the original data and preserves privacy. Notably, the GAN-generated dataset does not produce satisfactory results. This demonstrates GAN’s key challenges alongside the potential benefits of generating synthetic data with <em>synthpop</em>.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100248"},"PeriodicalIF":0.0,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144895543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Eight reasons not to test for baseline group equivalence in a parallel groups pretest-posttest study","authors":"Seth Lindstromberg","doi":"10.1016/j.rmal.2025.100254","DOIUrl":"10.1016/j.rmal.2025.100254","url":null,"abstract":"<div><div>The parallel groups pretest-posttest design has long been prominent in quantitative research of SLA. Ideally, groups are formed by random assignment of individuals. But with or without random assignment, groups may differ substantially on key pre-treatment measures such as pretest scores. When faced with non-equivalent groups, many SLA researchers have tested the difference(s) for statistical significance in the belief that <em>p</em> > .05 allows a main statistical analysis which assumes that the pretreatment group means do not differ. The literature of applied statistics includes numerous accounts of why such “baseline equivalence” (BE) testing is misguided. Yet BE tests continue to be reported in SLA journals at all levels of reputation. This paper describes BE testing, reviews its flaws, shows that the practice persists, and discusses possible reasons why BE tests may be thought to be legitimate, and considers options in study planning that lead to superior results and avoid conditions that appear to make BE testing necessary.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100254"},"PeriodicalIF":0.0,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144893404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a preschooler corpus of Italian: an experimental journey","authors":"Chiara Bolognesi, Alessandra Cinini, Paola Cutugno, Melissa Ferretti, Davide Chiarella","doi":"10.1016/j.rmal.2025.100252","DOIUrl":"10.1016/j.rmal.2025.100252","url":null,"abstract":"<div><div>The paper surveys the process and reasonings behind the written sources section of the Corpus of Italian for Preschoolers (CIP), a corpus collecting child-directed speech targeted at Italian children aged 3–6. Beginning from an overview of the available child-speech and child-directed speech corpora, the article underlines the need for an Italian Corpus focusing on children’s passive vocabulary and how such a tool would be useful for future comparative studies on children’s own production and as a tool for professionals in children’s needs. The CIP aims at collecting 250,000 linguistic tokens across a selection of different sources (Written, Spoken, Signed) gathered with the help of schools and families. This paper focuses specifically on the selection criteria for the written sources and the first steps of their linguistic processing, explaining through a set of three experiments how three different linguistic annotation tools performed on the tasks of tokenizing, lemmatizing and POS-tagging three different children’s literature texts. The last part presents the results of the experiments with insight on the NLP tools’ performances, as well as the reasons for our choice of tool for the large-scale annotation process and the still-ongoing challenges for the finalization of our corpus.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100252"},"PeriodicalIF":0.0,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144893405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated vs. manual linguistic annotation for assessing pragmatic competence in English classes","authors":"Mohsen Mahmoudi-Dehaki , Nasim Nasr-Esfahani","doi":"10.1016/j.rmal.2025.100253","DOIUrl":"10.1016/j.rmal.2025.100253","url":null,"abstract":"<div><div>Evaluating pragmatic competence remains a complex and critical challenge in applied linguistics, particularly in English as a Foreign Language (EFL) contexts. This study aims to address this gap by examining the potential of automating pragmatic competence assessment using AI-powered text analytics. Employing an explanatory sequential mixed-methods design, the quantitative phase compares the accuracy of automated versus manual linguistic annotation in evaluating the pragmatic skills of EFL learners. In the qualitative phase, factors influencing the accuracy of manual annotation are explored. For automated annotation, ChatGPT-4 Omni (ChatGPT-4o) processed 116 transcriptions representing participants' performances across six verbal discourse completion tasks (DCTs), encompassing prosodic features and pragmatic functions such as requesting favors, apologizing, suggesting, complaining, inviting, and refusing invitations. The AI model was fine-tuned using a human-in-the-loop approach, incorporating ensemble techniques such as few-shot learning and instructional prompts. Manual annotation employed trained EFL teachers using standardized assessment cards. Results indicate that automated annotation surpasses manual accuracy in evaluating most pragmatic components, except cultural norms, where both methods exhibit limitations. Focus group findings reveal that annotator bias, fatigue, technological influences, linguistic background differences, and subjectivity impact manual annotation accuracy. This interdisciplinary investigation expands the methodological toolkit for pragmatic competence evaluation and holds significant implications for fields such as digital humanities, computational pragmatics, language education, machine learning, and natural language processing.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100253"},"PeriodicalIF":0.0,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144863503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The L-maze task and web-based data collection in second language sentence processing research","authors":"Hiroki Fujita","doi":"10.1016/j.rmal.2025.100251","DOIUrl":"10.1016/j.rmal.2025.100251","url":null,"abstract":"<div><div>In recent years, an increasing number of studies on sentence processing have used web-based data collection and the L-maze task. Web-based data collection has become particularly popular since the coronavirus pandemic, when access to laboratory-based experiments was severely restricted. In the L-maze task, participants read sentences word by word, with each word presented alongside a pseudoword that does not continue the sentence. During the task, participants need to select a word that continues the sentence. Previous research has shown that both web-based data collection and the L-maze task are useful for investigating first language sentence processing. However, little is known about their usefulness for second language sentence processing research. To address this gap in the literature, I conducted replication experiments using the web-based L-maze and self-paced reading (SPR) tasks, and investigated whether these tasks could detect garden path and gender mismatch effects during the processing of locally ambiguous sentences. The results showed these effects in both tasks, with the effects being more localised in the L-maze task. A prospective power analysis suggested that these tasks would be effective for detecting these effects, and that the L-maze task would be more reliable than the SPR task for detecting gender mismatch effects. These findings suggest that web-based data collection and the L-maze task are potentially useful tools for investigating second language sentence processing.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100251"},"PeriodicalIF":0.0,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144863502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing lab-based and remote data collection methods in second language acquisition research. A close replication study","authors":"Kevin McManus , Katherine Kerschen , Yulia Khoruzhaya , Jingyuan Zhuang","doi":"10.1016/j.rmal.2025.100249","DOIUrl":"10.1016/j.rmal.2025.100249","url":null,"abstract":"<div><div>Collecting data remotely is now increasingly common in second language acquisition (SLA) research. However, very little is known about extent to which research data collected remotely are comparable and generalizable to those collected face-to-face in a lab. To address this question, we closely replicated a theoretically and academically impactful line of SLA research, with the seminal study by Ellis and Sagarra (2010b) being frequently cited in the field. In addition, this study has been replicated three times (Ellis et al., 2014; Ellis & Sagarra, 2011; McManus et al., 2025), with all studies being conducted in lab-based contexts. In the current study, our close replication modified one variable to better understand the comparability of research data collected remotely and in a lab. Findings showed that use of the same materials in lab-based and remote modalities did not significantly impact the study’s conclusions, indicating high levels of comparability between these data collection modalities for accuracy data. The implications of these results for the use of remote data collection methods are discussed.</div></div>","PeriodicalId":101075,"journal":{"name":"Research Methods in Applied Linguistics","volume":"4 3","pages":"Article 100249"},"PeriodicalIF":0.0,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144781784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}