{"title":"Exploring the potential of multiple CORE meanings in learning L2 verb-noun collocations: A corpus-based discovery learning approach","authors":"Satoshi Yamagata , Gareth Carrol , Crayton Walker","doi":"10.1016/j.acorp.2025.100166","DOIUrl":"10.1016/j.acorp.2025.100166","url":null,"abstract":"<div><div>Collocational knowledge is a critical component of second language (L2) learning. However, L2 learners often rely on first language (L1) translations, leading to the production of deviant collocations. To address this issue, this study investigates the pedagogical potential of teaching collocations through multiple CORE meanings (capitalised), in contrast to approaches that rely on a single core meaning of verbal nodes. Multiple CORE meanings are characterised not only by their typical nominal collocates, but also by other aspects of how they typically pattern. While previous accounts have tended to treat high-frequency verbal nodes as polysemous, we argue that many verbal nodes are better understood as examples of homonymy, which carries several semantically distinct CORE meanings (i.e., ‘draw’ meaning ‘to pull or move something’, ‘to divide something into two’, or ‘to make a picture’), and that this might offer a more logical way for learners to discover and learn collocational patterns. We first identified CORE meanings for six high-frequency verbal nodes through corpus-based analysis, and then tested their pedagogical potential with 240 EFL high school learners. Learners were taught verb-noun collocations using either a CORE meaning-based discovery approach or conventional L1 translations, and they completed a pre-test and two post-tests assessing productive recall and collocability judgement. Results showed that CORE meaning-based instruction enhanced productive recall, though the advantage did not extend to collocability judgement. 
These findings suggest that presenting learners with multiple CORE meanings can be a promising way to strengthen L2 collocational competence, although further refinement in instructional design is warranted.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100166"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145839973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Not so fast? A comparative study of pre-service teachers’ lesson design using corpora and generative artificial intelligence","authors":"Agnieszka Leńko-Szymańska","doi":"10.1016/j.acorp.2025.100168","DOIUrl":"10.1016/j.acorp.2025.100168","url":null,"abstract":"<div><div>The integration of corpora and generative artificial intelligence (GenAI) in language teacher education presents both opportunities and challenges. While corpus-based approaches have long been promoted for data-driven learning (DDL), their adoption remains limited due to complexity issues and time-demands. In contrast, GenAI tools offer immediate, user-friendly access to linguistic data, yet raise concerns about authenticity and reliability. This study compares pre-service teachers’ use of corpora and GenAI in pedagogically oriented language analysis, lesson planning, and materials development. Conducted within a graduate-level course, the study examines student teachers’ approaches to corpus-based and AI-based lesson design, focusing on their ability to retrieve and analyse linguistic data, plan lessons, create learning materials, and reflect on the effectiveness of these tools. Findings indicate the considerable potential of both corpora and GenAI for supporting data-informed, inductive approaches to language learning and teaching. Yet, the results also reveal that while pre-service teachers demonstrated operational proficiency in using both tools, they struggled to extract meaningful linguistic insights and integrate their findings into cohesive pedagogical frameworks. The study highlights the need for targeted training to develop teachers’ analytical and pedagogical skills in working with both types of resources. 
Ultimately, it argues that rather than replacing corpora, GenAI should complement data-driven learning, reinforcing the importance of linguistic accuracy and pedagogical soundness in technology-enhanced language teaching.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100168"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145685011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constructions of ‘sound’ in scientific discourses about cochlear implants","authors":"Emily Kecman , Stephanie Lloyd , Isabelle Boisvert","doi":"10.1016/j.acorp.2025.100183","DOIUrl":"10.1016/j.acorp.2025.100183","url":null,"abstract":"<div><div>The linguistic resources employed to discuss sensory experiences and phenomena can vary considerably between different cultural, disciplinary and socio-political contexts. Whilst questions about the discourses of sound have long been explored in some fields, within the field of cochlear implant research, such questions have received limited attention. This article draws together literature from diverse fields, highlighting the various complexities inherent in talking about “sound” in different contexts. The results of a collocation analysis of “sound” within the CIRCorpus (a purpose-built 3-million-word corpus comprised of scientific research articles about cochlear implants published between 1960 and 2024) are then reported. The collocation analysis highlights a discursive environment in which sound is predominantly framed within a language of <em>testing</em> and <em>ability</em>, suggesting that discussions of sound within CI research have become distinctly psychologized and increasingly technicalized and homogenized over time. 
The implications of these patterns for informing future CI research agendas are discussed.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100183"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging large language models to supplement corpus-based inductive learning of Chinese as a second language","authors":"Tiffany Tsz-Yin Pang","doi":"10.1016/j.acorp.2025.100170","DOIUrl":"10.1016/j.acorp.2025.100170","url":null,"abstract":"<div><div>Corpus tools have proven effective for supporting inductive language learning by enabling learners to observe multiple examples, form hypotheses, and verify the hypotheses based on additional examples. However, when applied to Chinese as a Second Language (CSL), these tools encounter limitations that disrupt the observe-hypothesize-verify process. Sketch Engine, for example, misanalyzes Chinese word boundaries, topicalized objects, and <em>ba</em>-constructions, and provides inaccurate observational data that undermines the effectiveness of inductive learning. This paper proposes integrating Large Language Models (LLMs) with corpus tools to address the limitations. Using Sketch Engine and Claude Opus 4 as exemplars, I demonstrate how LLMs serve three pedagogical functions: (1) error detection to identify misanalyzed features in corpus outputs, (2) guided pattern discovery to help learners recognize linguistic regularities across examples, and (3) hypothesis verification to confirm/refine learners’ observations. Through analysis of specific Chinese features, I show how LLM integration maintains the discovery processes while ensuring accurate linguistic input for the learners. The proposed corpus-LLM integration represents an advancement in leveraging AI for language pedagogy. 
The paper concludes with future research directions for optimizing this integration in CSL acquisition, and emphasizes the need to balance technological innovation with pedagogical principles.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100170"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Register alignment of ChatGPT-generated academic texts","authors":"Nur Yağmur Demir, Jesse Egbert","doi":"10.1016/j.acorp.2025.100174","DOIUrl":"10.1016/j.acorp.2025.100174","url":null,"abstract":"<div><div>The rise of Artificial Intelligence (AI) tools such as ChatGPT has transformed language pedagogy and assessment. Despite their growing use in academic contexts—from classroom materials to standardized testing—questions remain about the register appropriateness of the texts they produce.</div><div>The humanlikeness of AI language must be defined not only by fluency or coherence, but by register appropriateness—functional language use that aligns with the situational characteristics of registers. This study investigates whether ChatGPT-generated academic texts mimic human-authored writing in two academic genres (journal articles and textbooks) across two disciplines (biology and history).</div><div>Using multi-dimensional analysis, we analyzed 200 texts (100 AI-generated and 100 human-authored) along three linguistic dimensions: (1) specialized information density vs. non-technical synthesis, (2) definition/evaluation of new concepts, and (3) author-centered stance. Our results reveal a mixed picture: while ChatGPT exhibits moderate success in mimicking register distinctions found in journal article registers, its performance is notably less aligned with textbooks. ChatGPT-generated textbook excerpts in biology, for instance, often resemble the dense, technical style of journal articles, as a result failing to match the simplified, pedagogically oriented discourse found in human-authored textbooks.</div><div>Our findings indicate that while ChatGPT can largely reproduce human-like register patterns in journal article writing, it struggles to achieve the same in textbook contexts, particularly within biology. Overall, the results suggest that ChatGPT-generated texts often lack sufficient functional appropriateness. 
We therefore recommend further quantitative linguistic analyses of AI-generated language and urge caution when using ChatGPT for content creation.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100174"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145685013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comparative analysis of AI-generated texts, corpus data, and speaker judgments: Subject honorification patterns in Korean","authors":"Yejin Jung , Kathy MinHye Kim","doi":"10.1016/j.acorp.2025.100171","DOIUrl":"10.1016/j.acorp.2025.100171","url":null,"abstract":"<div><div>Technological innovations can greatly enhance second language (L2) pragmatics instruction by providing learners with more natural and authentic communication opportunities. As Generative Artificial Intelligence (GenAI) tools become increasingly integrated into L2 teaching, questions arise as to whether they provide pedagogically appropriate input and how they can be used for inductive instruction (e.g., Data-driven Learning). To advance meaningful instructional approaches to Korean honorifics, understanding the nature of input is key; particularly, what exemplars of honorifics are available through GenAI and spoken corpora and how L2 learners perceive and evaluate different honorific forms. In response to these inquiries, we analyzed patterns of subject-verb honorific agreement in outputs from <em>ChatGPT 4.0</em> and the NIKL Korean Dialogue Summarization Corpus (Study 1), and conducted an acceptability judgment test of four subject-verb honorific (mis)match forms (Study 2). We found that ChatGPT predominantly favored a subject-verb matched form, whereas corpus data reflected the highly complex, context-dependent use and variations of honorifics. L1 judgments aligned more closely with the corpus results, reflecting sensitivity to nuanced (mis)match forms, whereas L2 judgments closely mirrored ChatGPT’s patterns, lacking sensitivity beyond the matched forms. 
These results underscore the challenges associated with Korean honorification for both learners and educators, highlighting the need for more refined inductive teaching.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100171"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145685012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linguistic stratification in academic publishing: A corpus-based analysis of lexicogrammatical variation across journal tiers","authors":"Ezra Alexander","doi":"10.1016/j.acorp.2026.100190","DOIUrl":"10.1016/j.acorp.2026.100190","url":null,"abstract":"<div><div>This study examines linguistic stratification in academic publishing through corpus analysis of lexicogrammatical variation between high-tier and low-tier scientific journals. Using a specialized corpus of 2.3 million words from biochemistry, cell biology, and genetics publications, the research employs contrastive intralingual analysis to investigate how journal prestige influences language choices. Through key bundle analysis and examination of multiword units, the study reveals systematic differences in passive voice usage, tense selection, modal constructions, and lexical choices between journal tiers. High-tier journals demonstrate greater use of present tense constructions, specific vocabulary, and confident assertions, while low-tier journals show preference for past tense passives, generic verbs, and tentative modal expressions. The findings indicate that journal tier creates distinct linguistic expectations that reflect confidence versus tentativeness in academic writing. 
These patterns suggest that publication contexts systematically influence lexicogrammatical choices, with implications for how journal prestige shapes acceptable academic discourse and may create differential barriers for scholars navigating research publication in English.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100190"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On corpus linguistics, the search for meaning, and a transversal–pluriversal turn in celebrating learner languaging","authors":"Meng Huat Chau","doi":"10.1016/j.acorp.2025.100172","DOIUrl":"10.1016/j.acorp.2025.100172","url":null,"abstract":"<div><div>This article revisits key insights from corpus linguistics, such as units of meaning and pattern grammar, in dialogue with cognitive linguistic understandings of form–meaning pairings, showing how meaning arises from patterned, contextualized, and emergent use rather than from isolated words. Foregrounding the language learner as a fully legitimate meaning maker alongside expert and other language users, it advances <em>communicative meaningfulness</em> as an ecological model grounded in relational resonance rather than formal accuracy or communicative effectiveness. Drawing on longitudinal corpus evidence from school students, the article demonstrates how learners rework patterned resources to express stance, negotiate values, and enact situated identities, revealing their languaging as meaning-in-motion. It further articulates a transversal–pluriversal turn in applied linguistics: <em>transversal</em> in its crossings of disciplinary, cultural, and linguistic boundaries; <em>pluriversal</em> in its affirmation of diverse epistemologies and ways of knowing. The article concludes that learners’ meaning making contributes to a living relational ecology of communication, positioning the study of corpora, learner languaging, and language as a whole as co-created, evolving, and interrelated resources. 
Such an orientation not only guides more inclusive, humane, and epistemically diverse practices in corpus linguistics and applied linguistics; it also, importantly, deepens and expands our shared human capacity for understanding and connection.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100172"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech and Language Disorders: A systematic review of corpora and future directions","authors":"Abeer Z. Al-Marridi , Samawiyah M. Ulde , Ahmed Bensaid , Tariq A. Khwaileh","doi":"10.1016/j.acorp.2025.100186","DOIUrl":"10.1016/j.acorp.2025.100186","url":null,"abstract":"<div><div>Speech and Language Disorders (SLDs) significantly impact social interaction, communication, and educational outcomes, making them a global health priority. According to data published by Komodo Health, speech disorder diagnoses among children aged 0–12 increased by 110% in 2022, reaching 1.2 million cases compared to the pre-pandemic average of 570,000. Addressing this growing challenge requires empowering the research community with diverse and comprehensive corpora to drive investigations and develop innovative tools. This paper systematically reviews existing SLD corpora, evaluating their relevance to research and technological innovation. The corpora are categorized based on target population, language, data modality, and task domain. Thirteen SLDs are explored, including neurological language breakdown, motor speech disorders, child language impairments, and communication challenges in autism spectrum disorder. The review identifies key research directions in the field of SLD and highlights critical gaps and challenges using statistical insights drawn from the analyzed search. Emerging trends such as multimodal data integration and artificial intelligence applications for advanced data analysis are emphasized. The review concludes with recommendations for enhancing the utility and accessibility of SLD corpora, underscoring the importance of interdisciplinary collaboration and community engagement to address existing limitations. 
This review serves as a valuable resource for clinicians and researchers, guiding them in selecting the most suitable database/corpora to address their clinical and investigative needs while advancing the field of SLD research and innovation.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100186"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The grammar-lexicon trade-off in lexicography: Corpus-based categorization of STONE’s modifying uses","authors":"Qin Luo, Renqiang Wang","doi":"10.1016/j.acorp.2026.100193","DOIUrl":"10.1016/j.acorp.2026.100193","url":null,"abstract":"<div><div>The categorization of words like <em>stone</em> in modifying constructions (e.g., <em>stone wall, stone deaf</em>) poses a persistent problem for both linguistic theory and lexicography. Is <em>stone</em> a noun, an adjective, or an adverb in these contexts, and which uses are established enough to merit dictionary inclusion? Both dictionaries and theoretical accounts diverge, often depending more on intuition than on evidence. The study draws on data from the Corpus of Contemporary American English, analyzed through Corpus Pattern Analysis and the Two-Level Lexical Categorization Theory. Thirteen patterns are identified, eight of which involve shifts from noun to adjective or adverb, accompanied by semantic extensions from entity-denoting senses to attributive or intensifying meanings. The results show that <em>stone</em> is conventionalized in two adjective patterns (e.g., <em>stone</em> in <em>stone wall</em> and <em>stone face</em>), one adverbial pattern (e.g., <em>stone</em> in <em>stone deaf</em>), and one bound form (e.g., <em>stone</em> in <em>stone cold sober</em>), meriting lexicographic recognition. 
These findings demonstrate how corpus methods can resolve long-standing categorization issues and provide an evidence-based basis for more consistent dictionary representation, while also highlighting the fluid boundary between grammar and lexicon.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100193"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146187362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}