{"title":"Each of them is one of a kind: A corpus-based study on two type-noun morphemes in spoken Mandarin","authors":"Chen-Yu Chester Hsieh","doi":"10.1016/j.acorp.2025.100185","DOIUrl":"10.1016/j.acorp.2025.100185","url":null,"abstract":"<div><div>This article presents a corpus-based analysis of how two near-synonymous type nouns (TNs) in Mandarin, <em>zhǒng</em> and <em>lèi</em>, diverge in their distributional patterns and interactional functions in spoken discourse. Using quantitative collocational profiling and qualitative analysis informed by Interactional Linguistics, the study examines 968 instances of <em>zhǒng</em> and 179 instances of <em>lèi</em> in the NCCU Corpus of Spoken Mandarin. The findings show that <em>zhǒng</em> forms a broad set of prefabricated expressions, each favoring particular lexico-grammatical constructions and serving evaluative, referential, and turn-projecting functions. In contrast, <em>lèi</em>, most prominently in the form <em>zhīlèi</em>, is more restricted in distribution, occurs predominantly in utterance-final position, and indexes uncertainty and turn completion. These results demonstrate that even near-synonymous TN morphemes differentiate in systematic ways shaped by linguistic form, sequential context, and interactional needs. 
The study contributes to research on TNs, classifier systems, and pragmatic markers in Mandarin, while offering implications for cross-linguistic comparison and Chinese language pedagogy.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100185"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145884222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constructing China’s national image through political discourse: A corpus-based diachronic analysis of government work reports (2001–2025)","authors":"Liai Ma, Peter Crosthwaite","doi":"10.1016/j.acorp.2025.100179","DOIUrl":"10.1016/j.acorp.2025.100179","url":null,"abstract":"<div><div>National image is one of the elements of a country’s soft power, and China’s Government Work Reports (GWRs) serve a critical function in shaping this national image, as the state constructs and communicates its political and economic narrative to both domestic and international audiences. This study addresses gaps in previous research by combining both quantitative and qualitative approaches to the analysis of China’s national image, focusing on self-representation and the other-perspective. Specifically, it examines the English editions of 25 GWRs (2001–2025) using Corpus-Assisted Discourse Studies (CADS), tracing keywords and their collocates over time and interpreting these patterns within the context of national image construction. Findings reveal a clear “global integration–domestic stabilization–global engagement” trajectory. During the global integration phase (2001–2010), terms including “World Trade Organization”, “opening up”, and “rapid growth” dominate, underscoring China’s integration into the global economy. The domestic stabilization phase (2011–2015) foregrounds “structural adjustment”, narrowing “the rural–urban gap”, and “social harmony”, reflecting China’s efforts to manage internal social imbalances while maintaining stability. In the global engagement phase (2016–2025), phrases including “high-quality development”, “Belt and Road Initiative”, and “Chinese Path” signal China’s transformation from rule-taker to solution provider. Overall, China’s national image in its GWRs has transformed from that of a newcomer focused on speed to that of a responsible leader setting global standards. 
The study offers a model case of applying CADS to the GWRs and provides a comprehensive account of how China’s national image has been constructed and repositioned in international communication.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100179"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CorGeS: The corpus of German suicide notes","authors":"Dana Roemling , Lucia Busso","doi":"10.1016/j.acorp.2025.100177","DOIUrl":"10.1016/j.acorp.2025.100177","url":null,"abstract":"<div><div>This paper introduces <em>CorGeS</em>, a historic corpus of authentic German suicide notes written between the 1910s and 1930s. Originally compiled and transcribed by a police officer, the corpus offers a rare and valuable resource for both linguistic and historical inquiry. We describe the provenance and structure of the corpus, as well as the methodological and ethical considerations involved in working with such sensitive material. While suicide note analysis is well established in English-language research, German-language material remains understudied, making <em>CorGeS</em> an important contribution to multilingual and cross-cultural perspectives in suicide note analysis. To illustrate the potential of the corpus, we present a preliminary topic modelling analysis, highlighting key thematic patterns in the texts, before using corpus methods to explore the most prevalent item in the corpus in more detail. These early results demonstrate the diversity and emotional complexity of the notes and suggest several avenues for further research at the intersection of linguistics, history, and suicide note analysis.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100177"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCILIC and SCAWL: Developing a smart cities corpus and academic word list","authors":"Abdulaziz B Sanosi","doi":"10.1016/j.acorp.2026.100191","DOIUrl":"10.1016/j.acorp.2026.100191","url":null,"abstract":"<div><div>The rapid growth of smart city initiatives over the past two decades has led to a surge in research and practical applications. However, it has also resulted in significant terminological fragmentation across academic discourse, educational practices, and urban policy frameworks, posing challenges to achieving the educational and urban development targets outlined in the United Nations’ SDG 4: Quality Education and SDG 11: Sustainable Cities and Communities. To address this gap, the present study aims to develop and validate the Smart CIties LIterature Corpus (SCILIC) and to generate a word list from it through systematic corpus linguistic analysis. The corpus comprises 3.6 million tokens sourced from two primary domains: peer-reviewed articles indexed in Scopus and Web of Science (2015–2025) and technical reports from the UN-Habitat digital repository (2010–2025). Utilizing #LancsBox and complementary analytical tools, the study compiled a balanced and representative corpus and generated the Smart Cities Academic Word List (SCAWL), comprising 550 word families and 667 individual words. Quantitative analysis indicates that SCAWL accounts for 7.8% of the total corpus tokens. The findings underline the multidisciplinary nature of smart city vocabulary and highlight the importance of integrating both academic and policy-oriented sources. 
By supporting the development of targeted educational resources and promoting clearer conceptual understanding, this research contributes directly to the advancement of SDG 4 and SDG 11, fostering both educational quality and sustainable urban development.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100191"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Language of Hate and Offense: Understanding Linguistic Variation in Turkish Tweets","authors":"Hülya Mısır , Jack Grieve","doi":"10.1016/j.acorp.2026.100192","DOIUrl":"10.1016/j.acorp.2026.100192","url":null,"abstract":"<div><div>Social media has become a major site where harmful discourse proliferates. Computational detection of such discourse has advanced, yet the linguistic patterns underlying such expressions remain understudied, particularly in morphologically rich languages with extensive grammatical marking. This study examines linguistic patterns in Turkish hate speech and offensive language, comparing them to non-hateful discourse to understand how these types of posts vary in their linguistic and stylistic features. Through a short-text multidimensional analysis (STMDA) of a large-scale Twitter corpus, we identified key dimensions based on non-harmful language online: (1) <em>text length as a confounding variable,</em> (2) <em>conversational vs. informational,</em> (3) <em>temporality,</em> (4) <em>directive vs. introspective speech, and</em> (5) <em>in-group vs. out-group orientation.</em> We then compared the subsets of tweets based on these emergent dimensions and found that hate and offensive speech diverge from normal discourse. The analysis demonstrates that both hate and offensive speech rely on interactional, conversational styles rather than informational discourse. Hate speech tends to exhibit more directive and mobilizing language and stronger in-group/out-group orientation, while offensive language displays a comparatively more introspective and expressive style. 
Our results suggest functional linguistic variation across normal, hate and offensive speech and position STMDA as a complementary, top-down framework that identifies higher-level functional and stylistic dimensions within which such speech styles operate.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100192"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146187361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vocabulary in academic texts across disciplinary fields","authors":"Xina Jin, Rachael Ruegg, Stephen Skalicky, Averil Coxhead","doi":"10.1016/j.acorp.2026.100189","DOIUrl":"10.1016/j.acorp.2026.100189","url":null,"abstract":"<div><div>This article reports on a corpus-based study examining the vocabulary of texts used as reading materials in two academic fields at a higher education institution in New Zealand: Computer Science (CS) and Media Design (MD). It first presents the vocabulary profiles of academic texts from both fields using Nation’s (2020) British National Corpus/Corpus of Contemporary American English (BNC/COCA) word frequency lists. Then, it outlines the vocabulary load required to comprehend different types of academic texts within the two corpora. The results indicate that while CS texts contain a wide range of mathematical, statistical, and programming-related lexical items, MD texts include a considerable number of proper nouns, such as the names of brands, companies, designers, and locations, as well as many non-English words. In terms of vocabulary demand, reading CS texts requires knowledge of 4,000 to 6,000 word families to reach 95% to 98% lexical coverage. In contrast, MD texts require knowledge of up to 8,000 word families for optimal comprehension across various text types. Interestingly, journal articles in both corpora show lower lexical demands than other types of texts, such as book chapters, textbooks, and materials sourced from online platforms (e.g., magazines, newspapers). 
The findings suggest that lexical demands vary when handling reading materials across different disciplinary areas in higher education, and provide insights into the extent of vocabulary knowledge needed to understand and learn different subject content through texts.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100189"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing, compiling and profiling the Corpus of Arts and Humanities Academic Texts (CAHAT): A new resource for English for Specific Academic Purposes (ESAP)","authors":"James O’Flynn","doi":"10.1016/j.acorp.2025.100178","DOIUrl":"10.1016/j.acorp.2025.100178","url":null,"abstract":"<div><div>English for Academic Purposes (EAP) is broadly concerned with the use of English to perform academic tasks. Many corpus studies, though, have shown that the language used to perform academic tasks varies widely across the disciplines. Accordingly, EAP has come to be viewed as a continuum, with English for General Academic Purposes (EGAP) at one end and English for Specific Academic Purposes (ESAP) at the other. There is ever-growing interest in disciplinary corpus research in EAP, or ESAP research, but corpora to support it remain limited in availability and/or size. This paper therefore describes the development of a large, accessible ESAP corpus, introduced here as the Corpus of Arts and Humanities Academic Texts (CAHAT). This c.25-million token (word) corpus of 288 PhD theses collected from three UK universities is organised into six disciplinary subcorpora and enriched with detailed text-external and text-internal metadata. The CAHAT is available on request for non-commercial ESAP research and pedagogy. The full corpus is available through Sketch Engine, while a miniature version of the corpus (c. 6.5 million tokens) is available via a free, purpose-made, user-friendly concordancing tool. 
The paper concludes by proposing potential applications of the CAHAT in ESAP research and pedagogy.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100178"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Upgraded literacy: teacher training approaches to integrating corpus data and AI tools for school text readability adaptation","authors":"Madalina Chitez, Karla Csürös, Roxana Rogobete","doi":"10.1016/j.acorp.2025.100181","DOIUrl":"10.1016/j.acorp.2025.100181","url":null,"abstract":"<div><div>This study examines how an inductive learning approach can foster e-literacy, defined as the ability to critically and effectively use digital and AI tools to support literacy. It presents the outcomes of a teacher training program carried out in Romania within a national professional development initiative, involving 56 in-service teachers across primary, lower secondary, and upper secondary levels. The training combined theoretical input with hands-on activities, introducing participants to corpus-based readability analysis and AI platforms for text adaptation. Teachers worked with tools such as LEMI, Text Inspector, ARTE, ChatGPT, and Perplexity. The corpus-based linguistic analysis indicates that teachers most often addressed challenges of vocabulary complexity and cognitive load. Participants used readability and AI tools to simplify syntactic structures, reformulate dense passages, and adapt discourse to students’ linguistic proficiency levels. Reflections further indicated that teachers came to view literacy-related digital and AI platforms in complementary roles: as simplifiers that made texts more accessible, as co-designers that supported creativity in instructional planning, and as validators that confirmed their professional judgment. 
The training strengthened the teachers’ digital pedagogical awareness and metalinguistic insight, positioning e-literacy as a key competence within an updated literacy paradigm capable of supporting inclusive, level-appropriate education.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100181"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing learners’ academic writing skills: a comparative analysis of traditional and AI-assisted instruction approaches","authors":"Kholida Begmatova, Iroda Saydazimova","doi":"10.1016/j.acorp.2025.100176","DOIUrl":"10.1016/j.acorp.2025.100176","url":null,"abstract":"<div><div>The study aimed to explore the impact of two instructional approaches – traditional and AI-assisted – in teaching academic writing to year-1 EAP students at an EMI university in Uzbekistan. Control group students (<em>n</em> = 75) learned to write a literature review within a traditional approach, developing a matrix of sources to organize and synthesize findings and receiving lecturers’ constructive feedback. An inductive approach was adopted in the instruction with the treatment group (<em>n</em> = 78), where students wrote their papers integrating AI chatbots at several stages, from narrowing the topic scope to responding to lecturers’ language-related instructive feedback on drafts using AI tools to introduce corrections independently. The learner corpus comprised two subsets of texts produced under the two instructional approaches, with two entries per subset, totaling 306 literature reviews. These texts were used for data analysis, which was performed with Coh-Metrix. This analysis involved the evaluation of linguistic properties and the overall writing quality of student papers produced within the two instructional methods. Also, thematic linguistic analysis was conducted to evaluate the academic features of the texts (<em>n</em> = 20). Our findings revealed that students demonstrated comparable readability levels, syntactic/grammatical range, sophistication levels, and cohesion across ideas in their writing. The papers produced within an AI-assisted approach had a higher semantic complexity, referential cohesion, and lexical diversity. 
Thematic linguistic analysis revealed three key areas in the academic features of students’ papers, including the use of referencing conventions, integration of cohesive devices, and demonstration of argumentation and critical analysis.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100176"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145738034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic-Specific corpus compilation: A componential approach to query formulation","authors":"Daniel Malone","doi":"10.1016/j.acorp.2025.100180","DOIUrl":"10.1016/j.acorp.2025.100180","url":null,"abstract":"<div><div>This paper presents a methodological approach to topic-specific corpus compilation when retrieving texts from databases such as news archives or document repositories. When search terms exhibit flexible or context-dependent meanings, a high proportion of returned texts may be unrelated to the intended target concept, increasing processing workload and risking distortion of corpus-analysis results. In addressing this issue, the present paper proposes an approach to query formulation grounded in a componential analysis of the target concept’s meaning which identifies its key semantic attributes. These attributes are operationalised in a complex two-part query, referred to herein as the Dual-Group Query (DGQ). Each query group realises a defining semantic attribute, ensuring that retrieved texts express both components of the target concept. To enable systematic query expansion, the Relative Query Term Relevance method (Gabrielatos, 2007) is procedurally adapted to the DGQ model to evaluate candidate-term relevance prior to inclusion. Evaluation results show that, when applied to the Lone Wolf Corpus, a corpus of British press reporting on lone-actor/lone-wolf terrorism, the approach significantly improved retrieval efficiency (i.e., precision and recall) compared with two non-complex queries. 
More broadly, the proposed approach offers a replicable framework for corpus compilation in studies concerned with domain-specific topics, fine-grained concepts, or distinct sense relations of particular words.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"6 1","pages":"Article 100180"},"PeriodicalIF":2.1,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}