{"title":"Review of Durrant (2023): Corpus linguistics for writing development","authors":"Joyce Lim","doi":"10.1075/ijcl.00059.lim","DOIUrl":"https://doi.org/10.1075/ijcl.00059.lim","url":null,"abstract":"This article reviews Durrant's (2023) Corpus linguistics for writing development.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Literature","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing the potential of LLM-assisted annotation for corpus-based pragmatics and discourse analysis","authors":"Danni Yu, Luyang Li, Hang Su, Matteo Fuoli","doi":"10.1075/ijcl.23087.yu","DOIUrl":"https://doi.org/10.1075/ijcl.23087.yu","url":null,"abstract":"Certain forms of linguistic annotation, like part of speech and semantic tagging, can be automated with high accuracy. However, manual annotation is still necessary for complex pragmatic and discursive features that lack a direct mapping to lexical forms. This manual process is time-consuming and error-prone, limiting the scalability of function-to-form approaches in corpus linguistics. To address this, our study explores the possibility of using large language models (LLMs) to automate pragma-discursive corpus annotation. We compare GPT-3.5 (the model behind the free-to-use version of ChatGPT), GPT-4 (the model underpinning the precise mode of Bing chatbot), and a human coder in annotating apology components in English based on the local grammar framework. We find that GPT-4 outperformed GPT-3.5, with accuracy approaching that of a human coder. These results suggest that LLMs can be successfully deployed to aid pragma-discursive corpus annotation, making the process more efficient, scalable, and accessible.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141272524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Literature","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Case and agreement variation in contact","authors":"Yi Zhang, Ming Yue","doi":"10.1075/ijcl.22119.zha","DOIUrl":"https://doi.org/10.1075/ijcl.22119.zha","url":null,"abstract":"This study investigates the influence of language contact on morphosyntactic variation in World Englishes, specifically focusing on the joint variation of case and agreement in it-clefts with pronominal clefted constituents. Employing a multifactorial approach within the framework of probabilistic grammar, we examine the distribution of the four relevant it-cleft variants in the GloWbE corpus. We find that language contact, as a language-external factor, impacts the strengths and rankings of language-internal factors but not their directions. Additionally, we observe an intricate interplay between language contact and language-internal factors in shaping morphosyntactic patterns: low-contact varieties tend to display feature-based case and agreement with a high degree of variability, while high-contact varieties tend to exhibit position-based case and agreement with a low degree of variability. These findings shed light on the mechanisms underlying the development of language diversity and structural simplification in World Englishes.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140658051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Literature","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A user-friendly corpus tool for disciplinary data-driven learning","authors":"Peter Crosthwaite, V. Baisa","doi":"10.1075/ijcl.23056.cro","DOIUrl":"https://doi.org/10.1075/ijcl.23056.cro","url":null,"abstract":"Most corpus tools commonly used for corpus-based data-driven learning (DDL) are designed for research rather than teaching purposes, with much DDL research suggesting learners and their teachers often stop DDL after initial training due to tool-related issues like complex user interfaces and system settings. Based on feedback from secondary-age language learners and their teachers in the Australian context, we present CorpusMate (https://corpusmate.com), a new, user-friendly corpus tool that incorporates several publicly available written and spoken corpora across 20 disciplinary subjects. It offers a range of flexible concordancing, n-gram and data visualisation options to ensure a fast, smooth and simple DDL experience for end users.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140695858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Literature","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of Flach & Hilpert (2022): Broadening the spectrum of corpus linguistics: New approaches to variability and change","authors":"Kristen Fleckenstein","doi":"10.1075/ijcl.00058.fle","DOIUrl":"https://doi.org/10.1075/ijcl.00058.fle","url":null,"abstract":"","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140744589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Literature","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Down-sampling from hierarchically structured corpus data","authors":"Lukas Sönning","doi":"10.1075/ijcl.23079.son","DOIUrl":"https://doi.org/10.1075/ijcl.23079.son","url":null,"abstract":"Resource constraints often force researchers to downsize the list of tokens returned by a corpus query. This paper sketches a methodology for down-sampling and offers a survey of current practices. We build on earlier work and extend the evaluation of down-sampling designs to settings where tokens are clustered by text file and lexeme. Our case study deals with third-person present-tense verb inflection in Early Modern English and focuses on five predictors: year, gender, genre, frequency, and phonological context. We evaluate two strategies for selecting 2,000 (out of 11,645) tokens: simple down-sampling, where each hit has the same selection probability; and structured down-sampling, where this probability is inversely proportional to the author- and verb-specific token count. We form 500 subsamples using each scheme and compare regression results to a reference model fit to the full set of cases. We observe that structured down-sampling shows better performance on several evaluation criteria.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140383301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Literature","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“People should get their booster”","authors":"H. Zou, Ken Hyland","doi":"10.1075/ijcl.22110.zou","DOIUrl":"https://doi.org/10.1075/ijcl.22110.zou","url":null,"abstract":"Debates around the efficacy and dangers of vaccination have taken on critical importance with the Covid pandemic and WHO naming vaccine hesitancy as a major global health threat. We explore how writers use two types of blog, academic and journalistic, to promote key public health messages around the effectiveness and necessity of Covid-19 vaccinations to a broad, heterogeneous audience. Examining 120 Covid-19 vaccination themed posts from reputable news and academic blog sites, we compare the different ways writers present a stance and take a position towards vaccines and vaccinations in these different interactional contexts. Findings show that both types of bloggers are clearly aware of the need to convey a stance towards their topic and audiences feel entitled to position themselves in relation to vaccination issues, but with different emphases. The study has important implications for how healthcare information is disseminated and persuasion accomplished in these public arenas of discourse.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139840774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Literature","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling the locative alternation in Mandarin Chinese","authors":"Mengmin Xu, F. Li, Benedikt Szmrecsanyi","doi":"10.1075/ijcl.22072.xu","DOIUrl":"https://doi.org/10.1075/ijcl.22072.xu","url":null,"abstract":"The current study investigates the probabilistic conditioning of the Mandarin locative alternation. We adopt a corpus-based multivariate approach to analyze 2,836 observations of locative variants from a large Chinese corpus and annotated manually for various language-internal and language-external constraints. Multivariate modeling reveals that the Mandarin locative alternation is not only influenced by semantic predictors like affectedness and telicity, but also by previously unexplored syntactic and language-external constraints, such as complexity and animacy of locatum and location, accessibility of locatum, pronominality, definiteness of location, length ratio and register. Notably, the effects of affectedness, definiteness and pronominality are broadly parallel in both the Mandarin locative alternation and its English counterpart. We thus contribute to theorizing in corpus-based variationist linguistics by uncovering the probabilistic grammar of the locative alternation in Mandarin Chinese, and by identifying the constraints that may be universal across languages.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140488787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Literature","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of Dunn (2022): Natural Language Processing for Corpus Linguistics","authors":"Hanna Schmück","doi":"10.1075/ijcl.00057.sch","DOIUrl":"https://doi.org/10.1075/ijcl.00057.sch","url":null,"abstract":"","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138946500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Literature","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}