{"title":"Is LIWC reliable, efficient, and effective for the analysis of large online datasets in forensic and security contexts?","authors":"Madison Hunter, Tim Grant","doi":"10.1016/j.acorp.2025.100118","DOIUrl":"10.1016/j.acorp.2025.100118","url":null,"abstract":"<div><div>This article evaluates the reliability, efficiency, and effectiveness of Linguistic Inquiry and Word Count (LIWC; Boyd et al., 2022) for the analysis of a white nationalist forum. This is important because LIWC has been the computational tool of choice for scores of studies generally and many examining extremist content in a forensic or security context. Our purpose, therefore, is to understand whether LIWC can be depended upon for large-scale analyses; we initially examine this here using a small sample of posts from a set of just eight users and manually checking the program's automated codings of a subset of categories. Our results show that the LIWC coding cannot be relied upon – precision falls to as low as 49.6 % and recall as low as 41.7 % for some categories. It would be possible to engage in considerable manual correction of these results, but this undermines its purported efficiency for large datasets.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"5 1","pages":"Article 100118"},"PeriodicalIF":0.0,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143159794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introductory editorial synthesis paper: Corpus linguistics and the language of COVID-19: Applications and outcomes","authors":"David Oakey, Benet Vincent","doi":"10.1016/j.acorp.2024.100110","DOIUrl":"10.1016/j.acorp.2024.100110","url":null,"abstract":"<div><div>This article provides an overview of the papers in the special issue of Applied Corpus Linguistics on “Corpus Linguistics and the Language of COVID-19: Applications and Outcomes”. As noted in our original call for contributions, we believe that, while traditional corpus linguistic work can reveal valuable insights into the emerging language around COVID-19, it should be complemented by more applied corpus linguistics work. The pandemic posed a real-world problem which applied corpus linguists were well equipped to address using linguistic evidence from a range of sources. This article presents an introduction to the papers in this special issue which will be of interest to applied corpus linguists due to the variety of perspectives they present in relation to a number of key issues of importance to the field: the data they draw on, the various theoretical frameworks which inform the research, the methods they use to collect and analyse the data, and the discussion of how their findings may be applicable to citizens, decision makers, consumers and other stakeholders in public and private contexts.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 3","pages":"Article 100110"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143098309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrigendum to “here-, there-, and every where-: Exploring the role of pronominal adverbs in legal language” [Applied Corpus Linguistics Volume 4, Issue 1 (2024) 100087]","authors":"David Chandler, Brett Hashimoto","doi":"10.1016/j.acorp.2024.100112","DOIUrl":"10.1016/j.acorp.2024.100112","url":null,"abstract":"","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 3","pages":"Article 100112"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143098310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lexical complexity in academic lectures: Comparative analysis of EMI and Non-EMI settings and influential factors","authors":"Chen Chen, Philip Durrant","doi":"10.1016/j.acorp.2024.100115","DOIUrl":"10.1016/j.acorp.2024.100115","url":null,"abstract":"<div><div>Despite the substantial body of research on vocabulary in English Medium Instruction (EMI), there is a noticeable dearth of corpus-based studies examining lexical complexity of EMI lectures, particularly in specific disciplines. To fill this gap, this study developed an EMI spoken academic corpus in Business (EMIB) with 120 lectures collected from 54 lecturers with nine different first languages (L1), reaching 1.12 million tokens. The study compared the lexical complexity of EMI Business lectures in China with academic lectures in Anglophone and non-Anglophone settings, represented by teachers’ speech in the British Academic Spoken English Corpus (BASE) and the Corpus of English as a Lingua Franca in Academic Settings (ELFA), respectively. Lexical complexity was conceptualised by lexical sophistication (operationalised by lexical frequency profile and mean frequency band score) and lexical diversity (operationalised by the VOCD-D). Results show that ELFA has significantly higher lexical sophistication than BASE, and significantly lower lexical diversity than BASE and EMIB. This study further explored whether speaker L1, speaker gender, and discipline contributed to the lexical complexity of lectures using multiple linear regression with interaction terms. Results show that speaker L1 and discipline significantly impacted the lexical complexity of lectures. Pedagogical implications are discussed.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 3","pages":"Article 100115"},"PeriodicalIF":0.0,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142701201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Getting into bed with embeddings? A comparison of collocations and word embeddings for corpus-assisted discourse analysis","authors":"Jordan Batchelor","doi":"10.1016/j.acorp.2024.100117","DOIUrl":"10.1016/j.acorp.2024.100117","url":null,"abstract":"<div><div>This paper discusses two approaches for identifying lexical patterns in discourse, namely the corpus linguistic method of collocation analysis and the natural language processing method of word embeddings. While both approaches can identify lexical patterns, they approach the task with different underlying frameworks, and the extent to which their results resemble one another has not been directly compared. This study uses two corpora, five collocation measures, and two word embedding algorithms to generate such comparisons. Results generally support the notion that many word pairs with similar embeddings are collocates, and that, to a lesser extent, many collocates have similar word embeddings. However, a major difference is that word pairs with similar embeddings do not need to co-occur often, or at all. Moreover, systematic differences in the kinds of words highlighted between the two word embedding algorithms were found and are discussed.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 3","pages":"Article 100117"},"PeriodicalIF":0.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Examining in-service senior high school English teachers’ perspectives on corpus use and the effects of corpus training","authors":"Hsiao-Ling Hsu, Shu-Li Lai, Hao-Jan Howard Chen","doi":"10.1016/j.acorp.2024.100116","DOIUrl":"10.1016/j.acorp.2024.100116","url":null,"abstract":"<div><div>Given the benefits of incorporating corpora into language learning, particularly in developing students’ abilities to observe and analyze language data, this study investigated Taiwanese in-service senior high school English teachers’ corpus literacy, their application of corpus tools in teaching, and the effects of an online corpus workshop. Conducted in two stages, the first involved collecting 151 teachers’ perceptions of corpus literacy and its applications from 141 schools across Taiwan. The second stage invited teachers across Taiwan to participate in an online corpus workshop, where corpus-based teaching and two tools (SKELL and Sketch Engine) were introduced, along with hands-on activities. Following the workshop, the participants completed a post-survey. The analysis of the pre-survey responses revealed a positive attitude toward but limited understanding of corpus use among teachers before attending the workshop. The Wilcoxon Signed Rank test, used to analyze the pre- and post-survey responses, showed significant improvements in the teachers’ corpus literacy and application skills after the workshop. The findings of this study offer valuable insights into corpus use among in-service teachers in various contexts. Future research should explore the further integration of corpus tools into classrooms and include in-depth interviews for more comprehensive insights.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 3","pages":"Article 100116"},"PeriodicalIF":0.0,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conventionalized phrases and disability policy: A corpus analysis of 2-year and 4-year public colleges in California","authors":"Stephen Eyman","doi":"10.1016/j.acorp.2024.100113","DOIUrl":"10.1016/j.acorp.2024.100113","url":null,"abstract":"<div><div>This corpus-based study analyzes the use of conventionalized phrases in disability policy. Specifically, it focuses on the three phrases made common by the Americans with Disabilities Act: qualified individual with a disability, reasonable accommodations, and interactive process. These three phrases are analyzed in the context of disability policy at 2-year and 4-year public colleges in California. A corpus of disability policies was created for each of these contexts and analyzed to better understand the varied implementation of conventionalized phrases across contexts. The study finds that the three phrases from the ADA have been diffused across higher education disability policies in the corpora created and are highly conventionalized in these contexts. Additionally, these phrases can be used with slightly different valences depending on the context. These differences in use appear to be directly related to the relationship between the three phrases themselves and they mirror debates in disability policy such as that around the modal ‘may’ in relation to whether or not an institution implements an interactive process. Furthermore, institutional differences in the implementation of these phrases is potentially related to the stances institutions take towards disability and disability policy.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 3","pages":"Article 100113"},"PeriodicalIF":0.0,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effects of teacher, peer and self-feedback on error correction with corpus use","authors":"Yoshiho Satake","doi":"10.1016/j.acorp.2024.100114","DOIUrl":"10.1016/j.acorp.2024.100114","url":null,"abstract":"<div><div>The strengths of corpora in language learning have been stated, while not many studies have explored the effects of feedback on error correction in the settings of data-driven learning (DDL), which is an approach where learners use corpora to learn language patterns inductively. Therefore, this study examines the effects of feedback on second language (L2) error correction with corpus use. The author hypothesizes that seeing many example sentences of the target word(s) with corpus use is useful in correcting L2 errors and that different sources of feedback have different effects on error correction. To test the hypotheses, the effects of teacher feedback on 55 participants’ error correction with use of the Corpus of Contemporary American English (COCA) were compared with those of peer feedback along with those of self-feedback. The results show that teacher feedback especially worked well for correcting omission errors and agreement errors. The strength of teacher feedback was identifying correctable errors. The results suggest that efficient corpus use for error correction requires teachers to consider appropriate combinations of feedback and error types (e.g., teacher feedback for omission errors and agreement errors).</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 3","pages":"Article 100114"},"PeriodicalIF":0.0,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142586523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the visual content of a commercialized academic listening test: Implications for validity","authors":"Zhuohan Hou, Vahid Aryadoust, Azrifah Zakaria","doi":"10.1016/j.acorp.2024.100109","DOIUrl":"10.1016/j.acorp.2024.100109","url":null,"abstract":"<div><div>As incorporating visual modes in listening tests is gradually gaining traction in second language (L2) assessment, the inclusion of such visuals brings up questions about the role of visual modes in meaning-making during listening and test validity. In this study, we investigated the visual features of the International English Language Testing System (IELTS) listening test through the application of the social semiotic multimodal framework. Our corpus comprised 300 visuals from 256 academic listening testlets published between 1996 and 2022. Unlike the past studies of social semiotic multimodal analyses that relied on qualitative methods, our study adopted a series of visualization and quantitative statistical analysis of frequency and dispersion measures, using the general linear model to examine the visuals from a social semiotic multimodal perspective. The results revealed significant variation in the visual structures of the testlets. Through applying a post-hoc analysis, we further proposed recommendations for further research on multimodal materials in listening assessment and discussed the implications of the observed variation for the validity of the IELTS listening test. This study may be considered the first attempt to examine L2 listening assessment from a corpus-based social semiotic multimodal perspective, which may inspire more investigations on multimodal listening.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 3","pages":"Article 100109"},"PeriodicalIF":0.0,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corpus linguistics will benefit from greater adoption of pre-registration: A novice-friendly split-corpus approach to pre-registration","authors":"Matthew H.C. Mak","doi":"10.1016/j.acorp.2024.100111","DOIUrl":"10.1016/j.acorp.2024.100111","url":null,"abstract":"<div><div>In this brief article, I contend that the field of corpus linguistics stands to gain significantly from an increased adoption of pre-registration. Pre-registration serves to constrain the almost infinite degree of analytic freedom inherent in corpus analysis, thereby enhancing the transparency, reliability, and potential impact of corpus research. While pre-registration is increasingly popular in fields such as psychology and medicine, its uptake in corpus linguistics remains notably limited. To facilitate the transition toward pre-registration, I describe a straightforward split-corpus approach, ideally suited for corpus linguists new to pre-registration and for both hypothesis-testing and exploratory research. This method involves dividing a corpus into an exploratory set (20–40 % of the corpus) and a confirmatory set (the remaining 60–80 %). The exploratory set allows researchers to freely generate hypotheses and develop analysis plans, while the confirmatory set is then used for a more structured and objective analysis according to the pre-specified protocols. By employing this approach, corpus linguists can effectively balance exploratory flexibility with the rigour of confirmatory analysis, boosting the reliability of corpus findings. An increased uptake of pre-registration may not only bolster recognition of corpus linguistics as a robust empirical field, but it may also encourage a stronger emphasis on the building of cumulative knowledge.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"4 3","pages":"Article 100111"},"PeriodicalIF":0.0,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142445434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}