{"title":"Hack your corpus analysis: How AI can assist corpus linguists deal with messy social media data","authors":"Michele Zappavigna","doi":"10.1016/j.acorp.2023.100067","DOIUrl":"https://doi.org/10.1016/j.acorp.2023.100067","url":null,"abstract":"","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49774935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Book review Vander Viana (2023) teaching English with corpora: A resource book","authors":"Larissa Goulart (Assistant Professor of Linguistics)","doi":"10.1016/j.acorp.2023.100061","DOIUrl":"https://doi.org/10.1016/j.acorp.2023.100061","url":null,"abstract":"","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49816437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The gap between intentions and reality: Reasons for EAP writers’ non-use of corpora","authors":"Maggie Charles","doi":"10.1016/j.acorp.2022.100032","DOIUrl":"10.1016/j.acorp.2022.100032","url":null,"abstract":"<div><p>Over the last three decades, extensive research has been devoted to EAP students’ use of corpora for academic writing. However, corpus use has usually been ascertained immediately post-course; data on long-term use is sparse and little attention has been paid to those who give up using corpora. This study investigates the extent of corpus non-use and students’ reasons for discontinuing the practice in the long term. It draws on data from two questionnaires: (1) immediate post-course (ImmPQ); (2) delayed post-course (DelPQ) completed a year later. Participants were 182 graduates who took a six-week course during which they built and consulted do-it-yourself corpora in their own field. Results from ImmPQ showed that most students (63%) used their corpus regularly (≥ 1/week), but one year later DelPQ revealed that regular use had decreased to 36%. Although 87% of respondents to ImmPQ stated their intention to use their corpus in the future, DelPQ reported a total of 37% of non-users. There were 86 mentions of reasons for non-use; the most prevalent were: not doing any academic writing (29%), the use of other tools (20%), time issues and corpus issues (10% each). 
It is argued that students’ scarcity of time is a possible underlying cause of much non-use and the study suggests some ways in which long-term corpus take-up could be increased.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"2 3","pages":"Article 100032"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799122000156/pdfft?md5=f0528a6928b7b2511c7f7f2c8c8f18f7&pid=1-s2.0-S2666799122000156-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41858231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Usable Amharic text corpus for natural language processing applications","authors":"Michael Melese Woldeyohannis, Million Meshesha","doi":"10.1016/j.acorp.2022.100033","DOIUrl":"10.1016/j.acorp.2022.100033","url":null,"abstract":"<div><p>In this paper, we describe the preparation of a usable Amharic text corpus for different Natural Language Processing (NLP) applications. Natural language applications, such as document classification, topic modeling, machine translation, speech recognition, and others, suffer greatly from a lack of digital resources. This is especially true for Amharic, a resource-constrained, morphologically rich, and complex language. In response to this, a total of 67,739 Amharic news documents consisting of 8 different categories from online sources are collected. The collected corpus passes through a number of pre-processing steps including; data cleaning, text normalization and punctuation correction. To validate the usability of the collected corpora from different domains, a baseline document classification experiment was conducted. Experimental results show that, 84.53% accuracy is registered using deep learning in the absence of linguistic information. Finding indicated that it is possible to use the prepared corpora for different natural language applications in the absence of linguistic resources such as stemmer and dictionary despite the complexity of Amharic language. 
We are further working towards Amharic news document classification by incorporating a linguistic independent stop-word detection, stemming and unsupervised morphological segmentation of Amharic documents.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"2 3","pages":"Article 100033"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46475960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Replication as a means of assessing corpus representativeness and the generalizability of specialized word lists","authors":"Don Miller","doi":"10.1016/j.acorp.2022.100027","DOIUrl":"10.1016/j.acorp.2022.100027","url":null,"abstract":"<div><p>Considerable energy has gone into designing lists of words that are salient in discourse domains of varying breadth. Over the past two decades, most efforts in designing and validating corpus-based frequency lists have focused on three areas: corpus compilation, item selection criteria, and coverage-based demonstrations of list robustness. As a result, modern corpora are now often much larger and better balanced; the application of additional dispersion statistics allows for better targeting of items with desired distributions; and contemporary lexical frequency lists are proving increasingly efficient, providing ever higher coverage of target texts or achieving such coverage with fewer words. However, despite these important advances, relatively minimal attention has been paid to word list reliability—the extent to which lists can be generalized to the wider discourse domain that has been represented by the corpora upon which they are based. 
This study begins to address this gap, demonstrating via two word list development case studies (one for Environmental Science and one for Applied Linguistics) that adding iterative reliability analysis—via methodological replication with corpora of increasing size and comparison of items on resulting lists—can be used to: 1) inform corpus design beyond what Biber (1991) terms “situational” parameters, allowing us to see whether corpora are adequately representative of lexical distributions in target discourse domains; and 2) provide valuable insight into the degree of generalizability of word lists we have developed.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"2 3","pages":"Article 100027"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799122000120/pdfft?md5=99bdd61e7345f961aa3e0dbbbda0d186&pid=1-s2.0-S2666799122000120-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49471849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corpus-aided EAP writing workshops to support international scholarly publication","authors":"Ana Frankenberg-Garcia, Paula Tavares Pinto, Ana Eliza Pereira Bocorny, Simone Sarmento","doi":"10.1016/j.acorp.2022.100029","DOIUrl":"10.1016/j.acorp.2022.100029","url":null,"abstract":"<div><p>Writing for international scholarly publication is hard, and arguably harder for researchers with English as an additional language. English teachers could help them, but most teachers have little or no experience of research writing or the specialized languages researchers use. This study trialled and evaluated workshops for Brazilian researchers and English teachers learning together to use corpora and corpus-based tools to develop autonomy in writing and teaching academic English writing for scholarly publication.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"2 3","pages":"Article 100029"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799122000144/pdfft?md5=fa1c82c2ee110a621abaa295dc402598&pid=1-s2.0-S2666799122000144-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47583185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of Durrant, Brenchley, and McCallum (2021) Understanding development and proficiency in writing: Quantitative corpus linguistic approaches","authors":"Ashleigh Cox","doi":"10.1016/j.acorp.2022.100024","DOIUrl":"10.1016/j.acorp.2022.100024","url":null,"abstract":"","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"2 3","pages":"Article 100024"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47207770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Principled pattern curation to guide data-driven learning design","authors":"Anne O'Keeffe, Geraldine Mark","doi":"10.1016/j.acorp.2022.100028","DOIUrl":"10.1016/j.acorp.2022.100028","url":null,"abstract":"<div><p>Insights from corpus linguistics (CL) have informed language learning and materials design, among many other areas. An important nexus between CL and language learning is the use of Data-Driven Learning (DDL), which draws on the use of corpus data in the classroom and which brings opportunities for inductive language discovery.</p><p>Within the ethos of DDL, learners are encouraged to discover patterns of language and, in so doing, foster more complex cognitive processes such as making inferences. While many studies on DDL concur on the success of this approach, it is still perceived as a marginal practice. Its success so far has been largely limited to intermediate to advanced level learners in higher education settings (Boulton and Cobb 2017). This paper aims to offer guiding principles for how DDL might have wider application across all levels (not just at Intermediate and above) and to set out exemplars for their application at different levels of proficiency. Based on insights from second language acquisition (SLA) and learner corpus research (LCR), the focus of this paper will be on identifying principles for the curation of language patterns that are differentiated for stage of learning. 
In particular, we are keen to build on recent and important work which looks at SLA through the lens of the usage-based (UB) models (that is, models that view language as being acquired through the use of and exposure to language).</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"2 3","pages":"Article 100028"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799122000132/pdfft?md5=f53afdebc49d6e7b54500fd05f50d11b&pid=1-s2.0-S2666799122000132-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49216980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}