{"title":"Replication as a means of assessing corpus representativeness and the generalizability of specialized word lists","authors":"Don Miller","doi":"10.1016/j.acorp.2022.100027","DOIUrl":"10.1016/j.acorp.2022.100027","url":null,"abstract":"<div><p>Considerable energy has gone into designing lists of words that are salient in discourse domains of varying breadth. Over the past two decades, most efforts in designing and validating corpus-based frequency lists have focused on three areas: corpus compilation, item selection criteria, and coverage-based demonstrations of list robustness. As a result, modern corpora are now often much larger and better balanced; the application of additional dispersion statistics allows for better targeting of items with desired distributions; and contemporary lexical frequency lists are proving increasingly efficient, providing ever higher coverage of target texts or achieving such coverage with fewer words. However, despite these important advances, relatively minimal attention has been paid to word list reliability—the extent to which lists can be generalized to the wider discourse domain that has been represented by the corpora upon which they are based. 
This study begins to address this gap, demonstrating via two word list development case studies (one for Environmental Science and one for Applied Linguistics) that adding iterative reliability analysis—via methodological replication with corpora of increasing size and comparison of items on resulting lists—can be used to: 1) inform corpus design beyond what Biber (1991) terms “situational” parameters, allowing us to see whether corpora are adequately representative of lexical distributions in target discourse domains; and 2) provide valuable insight into the degree of generalizability of word lists we have developed.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799122000120/pdfft?md5=99bdd61e7345f961aa3e0dbbbda0d186&pid=1-s2.0-S2666799122000120-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49471849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corpus-aided EAP writing workshops to support international scholarly publication","authors":"Ana Frankenberg-Garcia , Paula Tavares Pinto , Ana Eliza Pereira Bocorny , Simone Sarmento","doi":"10.1016/j.acorp.2022.100029","DOIUrl":"10.1016/j.acorp.2022.100029","url":null,"abstract":"<div><p>Writing for international scholarly publication is hard, and arguably harder for researchers with English as an additional language. English teachers could help them, but most teachers have little or no experience of research writing or the specialized languages researchers use. This study trialled and evaluated workshops for Brazilian researchers and English teachers learning together to use corpora and corpus-based tools to develop autonomy in writing and teaching academic English writing for scholarly publication.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799122000144/pdfft?md5=fa1c82c2ee110a621abaa295dc402598&pid=1-s2.0-S2666799122000144-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47583185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of Durrant, Brenchley, and McCallum (2021) Understanding development and proficiency in writing: Quantitative corpus linguistic approaches","authors":"Ashleigh Cox","doi":"10.1016/j.acorp.2022.100024","DOIUrl":"10.1016/j.acorp.2022.100024","url":null,"abstract":"","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47207770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Usable Amharic text corpus for natural language processing applications","authors":"Michael Melese Woldeyohannis, Million Meshesha","doi":"10.1016/j.acorp.2022.100033","DOIUrl":"10.1016/j.acorp.2022.100033","url":null,"abstract":"<div><p>In this paper, we describe the preparation of a usable Amharic text corpus for different Natural Language Processing (NLP) applications. Natural language applications, such as document classification, topic modeling, machine translation, speech recognition, and others, suffer greatly from a lack of digital resources. This is especially true for Amharic, a resource-constrained, morphologically rich, and complex language. In response to this, a total of 67,739 Amharic news documents in 8 different categories were collected from online sources. The collected corpus passed through a number of pre-processing steps, including data cleaning, text normalization, and punctuation correction. To validate the usability of the collected corpora from different domains, a baseline document classification experiment was conducted. Experimental results show that an accuracy of 84.53% is achieved using deep learning in the absence of linguistic information. The findings indicate that it is possible to use the prepared corpora for different natural language applications in the absence of linguistic resources such as a stemmer and dictionary, despite the complexity of the Amharic language. 
We are working further towards Amharic news document classification by incorporating linguistically independent stop-word detection, stemming, and unsupervised morphological segmentation of Amharic documents.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46475960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teaching, learning, and researching with corpora","authors":"Tove Larsson , Shelley Staples , Jesse Egbert","doi":"10.1016/j.acorp.2022.100025","DOIUrl":"10.1016/j.acorp.2022.100025","url":null,"abstract":"","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799122000107/pdfft?md5=f51d5341aae2c12e60f6219cf05a08ee&pid=1-s2.0-S2666799122000107-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46245949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Principled pattern curation to guide data-driven learning design","authors":"Anne O'Keeffe , Geraldine Mark","doi":"10.1016/j.acorp.2022.100028","DOIUrl":"10.1016/j.acorp.2022.100028","url":null,"abstract":"<div><p>Insights from corpus linguistics (CL) have informed language learning and materials design, among many other areas. An important nexus between CL and language learning is the use of Data-Driven Learning (DDL), which draws on the use of corpus data in the classroom and which brings opportunities for inductive language discovery.</p><p>Within the ethos of DDL, learners are encouraged to discover patterns of language and, in so doing, foster more complex cognitive processes such as making inferences. While many studies on DDL concur on the success of this approach, it is still perceived as a marginal practice. Its success so far has been largely limited to intermediate to advanced level learners in higher education settings (Boulton and Cobb 2017). This paper aims to offer guiding principles for how DDL might have wider application across all levels (not just at Intermediate and above) and to set out exemplars for their application at different levels of proficiency. Based on insights from second language acquisition (SLA) and learner corpus research (LCR), the focus of this paper will be on identifying principles for the curation of language patterns that are differentiated for stage of learning. 
In particular, we are keen to build on recent and important work which looks at SLA through the lens of the usage-based (UB) models (that is, models that view language as being acquired through the use of and exposure to language).</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799122000132/pdfft?md5=f53afdebc49d6e7b54500fd05f50d11b&pid=1-s2.0-S2666799122000132-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49216980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification and identification level ambiguity in error annotation","authors":"Alexandros Tantos, Nikolaos Amvrazis","doi":"10.1016/j.acorp.2022.100035","DOIUrl":"10.1016/j.acorp.2022.100035","url":null,"abstract":"<div><p>The vast majority of corpus annotation projects go through a piloting phase in which the annotation scheme is gradually shaped through iterative annotation cycles until its final version is produced and applied to the collected data. The differences in annotators’ choices are usually recorded and reflected by the ‘Inter-annotator Agreement’ (IAA), which serves as a proxy for understanding and resolving the issues raised. However, little has been reported on how to formulate a systematic approach to: (i) tracing the source of the differences in the annotators’ choices and (ii) providing attainable solutions that would considerably increase IAA. In this paper, the ‘Greek Learner Corpus II’ (GLCII), the largest online Greek learner corpus, will serve as a basis to shed light on two commonly met types of ambiguity in error annotation that are closely related to target languages in which syncretism is ubiquitous in grammar (e.g., Greek and Romanian): a classification level and an identification level ambiguity.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46834109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A tutorial on norming linguistic stimuli for clinical populations","authors":"Oliver Delgaram-Nejad , Gerasimos Chatzidamianos , Dawn Archer , Alex Bartha , Louise Robinson","doi":"10.1016/j.acorp.2022.100022","DOIUrl":"10.1016/j.acorp.2022.100022","url":null,"abstract":"<div><p>Stimuli norming (the process of controlling experimental items to minimise bias) is important for the validity of psycholinguistic experiments. Survey norming (asking large numbers of people to rate or otherwise define the items) is typically used for this purpose but requires large samples. Clinical populations are not always large, nor easy to reach. Clinical participants often have ongoing symptomatology, and some cohorts experience language and communication difficulties. We present a corpus-linguistic method suitable for clinical populations for which survey norming is difficult or inappropriate. We also include the experiment generated, which measures metaphor-creation behaviour in schizophrenia to test Cognitive Constraint Theory (CCT) in clinical and nonclinical populations (see S2.1). We describe the design rationale before outlining the design stages in tutorial form. This allows us to show readers why the approach was needed and support them to consider and respond to the challenges that we encountered. We conclude that it is easier to consider norming and design practices in parallel when experimental units are defined linguistically. 
Corpus stimuli norming provides a versatile alternative when survey norming is prohibitive, especially in speech pathology.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799122000077/pdfft?md5=40b8aaab346c1faa805c35598a6254f4&pid=1-s2.0-S2666799122000077-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45726550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"","authors":"Jamie McKeown","doi":"10.1016/j.acorp.2022.100034","DOIUrl":"10.1016/j.acorp.2022.100034","url":null,"abstract":"","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46229291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing the situational and linguistic characteristics of first year writing and engineering writing","authors":"Shelley Staples , Ashley JoEtta","doi":"10.1016/j.acorp.2022.100031","DOIUrl":"10.1016/j.acorp.2022.100031","url":null,"abstract":"<div><p>First year writing (FYW) courses aim to prepare students for disciplinary writing. However, research suggests that FYW often fails to provide sufficient preparation for writing across genres and disciplines (Leki, 2007). A register-functional approach to corpus linguistics has elucidated key differences across disciplines and genres for both published and student academic writing (Biber and Gray, 2016; Staples et al., 2016; Staples and Reppen, 2016). To date, however, no studies have compared these features across FYW and First Year Engineering (FYE) writing.</p><p>This research uses a corpus of FYE and FYW texts developed by the authors. The subset for this study includes papers written by undergraduate students majoring in Engineering and taking FYE and FYW courses in the same semester. Technical Briefs (TB) and Design Reports (DR) were selected from the FYE corpus and Rhetorical Analysis (RA) and Research Reports (RR) from the FYW corpus. We investigated the situational context and normed frequencies of linguistic features hypothesized to show similarities and differences.</p><p>Our situational analysis shows key differences in characteristics of the RA and TB, particularly regarding audiences (clients for the TB, and instructors for the RA) and the object of analysis (advertisements for the RA and mathematical models for the TB). There were more similarities between the RR and DR, including a shared focus on a solution to a problem and the presence of both a methods and results section. Results from the linguistic analysis show the impact of the situational characteristics. 
For example, conditional clauses and premodifying nouns were used at similar rates of occurrence in the DR and RR, reflecting their inclusion of research questions and their sharing detailed information about the problem and solution. Implications of these findings for teaching in these contexts will be discussed.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666799122000168/pdfft?md5=495e055e62e32825e71ff86704ea1eec&pid=1-s2.0-S2666799122000168-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47181612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}