{"title":"Stative verbs and perceptions of intensity: The case of ‘believe’ in simple and progressive aspect","authors":"Naoko Taguchi , Marianna Gracheva","doi":"10.1016/j.acorp.2023.100072","DOIUrl":"10.1016/j.acorp.2023.100072","url":null,"abstract":"<div><p><span>This study assessed the validity of descriptive findings from corpus linguistics research by analyzing human participants’ performance and perception data. While the stative verb </span><em>believe</em> usually occurs in the simple aspect, a corpus-based analysis has revealed that <em>believe</em> also occurs in the progressive form in communicative situations conveying a heightened degree of intensity and marked with specific linguistic features such as intensifying adjectives, adverbs of certainty, direct addresses, and others (<span>Gracheva, in press</span>). This study adopted an experimental approach to further assess the link between the progressive form in situations of use conducive to assertive stance and emotional involvement and its surrounding linguistic characteristics. Eighty-six native English speakers were presented with 24 naturally-occurring texts from corpora. Half of the texts involved linguistic features of intensity (progressive aspect condition), while half involved no such features (simple aspect condition). Participants read the texts and selected the form of <em>believe</em> (simple or progressive aspect) which they thought was appropriate in each text. Results showed that participants selected the progressive aspect 47% of the times for the texts featuring language of intensity, while their selection of that aspect was less than 3% in the simple condition texts. 
Follow-up interviews revealed that participants sensed the intensity conveyed by the texts (e.g., strong emotion, urgency, emphasis), leading to their choice of the progressive over the simple aspect<em>.</em></p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100072"},"PeriodicalIF":0.0,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49497487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The DAIS-C: A small, specialised, spoken, schizophrenia corpus","authors":"Oliver Delgaram-Nejad , Dawn Archer , Gerasimos Chatzidamianos , Louise Robinson , Alex Bartha","doi":"10.1016/j.acorp.2023.100069","DOIUrl":"10.1016/j.acorp.2023.100069","url":null,"abstract":"<div><p>This paper describes the design and development of the DAIS-C (Discussing Abstract Ideas in Schizophrenia Corpus), a small, specialised corpus of spoken language in which speakers with a diagnosis of schizophrenia and those with no self-reported psychiatric or neuroleptic history were interviewed on the same topics. The corpus was constructed to allow for comparative analyses of speech behaviour in relation to linguistic creativity and formal thought disorder (FTD), but additional steps were taken to ensure that the corpus could be of use to other researchers and research questions. The present paper covers design decisions relevant to the construction of clinical corpora alongside information about the corpus of potential use to researchers interested in its use.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100069"},"PeriodicalIF":0.0,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48630178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of word count data corpus for Hindi and Marathi literature","authors":"Vivek Belhekar, Radhika Bhargava","doi":"10.1016/j.acorp.2023.100070","DOIUrl":"10.1016/j.acorp.2023.100070","url":null,"abstract":"<div><p><span>India has a huge diversity of languages, and Hindi and Marathi are the most spoken languages in the northern and western parts of India. Hindi and Marathi have more than 528 million and 83 million speakers, respectively. The paper describes the development of the Hindi Word Corpus (Hindi WordCorp) and the Marathi Word Corpus (Marathi WordCorp), reporting the frequency of single words (1-gram) used in written texts of the respective languages using the bag-of-words model (BoW). The word frequencies are provided for eleven decades (pre-1920, 1920 to 2020). These texts include books (fiction, non-fiction, history, autobiographies, etc.) and magazines. Academic and reference books were not used. The Hindi WordCorp and Marathi WordCorp used 640 and 712 texts, respectively. An analysis was employed to check whether the texts used were enough to stabilize the rank-order of the total frequencies of the words. Zipf's and Heaps’ law coefficients indicated the sufficiency of the texts. Researchers in various areas like linguistics, social sciences, text mining, machine learning, etc., can use the dataset to answer research questions about language and culture. Some demonstrative examples are provided for using the datasets in the two languages. The dataset is made available on an </span>open data<span> repository. The paper is an account of dataset creation for Hindi and Marathi WordCorp. Hence, no empirical results or conclusions are made based on the data created. A WebApp named Indian Languages Word Corpus (ILWC) has been developed for users. 
Future directions for text mining and language models are discussed.</span></p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100070"},"PeriodicalIF":0.0,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48742031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The identification of YouTube videos that feature the linguistic features of English informal speech","authors":"Christopher R. Cooper","doi":"10.1016/j.acorp.2023.100068","DOIUrl":"10.1016/j.acorp.2023.100068","url":null,"abstract":"<div><p>YouTube is becoming an increasingly popular entertainment platform, with videos catering to a wide range of interests. If L2 users are to become proficient in the primary form of language, conversation, then the affordances created by YouTube videos containing informal speech could be very useful. In the current study a near-random corpus of 2602 YouTube video transcripts was compiled and 200 randomly selected texts from the Spoken BNC2014 (Love et al., 2017) were used as a reference corpus representing informal spoken English. The texts were tagged with 67 linguistic features as part of an additive multi-dimensional analysis. The dimension scores for each text were used in a cluster analysis to investigate which texts clustered with the Spoken BNC2014 texts. A two-cluster solution was chosen with 666 YouTube texts and 171 Spoken BNC2014 texts in one cluster, and the remaining texts in the other cluster. A small sample of texts from each cluster was analysed in detail. 
It is shown that this method has the potential to identify videos featuring informal speech and that some videos with similar categories have a very different linguistic style.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 3","pages":"Article 100068"},"PeriodicalIF":0.0,"publicationDate":"2023-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42633528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of McEnery and Brezina (2022) Fundamental Principles of Corpus Linguistics","authors":"Rickey Lu","doi":"10.1016/j.acorp.2023.100055","DOIUrl":"https://doi.org/10.1016/j.acorp.2023.100055","url":null,"abstract":"","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 2","pages":"Article 100055"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49858142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corpus to curriculum: Developing word lists for adult learners of Welsh","authors":"Dawn Knight , Tess Fitzpatrick , Steve Morris , Bethan Tovey-Walsh , Helen Prosser , Emyr Davies","doi":"10.1016/j.acorp.2023.100052","DOIUrl":"https://doi.org/10.1016/j.acorp.2023.100052","url":null,"abstract":"<div><p>The launch of a language's first comprehensive general corpus promises a sea-change in teaching and learning resources. Effective transition from corpus to classroom is not necessarily straightforward, though; expert and end-user input is essential for the potential of the corpus resource to be realised. This paper outlines the process by which fit-for-purpose vocabulary lists were derived from the new National Corpus of Contemporary Welsh (<em>Corpws Cenedlaethol Cymraeg Cyfoes</em> – CorCenCC). The immediate purpose in this case was to inform the revision of A1 and A2 level course materials for adult learners. A longer-term aim was to put in place a method by which vocabulary lists for more advanced level learners and learners of different ages could be extracted and developed from the corpus. The new corpus means that for the first time, the Welsh language curriculum is able to use word frequency information; teaching and assessment materials in major languages have been informed by word frequencies for several decades. Raw frequency lists, though, include troublesome content, and can exclude items with high relevance to learners. This paper demonstrates how, by working in partnership, Welsh language curriculum writers, assessors, language experts and corpus linguists can effectively manipulate corpus data into curriculum content. 
The methods and approaches reported here are replicable for use in other language contexts.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 2","pages":"Article 100052"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49817971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The interface between specialized translation and institutional translation: A selection of candidate terms validated by Aeronautical Meteorology corpora","authors":"Rafaela Araújo Jordão Rigaud Peixoto","doi":"10.1016/j.acorp.2023.100051","DOIUrl":"10.1016/j.acorp.2023.100051","url":null,"abstract":"<div><p><span><span>The purpose of this work is to revise and expand an aeronautical meteorology glossary, available at REDEMET, a homepage hosted on the Department of Airspace Control website, taking into consideration corpus data in the field. For that, to best meet the needs of institutions and users, data were compiled from some segments of the Aeronautical Meteorology domain. During the compilation of this corpus, it was noticed that there was a great scarcity of specialized sources of this Aviation subdomain in English and, mainly, in Portuguese, including material by the Department of Airspace Control (DECEA), the only official Brazilian institution with the role of regulating standards relevant to Aeronautical Meteorology. By taking into account that a given government institution is considered an authoritative source concerning terms used in a specialized domain, it would be advisable to align professional and academic expertise, and institutional interests. Therefore, based on contributions of corpus linguistics theories, terminology, and institutional translation, this work relied on established parameters for the compilation and processing of information for inclusion in the corpus, and focused, in this first stage, on the selection of candidate terms, according to </span>corpus analysis. 
The first results showed that institutional and academic segments present some subtleties regarding terminology, as, on the one hand, some words are more specific to the academic register and, on the other hand, there are different uses of terms in the institutional setting, by </span>ICAO, WMO, or FAA.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 2","pages":"Article 100051"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48255097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“I will say the picture of the background is not related to the words”: using corpus linguistics and focus groups to reveal how speakers of English as an additional language perceive the effectiveness of the phraseology and imagery in UK public health tweets during COVID-19","authors":"Christian Jones, David Oakey, Kay L. O'Halloran","doi":"10.1016/j.acorp.2023.100053","DOIUrl":"https://doi.org/10.1016/j.acorp.2023.100053","url":null,"abstract":"<div><p>This paper reports on an application of a multimodal corpus-based study into the effectiveness of public health information about COVID-19 for speakers of English as an additional language (EAL) in the UK. A corpus of information tweets from 13 UK public health agencies totalling 560,000 words, with concomitant images and videos, was collected between March 2020 and February 2021. The most frequent n-grams occurring across all 13 public health agencies, and sample images occurring alongside these, were identified. In this study, we examine how images and videos combine with the phraseology to shape these COVID-19 public health information messages. Following this, six illustrative tweets were used as prompts for three focus groups of EAL participants based in the UK representing a range of first languages and occupations. Data from the focus groups was analysed in order to identify how common public health phraseology and images were received, understood and responded to by participants and how they felt they could be amended to increase their effectiveness for EAL speakers. We conclude with suggestions for making the language of public health messages simpler and more direct, aligning images more clearly with the language used and removing linguistic ambiguity. 
These recommendations for how such messaging could be improved in future public health campaigns could ensure a more effective and inclusive public health response.</p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 2","pages":"Article 100053"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49817931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review of Deignan, Candarli, & Oxley (2023). The linguistic challenge of the transition to secondary school: A corpus study of academic language","authors":"Philip Durrant","doi":"10.1016/j.acorp.2023.100049","DOIUrl":"10.1016/j.acorp.2023.100049","url":null,"abstract":"","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 2","pages":"Article 100049"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47466935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The corpus of United States state statutes—design, construction and use","authors":"Jesse Egbert, Margaret Wood","doi":"10.1016/j.acorp.2023.100047","DOIUrl":"10.1016/j.acorp.2023.100047","url":null,"abstract":"<div><p>There is a need for more publicly available corpora of legal language. To help fill this gap, we have developed the Corpus of U.S. State Statutes, or CorUSSS, a new corpus comprising the statutory code from all 50 U.S. states. In total the corpus contains 1,785,742 texts, each of which represents the statutory text associated with a unique Universal Citation in one of the 50 U.S. states’ codes. This corpus provides us with the ability to explore language use in statutes within or across all 50 states. After motivating the need for this corpus, we describe its design and the methods we used to collect, clean and store the texts. We then report on a case study that illustrates the utility of this corpus for addressing important questions in statutory interpretation by investigating whether the word <em>information</em><span> can be used to refer to statements that are non-factual. We conclude with a call for researchers in law and corpus linguistics to rely on both legal and ordinary language when investigating questions of interpretation.</span></p></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"3 2","pages":"Article 100047"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48380661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}