Corpora | Pub Date: 2024-04-01 | DOI: 10.3366/cor.2024.0296
Henrik Kaatari, Ying Wang, Tove Larsson
{"title":"Introducing the Swedish Learner English Corpus: a corpus that enables investigations of the impact of extramural activities on L2 writing","authors":"Henrik Kaatari, Ying Wang, Tove Larsson","doi":"10.3366/cor.2024.0296","DOIUrl":"https://doi.org/10.3366/cor.2024.0296","url":null,"abstract":"This paper introduces the Swedish Learner English Corpus (slec), which consists of argumentative texts in English that are written by Swedish junior and senior high school students. slec includes rich metadata, enabling empirical studies of various extra-linguistic variables. Most noteworthy is the inclusion of detailed information on students’ extramural English activities (ee), such as reading, watching, conversing, gaming and engaging in social media in English. In addition, a sub-set of texts from slec have been assessed for proficiency using the Common European Framework of Reference for Languages (cefr). This paper provides an overview of the corpus compilation process, the metadata, and the available versions of slec. Researchers, teachers and students can access this resource to investigate various aspects of second language use and development, such as the impact of extramural language activities on linguistic complexity.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140756954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpora | Pub Date: 2024-04-01 | DOI: 10.3366/cor.2024.0300
Daniel H. Dixon
{"title":"Introducing the Single Player Offline Game Corpus (spoc): a corpus of seven registers from digital role-playing games","authors":"Daniel H. Dixon","doi":"10.3366/cor.2024.0300","DOIUrl":"https://doi.org/10.3366/cor.2024.0300","url":null,"abstract":"This paper describes the compilation and design of the Single Player Offline Game Corpus (spoc), which is being made freely available for research and educational purposes. The spoc was compiled by extracting the localisation files from the digital directories of four popular commercial digital role-playing games: Divinity: Original Sin II, Fallout 4, the Elder Scrolls V: Skyrim, and the Witcher 3: Wild Hunt. The 3.7 million word corpus contains more than 30,000 texts and is unique compared with other game corpora in that it has the following three characteristics: (1) the texts are categorised into seven registers using Biber and Conrad’s (2019) register framework, (2) texts are systematically parsed into the smallest meaningful units of observation, and (3) all texts were compiled from the data files of the games themselves. Nearly all language use in the four games is accounted for and parsed into register categories based on their underlying situational characteristics – in particular, the communicative purposes and the associated contexts in which the texts appear in the games.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140787030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpora | Pub Date: 2024-04-01 | DOI: 10.3366/cor.2024.0299
Stephanie Rennick, Seán Roberts
{"title":"The Video Game Dialogue Corpus","authors":"Stephanie Rennick, Seán Roberts","doi":"10.3366/cor.2024.0299","DOIUrl":"https://doi.org/10.3366/cor.2024.0299","url":null,"abstract":"This paper presents the Video Game Dialogue Corpus, the first large-scale, consistently coded, open source corpus of dialogue from video games. It contains over 6.2 million words of English dialogue from fifty games in the Role Playing Game (rpg) genre. This includes games produced between 1985 and 2020, rated for children, teenagers and adults, and in both ‘Western’ and ‘Japanese’ sub-genres. The corpus design is described, including custom data formats for representing branching dialogue. We demonstrate the use of the corpus by comparing the dialogue of female and male characters, where we find reflections of gendered language in other media as well as patterns that seem specific to video games. We provide the source code for a ‘self-inflating corpus’ – a pipeline that obtains the data then processes and parses it into a standard format. This makes the corpus available for teaching and research purposes, providing the first such resource for empirical analysis of video game dialogue.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140771700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpora | Pub Date: 2024-04-01 | DOI: 10.3366/cor.2024.0295
Yu-Hua Chen, Simon Harrison, Michaël Stevens, Qianqian Zhou
{"title":"Developing a multimodal corpus of L2 academic English from an English medium of instruction university in China","authors":"Yu-Hua Chen, Simon Harrison, Michaël Stevens, Qianqian Zhou","doi":"10.3366/cor.2024.0295","DOIUrl":"https://doi.org/10.3366/cor.2024.0295","url":null,"abstract":"This paper describes the rationale for and design of a new multimodal corpus of L2 academic English from a Sino-British university in China: the Corpus of Chinese Academic Written and Spoken English (cawse). The unique context for this corpus provides language samples from Chinese students who use English as a second language (L2) in a preliminary-year programme, which prepares students for academic studies at university level, at a campus where English is used as the Medium of Instruction (emi). Data were collected from a variety of settings, including written (i.e., exam scripts and essays) and spoken assessments (i.e., interviews and presentations), covering the full range of grades awarded to those language samples, as well as from student group interactions during teaching and learning activities. The multimodal nature of the corpus is realised through the availability of selected audio/video recordings accompanied by the orthographically transcribed text. This open-access corpus is designed to help shed light on Chinese students' academic L2 English language use in a variety of written, spoken and multimodal discourses.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140773869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpora | Pub Date: 2024-04-01 | DOI: 10.3366/cor.2024.0302
Mohsen Shirazizadeh, Narges Moeini
{"title":"Review: Barth and Schnell. 2022. Understanding Corpus Linguistics. New York: Routledge","authors":"Mohsen Shirazizadeh, Narges Moeini","doi":"10.3366/cor.2024.0302","DOIUrl":"https://doi.org/10.3366/cor.2024.0302","url":null,"abstract":"","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140769001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpora | Pub Date: 2024-04-01 | DOI: 10.3366/cor.2024.0297
Joyce Dong Ok Lim, Geraldine Mark, P. Pérez-Paredes, Anne O’Keeffe
{"title":"Exploring part of speech (pos) tag sequences in a large-scale learner corpus of L2 English: a developmental perspective","authors":"Joyce Dong Ok Lim, Geraldine Mark, P. Pérez-Paredes, Anne O’Keeffe","doi":"10.3366/cor.2024.0297","DOIUrl":"https://doi.org/10.3366/cor.2024.0297","url":null,"abstract":"This research explores the pos tag sequences that shape the transition from upper intermediate (B2 cefr) to near-native proficiency (C2 cefr) in a corpus of essays (n=32,410) from the Cambridge Learner Corpus. Gilquin (2018) and others have shown that pos tag sequences offer a holistic approach to extracting the most commonly used patterns without a starting point of an a priori set of words and word sequences. Using corpus linguistics informed by usage-based theories of language learning, this paper examines the frequency and distribution of 4-slot pos-tag sequences in L2 English writing, drawing on the taxonomy of pattern grammar (Francis et al., 1996, 1998; and Hunston and Francis, 2000). Findings point to the presence of both core and emergent pos-tag sequences in learner language in the two proficiency levels analysed. These sequences point to the presence of dynamic language restructuring processes as learners become more proficient and re-evaluate their understanding of frequency and distribution in English. This paper shows evidence of how language competence increases with proficiency. The research offers new evidence in our understanding of the development of L2 writing in efl contexts.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140764945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpora | Pub Date: 2024-04-01 | DOI: 10.3366/cor.2024.0298
Dina Sibai, Sylvia Jaworska
{"title":"Triangulating visual and textual corpus-assisted discourse analysis to study social actor representations: the case of Saudi women in the British and Saudi news media","authors":"Dina Sibai, Sylvia Jaworska","doi":"10.3366/cor.2024.0298","DOIUrl":"https://doi.org/10.3366/cor.2024.0298","url":null,"abstract":"Investigations of social actor representations across media present a large and important body of research in corpus-assisted discourse studies (cads). However, most studies focus exclusively on one mode, the text, whilst other modes of communication (for example, visuals) are either considered partially or not at all. Whilst insights from textual analyses are invaluable in revealing salient and nuanced patterns of social actor representations in the media, visual accompaniments can reinforce particular ‘angles’ creating lasting perceptions for readers and viewers. Though some approaches exist to study considerable numbers of images, visual media data can be complex rendering them difficult to be studied alongside textual cads. This paper uses a triangulation of visual and textual cads analysis to explore social actor representations in media texts and images. It does so by focussing on the representations of Saudi women in the UK and Saudi news media within the context of evolving women’s rights in Saudi Arabia. The study shows how such triangulation can be conducted in a doable and systematic way and how it can enrich cads research on discursive representations of social actors across contexts.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2024-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140777820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpora | Pub Date: 2023-08-01 | DOI: 10.3366/cor.2023.0285
A. Black
{"title":"Review: Islentyeva. 2020. Corpus-based Analysis of Ideological Bias: Migration in the British Press. London: Routledge","authors":"A. Black","doi":"10.3366/cor.2023.0285","DOIUrl":"https://doi.org/10.3366/cor.2023.0285","url":null,"abstract":"","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48989826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpora | Pub Date: 2023-08-01 | DOI: 10.3366/cor.2023.0280
Shannon Fitzsimmons‐Doolan
{"title":"Twenty-first century ideological discourses about US migrant education that transcend registers","authors":"Shannon Fitzsimmons‐Doolan","doi":"10.3366/cor.2023.0280","DOIUrl":"https://doi.org/10.3366/cor.2023.0280","url":null,"abstract":"Widely distributed and often repeated discursive patterns which represent migrants can influence the education of migrant students (Calavita, 1996; Santa Ana, 2002; Cutler, 2017; and Dabach et al., 2017). Ideological discourses (e.g., ‘immigrants are threats’) are particularly potent structures that mediate language, cognition and social life. Whilst there has been a recent increase in studies of texts on the topic of migration generally, there are few that focus on the intersection of migration and education or on discursive patterns that transcend registers. This study introduces a multi-dimensional analysis approach for the identification of ideological discourses from a 9 million-word corpus of twenty-first century, US texts about migrant education from multiple registers (online comments, national and regional newspaper texts, and federal and state government webpages) using the distribution of lexical variables that characterise variants of migrant/migration. Eleven ideological discourses (e.g., ‘US immigration policies are problematic, but there is no consensus for solutions’) were found. Of these, several had not been previously identified, one confirmed a previously identified discourse, and several complemented and extended previously identified discursive patterns on this topic. Together, these findings reveal the highly naturalised ideologically discursive landscape that shapes educational opportunities for US migrant students.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48339384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Corpora | Pub Date: 2023-08-01 | DOI: 10.3366/cor.2023.0284
Nicole Hober, Tülay Dixon, Tove Larsson
{"title":"Towards increased reliability and transparency in projects with manual linguistic coding","authors":"Nicole Hober, Tülay Dixon, Tove Larsson","doi":"10.3366/cor.2023.0284","DOIUrl":"https://doi.org/10.3366/cor.2023.0284","url":null,"abstract":"Manually coded data form the basis of many of our analyses in corpus linguistics. It is thus imperative that we work towards increased reliability and enhanced transparency in our coding practices, since failing to do so may ultimately lead us to draw erroneous conclusions about language. Using spoken data from a study on adverb usage for illustration, this methods paper discusses some strategies for identifying threats to the reliability of our coding and offers suggestions for how to mitigate these and ensure that our coding can be assessed and replicated. The paper also includes suggestions for best practices for manual linguistic coding and concludes with a discussion of the benefits of such practices. With this paper, we expand on the ongoing discussions in the field on issues of reliability and transparency as they relate to manual coding. We argue that while tests of inter-rater reliability offer a helpful starting point, further steps are needed to ensure increased reliability and transparency.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41419454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}