{"title":"Innovation on screen","authors":"Susan A. Reichelt","doi":"10.1075/IJCL.00038.REI","DOIUrl":"https://doi.org/10.1075/IJCL.00038.REI","url":null,"abstract":"This study explores marked affixation as a possible cue for characterization in scripted television dialogue. The data used here is the newly compiled TV Corpus, which encompasses over 265 million words in its North American English context. An initial corpus-based analysis quantifies the innovative use of affixes in word-formation processes across the corpus to allow for comparison with a following character analysis, which investigates how derivational word-formation supports characterization patterns within a specific series, Buffy the Vampire Slayer. For this, a list of productive prefixes (e.g. de-, un-) and suffixes (e.g. -y, -ish) is used to elicit relevant contexts. The study thus combines two approaches to word-formation processes in scripted contexts. On a large scale, it shows how derivational neologisms are spread across TV dialogue and on a much smaller scale, it highlights particular instances where these neologisms are used to aid character construction.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2020-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48191280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subcategorization frame identification for learner English","authors":"Yi-Feng Huang, Akira Murakami, T. Alexopoulou, A. Korhonen","doi":"10.1075/ijcl.18097.hua","DOIUrl":"https://doi.org/10.1075/ijcl.18097.hua","url":null,"abstract":"As large-scale learner corpora become increasingly available, it is vital that natural language processing (NLP) technology is developed to provide rich linguistic annotations necessary for second language (L2) research. We present a system for automatically analyzing subcategorization frames (SCFs) for learner English. SCFs link lexis with morphosyntax, shedding light on the interplay between lexical and structural information in learner language. Meanwhile, SCFs are crucial to the study of a wide range of phenomena including individual verbs, verb classes and varying syntactic structures. To illustrate the usefulness of our system for learner corpus research and second language acquisition (SLA), we investigate how L2 learners diversify their use of SCFs in text and how this diversity changes with L2 proficiency.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":"1 1","pages":""},"PeriodicalIF":1.0,"publicationDate":"2020-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41839430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech acts in corpus pragmatics","authors":"M. Weisser","doi":"10.1075/IJCL.19023.WEI","DOIUrl":"https://doi.org/10.1075/IJCL.19023.WEI","url":null,"abstract":"In corpus pragmatics, most of the research into speech acts still tends to be limited to working with the original, highly abstract, speech-act taxonomies devised by ordinary language philosophers like Austin and Searle. The aim of this article is to illustrate how the use of such restricted taxonomies may lead to oversimplified or potentially misleading impressions regarding the communicative functions expressed in spoken interaction, and to demonstrate how a more elaborate taxonomy, the DART taxonomy (Weisser, 2018), may help us gain better insights into the pragmatic strategies that occur in dialogues. To this end, I will draw on a small sample of dialogues, both from a task-oriented domain and unconstrained interaction, and contrast selected speech-act categorisations on the basis of Searle’s and the DART taxonomy, demonstrating the advantages that arise from using a more fine-grained taxonomy to describe complex verbal exchanges.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":"25 1","pages":"400-425"},"PeriodicalIF":1.0,"publicationDate":"2020-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"58658304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keyword analysis and the indexing of Aboriginal and Torres Strait Islander identity","authors":"M. Bednarek","doi":"10.1075/ijcl.00031.bed","DOIUrl":"https://doi.org/10.1075/ijcl.00031.bed","url":null,"abstract":"This article presents a corpus-driven sociolinguistic study of Redfern Now – the first major television drama series commissioned, written, acted, directed and produced by Indigenous industry professionals in Australia. The study examines whether corpus linguistic keyword analysis can identify evidence for type indexicality (social demographics, personae) and trait indexicality (stance, personality), with particular attention paid to the potential indexing of Aboriginal and Torres Strait Islander identity. More specifically, the study’s goal is to retrieve and analyse words that are associated with varieties of English in Australia, and with Australian Aboriginal Englishes in particular. To this end, a corpus with dialogue from Redfern Now is compared to a reference corpus of US television dialogue. Results show that Redfern Now features the use of easily recognisable and familiar words (e.g. blackfella[s], deadly; kinship terms), but also shows clear variation among characters. The case study concludes by evaluating the use of keyword analysis for identifying indexicality in telecinematic discourse.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":"25 1","pages":"369-399"},"PeriodicalIF":1.0,"publicationDate":"2020-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44255321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying heuristic textual practices in academic discourse","authors":"Maria Becker, M. Bender, Marcus Müller","doi":"10.1075/ijcl.19097.bec","DOIUrl":"https://doi.org/10.1075/ijcl.19097.bec","url":null,"abstract":"In this paper, we investigate how deep learning techniques can be applied to discourse pragmatics. As a testcase we analyse heuristic textual practices, defined as linguistic implementations of decision routines in research processes in academic discourse. We develop a complex annotation scheme of pragmalinguistic categories on different levels of granularity and manually annotate a corpus of texts across various scientific disciplines. This is the basis for training recurrent neural networks to classify heuristic textual practices. Our experiments show that the annotation categories are robust enough to be recognised by our models which learn similarities of the sentence-surfaces represented as word embeddings. Our study aims at an iterative human-in-the-loop process in which manual-hermeneutic and algorithmic procedures mutually advance the insight process. It underlines the fact that the interaction between manual and automated methods opens up a promising field for further research, allowing interpretative analyses of complex pragmatic phenomena in large corpora.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2020-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44259463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Love, R. (2020). Overcoming Challenges in Corpus Construction: The spoken British National Corpus 2014","authors":"Jiawei Wang","doi":"10.1075/ijcl.00032.wan","DOIUrl":"https://doi.org/10.1075/ijcl.00032.wan","url":null,"abstract":"This article reviews Overcoming Challenges in Corpus Construction: The Spoken British National Corpus 2014.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":"25 1","pages":"504-510"},"PeriodicalIF":1.0,"publicationDate":"2020-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42299166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lima or cima?","authors":"C. Posch, Gerhard Rampl","doi":"10.1075/IJCL.19094.POS","DOIUrl":"https://doi.org/10.1075/IJCL.19094.POS","url":null,"abstract":"This paper outlines the construction of the corpus Alpenwort, a large, genre-based corpus of German texts on alpinism. We report on issues related to building the corpus from the Austrian Alpine Club Journal (1869–2010). First, a general description of our data and the project phases from digitization and annotation to publication is given. We focus on the most interesting challenges that the diverse layouts and the extensive use of Fraktur typefacing posed for optical layout recognition and optical character recognition (OCR) as well as post correction. The corrected data was lemmatized and annotated with part-of-speech information including named entities as well as TEI-conformant metadata. The resulting 19.9-million-word corpus is designed to be queried using CQPweb and Hyperbase and can be accessed freely online. Lastly, we give a short roadmap of current and future expansions and improvements as corpus data has been and is being enhanced in follow-up projects.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":"25 1","pages":"489-503"},"PeriodicalIF":1.0,"publicationDate":"2020-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47764052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A linguistic typology of American television","authors":"Tony Berber Sardinha, M. Pinto","doi":"10.1075/IJCL.00039.BER","DOIUrl":"https://doi.org/10.1075/IJCL.00039.BER","url":null,"abstract":"This paper presents the first entirely linguistic typology of contemporary American television, derived from a multi-dimensional (MD) analysis of the USTV corpus. The USTV corpus comprises 930 texts from 191 different TV programs, classified into 31 different registers (including nine telecinematic ones: drama series, miniseries, movies, sitcoms, soap operas, general animation, children’s animation, short-feature animation, and children’s and teens’ shows). The linguistic typology we present in this study is based on the linguistic characteristics present in the individual programs, with no a priori textual categorizations. A cluster analysis grouped the individual programs into clusters that shared similar dimensional profiles. The resulting typology comprises nine different text types – namely Presentation of information, Opinion and discussion, Analysis and debate, Description, Interactive recount, Engaging demonstration, Playful discourse, Simplified interaction, and Simulated conversation. The paper discusses and illustrates each text type and considers how telecinematic discourse relates to each of them.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47273436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A diachronic perspective on telecinematic language","authors":"Valentin Werner","doi":"10.1075/IJCL.00036.WER","DOIUrl":"https://doi.org/10.1075/IJCL.00036.WER","url":null,"abstract":"Previous corpus-based studies, which have mostly focused on a particular film or series, have identified various key characteristics of telecinematic language. However, a restriction on those results applies as regards the stability of findings across time and across individual productions. To address this gap, and following calls for more nuanced perspectives on telecinematic language as a whole, this study re-assesses a number of claims pertaining to lexical and lexicogrammatical aspects through a diachronic lens. To this end, it uses the Northern American sections of the new Movie and TV Corpora, multi-million word corpora compiled from subtitles of a wide range of film and series genres in the English-speaking world from the 20th and 21st century. Overall, the diachronic view of the data is suggestive of a highly complex nature of telecinematic language, with levels of emotionality and informality increasing over time for most items tested.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47515883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Language use in pop culture over three decades","authors":"Enikó Csomay, Ryan Young","doi":"10.1075/IJCL.00037.CSO","DOIUrl":"https://doi.org/10.1075/IJCL.00037.CSO","url":null,"abstract":"Analyzing variation in language features in literature and telecinematic discourse provides valuable insights into society’s shifting values and perspectives. In this study, we carry out a keyword analysis on the language of three series of Star Trek television dialogues, broadcast in the 1960s, 1980s, and 1990s, from two perspectives: (i) keywords across the three series highlighting words that are unique to one series in contrast to the other two, providing insights about changes of foci across time; (ii) keywords in relation to gender depicting potential differences in gender roles and how these may change through time across the series.","PeriodicalId":46843,"journal":{"name":"International Journal of Corpus Linguistics","volume":" ","pages":""},"PeriodicalIF":1.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44270333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}