Categorising speakers’ language background: Theoretical assumptions and methodological challenges for learner corpus research
Olga Lopopolo, Arianna Bienati, Jennifer-Carmen Frey, Aivars Glaznieks, Stefania Spina
Research Methods in Applied Linguistics, 4(1), Article 100170. Published 2024-11-25.
DOI: 10.1016/j.rmal.2024.100170
https://www.sciencedirect.com/science/article/pii/S2772766124000764
Abstract
In this article, we investigate how speakers can be categorised by their language background in Learner Corpus Research (LCR). Specifically, we discuss three key aspects: first, the theoretical assumptions and methodological choices made in learner corpus design; second, the integration of a holistic perspective on speaker categorisation in LCR; and third, the consequences that different categorisations might have for study outcomes. Through a comprehensive review of corpora used in the field, we identify the most common terms, definitions and criteria of categorisation used to describe a speaker's language background. Focusing on the most central metadata field encoding language background, the L1 metadata, we inspect the different operationalisations in use and scrutinise the theoretical assumptions underlying them. Drawing on research on plurilingualism, we propose a holistic view of speakers' language backgrounds for Learner Corpus Research, combining various aspects of speakers' language use through methods inspired by the Dominant Language Constellation framework. We apply this methodology to re-evaluate the language categorisation system in LEONIDE, a multilingual corpus of Italian, German and English texts from secondary school students of diverse language backgrounds. We use the same corpus to evaluate how different categorisations of the students affect the outcomes of possible linguistic studies. Despite a generally high overlap between study results across categorisations, we observe that variables combining multiple aspects of the speakers’ language backgrounds seem to explain group differences for more of the linguistic features investigated.