On over- and underuse in learner corpus research and multifactoriality in corpus linguistics more generally

IF 1 | Zone 4 (Education) | Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Second Language Studies | Pub date: 2018-08-27 | DOI: 10.1075/JSLS.00005.GRI

S. Gries

Citations: 29

Abstract

This paper critically discusses how corpus linguistics in general, but learner corpus research in particular, has been dealing with all sorts of frequency data in general, but over- and underuse frequencies in particular. I demonstrate on the basis of learner corpus data the pitfalls of using aggregate data and lacking statistical control that much work is unfortunately characterized by. In fact, I will demonstrate that monofactorial methods have very little to offer at all to research on observational data. While this paper is admittedly very didactic and methodological, I think the discussion of the empirical data offered here (a reanalysis of previously published work) shows how misleading many studies potentially are and has far-reaching implications for much of corpus linguistics and learner corpus research. Ideally/maximally, this paper together with Paquot & Plonsky (2017, International Journal of Learner Corpus Research) would lead to a complete revision of how learner corpus linguists use quantitative methods and study over-/underuse; minimally, this paper would stimulate a much-needed discussion of the methodological sophistication that is currently lacking.

Source journal: Journal of Second Language Studies
CiteScore: 1.90
Self-citation rate: 10.00%
Articles published: