Detecting innovations in a parsed corpus of learner English

G. Schneider, Gaëtanelle Gilquin
{"title":"Detecting innovations in a parsed corpus of learner English","authors":"G. Schneider, Gaëtanelle Gilquin","doi":"10.1075/IJLCR.2.2.03SCH","DOIUrl":null,"url":null,"abstract":"The concept of linguistic innovation in English has so far mainly been limited to the description of native and indigenized varieties (ESL). In foreign varieties of English (EFL), on the other hand, non-standard forms are typically considered as errors. Such a treatment, however, (i) fails to acknowledge those cases when foreign learners intend to be creative, as underlined by Rimmer (2008), and (ii) misses commonalities between ESL and EFL. Recent corpus-based studies have provided preliminary evidence that some non-standard forms are shared by indigenized and foreign varieties of English. Nesselhauf (2009) has brought to light similarities in the way of new prepositional verbs like comprise of or emphasise on, while Gilquin (2011) has drawn parallels between phrasal verbs in ESL and EFL (see also Gotz & Schilk 2011, Davydova 2012, Laporte 2012 and Deshors 2014, among others). Such commonalities challenge the idea of a clear dichotomy between innovations and errors, and encourage us to look for more similarities between ESL and EFL. We present a data-driven method to detect potential innovations in EFL on a large scale, test it on verb-preposition structures, and describe similarities and differences between ESL and EFL. Relying on the whole of the International Corpus of Learner English (ICLE), which has been parsed with the probabilistic dependency parser Pro3Gres (Schneider 2008), we have automatically extracted potential innovations, defined here as patterns of overuse in ICLE compared to a reference corpus, for which we use the British National Corpus (BNC). We measure overuse by means of various collocation measures such as O/E or T-score (e.g. Evert 2009). 
Our approach is related to Schneider & Zipp (2013), which allows us to conduct a detailed comparison with novel combinations of verbs and prepositions found in Schneider & Zipp (2013) for ESL, based on the International Corpus of English (ICE). We find both striking similarities (e.g. discuss about) and dissimilarities (e.g. accuse for, only distinctive for EFL). The quantitative study is followed by a qualitative step, in which we aim to explain origins of non-native-like combinations in EFL (e.g. viewed upon as, probably built by analogy with looked upon as) and try to find criteria to determine what could be identified as actual innovations. We discuss total frequency, recurrence limited to learners from the same L1, which could point to L1 transfer innovations, and recurrence across different L1s, which could point to psycholinguistically based innovations that are the result of, e.g., processing load or semantic explicitness.","PeriodicalId":440472,"journal":{"name":"Rethinking Linguistic Creativity in Non-native Englishes","volume":"163 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Rethinking Linguistic Creativity in Non-native Englishes","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1075/IJLCR.2.2.03SCH","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15

Abstract

The concept of linguistic innovation in English has so far mainly been limited to the description of native and indigenized varieties (ESL). In foreign varieties of English (EFL), on the other hand, non-standard forms are typically considered errors. Such a treatment, however, (i) fails to acknowledge those cases in which foreign learners intend to be creative, as underlined by Rimmer (2008), and (ii) misses commonalities between ESL and EFL. Recent corpus-based studies have provided preliminary evidence that some non-standard forms are shared by indigenized and foreign varieties of English. Nesselhauf (2009) has brought to light similarities in new prepositional verbs such as comprise of or emphasise on, while Gilquin (2011) has drawn parallels between phrasal verbs in ESL and EFL (see also Götz & Schilk 2011, Davydova 2012, Laporte 2012 and Deshors 2014, among others). Such commonalities challenge the idea of a clear dichotomy between innovations and errors, and encourage us to look for more similarities between ESL and EFL.

We present a data-driven method to detect potential innovations in EFL on a large scale, test it on verb-preposition structures, and describe similarities and differences between ESL and EFL. Relying on the whole of the International Corpus of Learner English (ICLE), which has been parsed with the probabilistic dependency parser Pro3Gres (Schneider 2008), we have automatically extracted potential innovations, defined here as patterns of overuse in ICLE compared to a reference corpus, for which we use the British National Corpus (BNC). We measure overuse by means of various collocation measures such as O/E or T-score (e.g. Evert 2009). Our approach is related to Schneider & Zipp (2013), which allows us to conduct a detailed comparison with the novel combinations of verbs and prepositions that Schneider & Zipp (2013) found for ESL, based on the International Corpus of English (ICE). We find both striking similarities (e.g. discuss about) and dissimilarities (e.g. accuse for, which is distinctive only for EFL).

The quantitative study is followed by a qualitative step, in which we aim to explain the origins of non-native-like combinations in EFL (e.g. viewed upon as, probably built by analogy with looked upon as) and try to find criteria to determine what could be identified as actual innovations. We discuss three criteria: total frequency; recurrence limited to learners from the same L1, which could point to L1 transfer innovations; and recurrence across different L1s, which could point to psycholinguistically based innovations that are the result of, e.g., processing load or semantic explicitness.
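
The overuse measures named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the expected frequency E of a verb-preposition pair is its relative frequency in the reference corpus scaled to the size of the learner corpus, that O/E is the plain ratio, and that the T-score is (O - E) / sqrt(O). The function name `overuse_scores`, the `smoothing` constant and all counts are invented for illustration and are not figures from ICLE or the BNC.

```python
import math
from collections import Counter


def overuse_scores(learner_counts, reference_counts, smoothing=0.5):
    """Score (verb, preposition) pairs by O/E and t-score.

    O is the observed frequency in the learner corpus. E is the frequency
    expected if the pair were as frequent, relative to corpus size, as in
    the reference corpus. `smoothing` is an assumed add-constant that keeps
    E above zero for pairs that never occur in the reference corpus.
    """
    n_learner = sum(learner_counts.values())
    n_reference = sum(reference_counts.values())
    scores = {}
    for pair, observed in learner_counts.items():
        if observed <= 0:
            continue  # nothing observed, so nothing to score
        ref_freq = reference_counts.get(pair, 0) + smoothing
        expected = ref_freq / n_reference * n_learner
        scores[pair] = {
            "O": observed,
            "E": expected,
            "O/E": observed / expected,
            "t": (observed - expected) / math.sqrt(observed),
        }
    return scores


# Invented toy counts (NOT real ICLE or BNC figures).
learner = Counter({("discuss", "about"): 8, ("look", "at"): 40, ("accuse", "of"): 2})
reference = Counter({("discuss", "about"): 1, ("look", "at"): 900, ("accuse", "of"): 99})

scores = overuse_scores(learner, reference)

# Rank candidates by O/E, highest first; pairs with O/E > 1 are overused
# in the learner data relative to the reference data.
ranked = sorted(scores, key=lambda pair: scores[pair]["O/E"], reverse=True)
for pair in ranked:
    s = scores[pair]
    print(f"{pair[0]} {pair[1]}: O={s['O']} E={s['E']:.3f} O/E={s['O/E']:.2f} t={s['t']:.2f}")
```

In this toy setting the pair "discuss about" ranks first because it is far more frequent in the learner counts than the reference counts predict. O/E rewards rare pairs strongly, whereas the T-score favours pairs with more evidence, which is one reason for consulting several measures rather than one.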