Detecting innovations in a parsed corpus of learner English
Authors: G. Schneider, Gaëtanelle Gilquin
Journal: Rethinking Linguistic Creativity in Non-native Englishes
Publication date: 2018-07-09
Publication type: Journal Article
DOI: 10.1075/IJLCR.2.2.03SCH (https://doi.org/10.1075/IJLCR.2.2.03SCH)
Citations: 15
Abstract
The concept of linguistic innovation in English has so far mainly been limited to the description of native and indigenized varieties (ESL). In foreign varieties of English (EFL), on the other hand, non-standard forms are typically treated as errors. Such a treatment, however, (i) fails to acknowledge those cases when foreign learners intend to be creative, as underlined by Rimmer (2008), and (ii) misses commonalities between ESL and EFL. Recent corpus-based studies have provided preliminary evidence that some non-standard forms are shared by indigenized and foreign varieties of English. Nesselhauf (2009) has brought to light similarities in new prepositional verbs such as comprise of or emphasise on, while Gilquin (2011) has drawn parallels between phrasal verbs in ESL and EFL (see also Götz & Schilk 2011, Davydova 2012, Laporte 2012 and Deshors 2014, among others). Such commonalities challenge the idea of a clear dichotomy between innovations and errors, and encourage us to look for more similarities between ESL and EFL. We present a data-driven method to detect potential innovations in EFL on a large scale, test it on verb-preposition structures, and describe similarities and differences between ESL and EFL. Relying on the whole of the International Corpus of Learner English (ICLE), which has been parsed with the probabilistic dependency parser Pro3Gres (Schneider 2008), we have automatically extracted potential innovations, defined here as patterns of overuse in ICLE compared to a reference corpus, for which we use the British National Corpus (BNC). We measure overuse by means of various collocation measures such as O/E or T-score (e.g. Evert 2009). Our approach is related to that of Schneider & Zipp (2013), which allows a detailed comparison with the novel verb-preposition combinations they found for ESL, based on the International Corpus of English (ICE). We find both striking similarities (e.g. discuss about) and dissimilarities (e.g. accuse for, only distinctive for EFL). The quantitative study is followed by a qualitative step, in which we aim to explain the origins of non-native-like combinations in EFL (e.g. viewed upon as, probably built by analogy with looked upon as) and try to find criteria to determine what could be identified as actual innovations. We discuss total frequency, recurrence limited to learners from the same L1, which could point to L1 transfer innovations, and recurrence across different L1s, which could point to psycholinguistically based innovations that are the result of, e.g., processing load or semantic explicitness.
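
The two collocation measures named in the abstract, O/E and T-score, can be illustrated with a minimal sketch. The abstract does not say how expected frequencies were derived, so this sketch assumes the common textbook definitions: the expected count E of a verb-preposition pair is computed from the marginal frequencies under independence, O/E is the ratio of observed to expected, and the t-score is (O - E) / sqrt(O). The function names and all counts below are hypothetical and are not taken from ICLE, the BNC, or the paper.

```python
import math


def expected_count(verb_freq: int, prep_freq: int, total_pairs: int) -> float:
    """Expected co-occurrence count under independence (an assumption of this sketch)."""
    return verb_freq * prep_freq / total_pairs


def o_over_e(observed: float, expected: float) -> float:
    """Ratio of observed to expected frequency; values above 1 suggest overuse."""
    return observed / expected


def t_score(observed: float, expected: float) -> float:
    """T-score in its usual collocation form: (O - E) / sqrt(O)."""
    if observed <= 0:
        raise ValueError("t-score is undefined for a pair that was never observed")
    return (observed - expected) / math.sqrt(observed)


# Invented toy counts for a single verb-preposition pair in a learner corpus.
observed = 25
expected = expected_count(verb_freq=200, prep_freq=5000, total_pairs=100_000)

print(f"E   = {expected:.2f}")
print(f"O/E = {o_over_e(observed, expected):.2f}")
print(f"t   = {t_score(observed, expected):.2f}")
```

In a setup like the one the abstract describes, such scores would be computed for each verb-preposition pair and compared between the learner corpus and the reference corpus. How the two are compared is a further design choice that this sketch leaves open.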