Machine learning for learner English
Nicolas Ballier, S. Canu, C. Petitjean, G. Gasso, C. Balhana, T. Alexopoulou, Thomas Gaillat
International Journal of Learner Corpus Research, published 2020-04-14
DOI: 10.1075/ijlcr.18012.bal (https://doi.org/10.1075/ijlcr.18012.bal)

This paper discusses machine learning techniques for predicting Common European Framework of Reference (CEFR) levels in a learner corpus. We summarise the CAp 2018 Machine Learning (ML) competition, a classification task over the six CEFR reference levels, which describe linguistic competence in a foreign language. The goal of the competition was to produce a machine learning system that predicts learners’ competence levels from written productions of between 20 and 300 words, together with a set of characteristics computed for each text. The texts were extracted from the French component of the EFCAMDAT data (Geertzen et al., 2013). Alongside the description of the competition, we analyse the results and the methods proposed by the participants, and discuss the benefits of this kind of competition for the learner corpus research (LCR) community. The main findings concern the methods used and the lexical bias introduced by the task.
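
To make the task framing concrete, the following is a minimal sketch of predicting a CEFR level from a short text plus a few computed characteristics. It is not the method of any participant, which the abstract does not detail. The three features, the nearest-centroid classifier, and the toy texts are all assumptions chosen for illustration; the toy texts are invented, are not taken from EFCAMDAT, and are shorter than the 20 to 300 words used in the competition.

```python
from collections import defaultdict
from math import sqrt

# The six CEFR reference levels that form the target classes.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def text_features(text):
    """Hypothetical per-text characteristics: length, mean word length, type-token ratio."""
    words = text.lower().split()
    n = len(words)
    if n == 0:
        return [0.0, 0.0, 0.0]
    mean_len = sum(len(w) for w in words) / n
    type_token_ratio = len(set(words)) / n
    return [float(n), mean_len, type_token_ratio]


def fit_centroids(texts, labels):
    """Average the feature vectors of each level (nearest-centroid training)."""
    sums = {}
    counts = defaultdict(int)
    for text, label in zip(texts, labels):
        feats = text_features(text)
        if label not in sums:
            sums[label] = [0.0] * len(feats)
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids, text):
    """Return the level whose centroid is closest in Euclidean distance."""
    feats = text_features(text)

    def distance(centroid):
        return sqrt(sum((a - b) ** 2 for a, b in zip(feats, centroid)))

    return min(centroids, key=lambda label: distance(centroids[label]))


# Invented toy data covering only two of the six levels.
TOY_TEXTS = [
    "i like my cat",
    "my name is tom",
    "last summer i travelled to spain with my family and we visited many museums",
]
TOY_LABELS = ["A1", "A1", "B1"]

centroids = fit_centroids(TOY_TEXTS, TOY_LABELS)
print(predict(centroids, "i have a dog"))
```

Because the features are left unscaled, text length dominates the distance here. A realistic system would normalise the features and use many more of them. It would also need to check whether topic-specific vocabulary, rather than proficiency, is driving the predictions, which is the kind of lexical bias the abstract mentions.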