Machine learning for learner English

IF 2.2 · LANGUAGE & LINGUISTICS

International Journal of Learner Corpus Research · Pub Date: 2020-04-14 · DOI: 10.1075/ijlcr.18012.bal

Nicolas Ballier, S. Canu, C. Petitjean, G. Gasso, C. Balhana, T. Alexopoulou, Thomas Gaillat

Citations: 8

Abstract

This paper discusses machine learning techniques for the prediction of Common European Framework of Reference (CEFR) levels in a learner corpus. We summarise the CAp 2018 Machine Learning (ML) competition, a classification task of the six CEFR levels, which map linguistic competence in a foreign language onto six reference levels. The goal of this competition was to produce a machine learning system to predict learners’ competence levels from written productions comprising between 20 and 300 words and a set of characteristics computed for each text extracted from the French component of the EFCAMDAT data (Geertzen et al., 2013). Together with the description of the competition, we provide an analysis of the results and methods proposed by the participants and discuss the benefits of this kind of competition for the learner corpus research (LCR) community. The main findings address the methods used and lexical bias introduced by the task.
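The competition frames CEFR prediction as a six-class text classification problem. The following is a minimal, dependency-free sketch of that framing under stated assumptions; it is not the method of the authors or of any participant. The regex tokenizer, the `NaiveBayesCEFR` class and the choice of a multinomial naive Bayes model with add-one smoothing are all hypothetical illustrations, and the sketch uses only the raw text, ignoring the per-text characteristics the competition also supplied.

```python
import math
import re
from collections import Counter, defaultdict

# The six CEFR reference levels, from lowest to highest proficiency.
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")

# Length bounds from the competition description: texts of 20 to 300 words.
MIN_WORDS, MAX_WORDS = 20, 300


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract word tokens (a deliberately naive, hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def in_length_range(text: str) -> bool:
    """Return True if the token count falls within the 20 to 300 word range."""
    return MIN_WORDS <= len(tokenize(text)) <= MAX_WORDS


class NaiveBayesCEFR:
    """Hypothetical bag-of-words multinomial naive Bayes baseline with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter()           # number of training texts per level
        self.word_counts = defaultdict(Counter)  # token frequencies per level
        self.vocab = set()
        for text, label in zip(texts, labels):
            if label not in CEFR_LEVELS:
                raise ValueError(f"unknown CEFR level: {label}")
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = sum(self.class_counts.values())
        return self

    def predict(self, text: str) -> str:
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen tokens
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Naive Bayes is used here only because it fits in a few lines without external libraries. Because a model of this kind relies solely on word counts, it picks up whatever vocabulary happens to co-occur with each level in the training data, so its predictions are sensitive to lexical properties of the task rather than to proficiency alone.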

Source journal

International Journal of Learner Corpus Research

CiteScore

3.40

Self-citation rate

27.30%

Articles published