A Phraseology Approach in Developmental Education Placement

Proceedings of the International Conference EUROPHRAS 2022 (short papers, posters and MUMTTT workshop contributions). Pub Date: 1900-01-01. DOI: 10.26615/978-954-452-080-9_011

Miguel Da Corte, J. Baptista




A Phraseology Approach in Developmental Education Placement

This study focuses on an automatic classification task that places community college students into the appropriate level (Level 1 or Level 2) of Developmental Education (DevEd) courses according to their English L1 proficiency. DevEd courses are designed to remediate and support students’ communication skills in reading and writing before they can fully participate in college-level or college-bearing courses. This paper uses machine-learning methods to investigate the impact of treating multiword expressions (MWE) as entire tokens on the automatic classification task. Since many MWE are non-compositional in meaning and make up a large percentage of the textual units of many texts, they are likely to play a relevant role in the data representation of texts and, hence, to improve the subsequent classification task. Information is scarce regarding the tokenization of MWE and how it affects automatic placement. To this end, a random, balanced corpus of 186 sample texts (93 from each level) was used. Experiments compared the performance of a set of classifiers on the plain-text corpus and on a version of the same corpus annotated for MWE. Results showed that using MWE as lexical features improved the classification accuracy by 8.1% above the baseline.
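To make "MWE as entire tokens" concrete, here is a minimal sketch of one way a plain token sequence could be rewritten so that each multiword expression becomes a single token before feature extraction. It is not the authors' pipeline: the function name `merge_mwes`, the three-entry lexicon, the greedy longest-match strategy, and the underscore joining convention are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple


def merge_mwes(tokens: List[str], lexicon: Iterable[Tuple[str, ...]]) -> List[str]:
    """Greedily join the longest lexicon match at each position into one token."""
    entries: Set[Tuple[str, ...]] = {tuple(w.lower() for w in e) for e in lexicon}
    max_len = max((len(e) for e in entries), default=1)
    lowered = [t.lower() for t in tokens]
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(lowered[i:i + n]) in entries:
                merged.append("_".join(lowered[i:i + n]))
                i += n
                break
        else:
            # No multiword match starts here: keep the single word.
            merged.append(lowered[i])
            i += 1
    return merged


# Hypothetical lexicon and sentence, for illustration only.
LEXICON = [("in", "spite", "of"), ("as", "well", "as"), ("look", "forward", "to")]
sentence = "I look forward to college in spite of the cost".split()
plain = [t.lower() for t in sentence]       # baseline: one token per word
with_mwe = merge_mwes(sentence, LEXICON)    # MWE version: expressions merged
```

In this toy case the plain representation has ten tokens, while the merged one has six, with `look_forward_to` and `in_spite_of` each counted as a single lexical feature. A classifier trained on the merged version therefore sees the expression as one unit rather than as unrelated function words, which is the contrast the experiments above evaluate.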
