Analysis of Native and Non-native Speakers' English Compositions based on Word-frequency Distribution and Text Statistics

H. Tsubaki
Published in: Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval
Publication date: 2019-06-28
DOI: 10.1145/3342827.3342856 (https://doi.org/10.1145/3342827.3342856)
Citations: 0

Abstract

This paper examines the word-frequency distribution of JACET 8000 basic words, together with text statistics, to compare English compositions (essays) written by native and non-native speakers. In the native speakers' essays, the Guiraud Index at each of Levels 2-8 correlated more strongly with average sentence length and with the Automated Readability Index. In the non-native speakers' essays, the index values showed moderate correlations with sentence count. These observations suggest that the productivity and readability of a composition depend on the range of basic content words that the writer, native or non-native, has acquired and can use in English. To test word-frequency distribution as a proficiency measure for non-native speakers, an estimation experiment was run with a multiple-regression model on the word-frequency distributions of 68 English compositions written by non-native writers. The learners' estimated scores showed a correlation of 0.475 with their actual TOEIC scores. These results indicate that word-usage statistics could support the objective evaluation of L2 (second language) learners' language proficiency.
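The abstract names several text statistics without defining them. The sketch below shows how they are commonly computed, using the textbook definitions of Guiraud's index (distinct word types divided by the square root of the token count) and the Automated Readability Index. It is an illustration under assumptions, not the paper's implementation: the tokenizer, the sentence splitter, and all function names are hypothetical, and the paper's per-level variant (Guiraud Index within each JACET 8000 level) is not reproduced because the abstract does not specify it.

```python
import math
import re


def tokenize(text):
    """Lowercased word tokens (assumed tokenizer: letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence split on '.', '!' and '?' (an assumption for this sketch)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def guiraud_index(tokens):
    """Guiraud's index: number of distinct types / sqrt(number of tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / math.sqrt(len(tokens))


def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    words = text.split()
    sentences = split_sentences(text)
    if not words or not sentences:
        return 0.0
    # Count only letters and digits, as in the standard ARI definition.
    chars = sum(ch.isalnum() for ch in text)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    sample = "The cat sat. The dog ran fast."
    print(guiraud_index(tokenize(sample)))
    print(automated_readability_index(sample))
```

With a collection of essays, one could compute both values per essay and pass the two resulting lists to `pearson` to obtain a correlation coefficient of the kind the abstract reports.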