Word frequency and text complexity: an eye-tracking study of young Russian readers

Impact factor: 1.5 · JCR category: LANGUAGE & LINGUISTICS
A. Laposhina, M. Lebedeva, Alexandra Berlin Khenis
Journal: Russian Journal of Linguistics
Published: 2022-06-29 (Journal Article)
DOI: 10.22363/2687-0088-30084 (https://doi.org/10.22363/2687-0088-30084)
Citations: 1

Abstract

Although word frequency is often associated with the cognitive load on the reader and is widely used in automated text complexity assessment, no eye-tracking data have yet been obtained on how well this parameter predicts text complexity for Russian primary school readers. Moreover, the optimal ways of taking the frequency of individual words into account when assessing the complexity of an entire text have not been precisely determined. This article aims to fill these gaps. The study was conducted on a sample of 53 children of primary school age. As stimulus material, we used six texts that differ in their scores on the classical Flesch readability formula and in the frequency of the words they contain. As sources of frequency data, we used the general frequency dictionary based on the Russian National Corpus and DetCorpus, a corpus of literature addressed to children. The speed of reading the text aloud, in words per minute averaged over the grades, served as the measure of text complexity. The best predictions of relative reading time were obtained with lemma frequency data from DetCorpus. At the text level, the highest correlation with reading speed was shown by text coverage by a list of the 5,000 most frequent words, and the two sources of the lists, the Russian National Corpus and DetCorpus, showed almost the same correlation values. For a more detailed analysis, we also calculated the correlation of the frequency parameters of specific word forms and lemmas with three parameters of oculomotor activity: dwell time, fixation count, and average fixation duration. At the word level, lemma frequency from DetCorpus showed the highest correlation with relative reading time. The results confirm the feasibility of using frequency data to assess text complexity for primary school children and demonstrate the optimal ways to calculate frequency data.
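The text-level measure described in the abstract, coverage of a text by a list of most frequent words and its correlation with reading speed, can be illustrated with a minimal sketch. This is not the authors' code: the function names `coverage` and `pearson`, the tiny lemma set standing in for a 5,000-word list, the pre-lemmatized toy texts, and the reading speeds are all hypothetical placeholders, and the abstract does not say which correlation coefficient was used (Pearson is assumed here).

```python
import math


def coverage(lemmas, frequent_lemmas):
    """Share of tokens in `lemmas` found in `frequent_lemmas` (0.0 to 1.0)."""
    if not lemmas:
        return 0.0
    known = sum(1 for lemma in lemmas if lemma in frequent_lemmas)
    return known / len(lemmas)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up inputs (NOT data from the study).
# `frequent` stands in for a top-5,000 lemma list from a frequency dictionary.
frequent = {"кот", "дом", "идти", "в", "и"}

# Each text is assumed to be already tokenized and lemmatized.
texts = [
    ["кот", "идти", "в", "дом"],
    ["кот", "и", "синхрофазотрон"],
    ["синхрофазотрон", "и", "интерферометр", "детектор"],
]

# Hypothetical mean oral reading speeds per text, in words per minute.
wpm = [90.0, 70.0, 40.0]

coverages = [coverage(t, frequent) for t in texts]
r = pearson(coverages, wpm)
print(coverages, r)
```

In this toy setup, texts with a larger share of list words are read faster, so the coefficient comes out positive. With real material, the list would come from a frequency dictionary and the lemmas from a morphological analyzer, neither of which is shown here.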
Source journal: Russian Journal of Linguistics (Arts and Humanities: Language and Linguistics)
CiteScore: 3.00 · Self-citation rate: 33.30% · Articles published: 43 · Review time: 14 weeks