Corpus of Marathi Word Frequencies from Touch-Screen Devices Using Swarachakra Android Keyboard

Anirudha N. Joshi, G. Dalvi, Manjiri Joshi
Published in: Proceedings of the 6th Indian Conference on Human-Computer Interaction, 2014-12-07
DOI: 10.1145/2676702.2676705 (https://doi.org/10.1145/2676702.2676705)
Citations: 6

Abstract

We describe and publish online a corpus containing word frequencies of Marathi texts that were actually typed by 27,474 users using the Android version of the Swarachakra Marathi keyboard on their mobile devices between August 2013 and September 2014. The corpus has 1,484,059 total words and 184,257 unique words. The paper also provides a preliminary analysis of the word frequencies and some comparisons with two existing corpora. It also provides a qualitative review of the nature of errors that users have made while typing and some idiosyncrasies that they have exhibited. We hope and expect that this corpus will be useful for future researchers, particularly those involved in word completion and auto-correction of user errors.
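The relationship between the "total words" and "unique words" figures quoted above can be illustrated with a minimal sketch. It assumes whitespace tokenization and Unicode NFC normalization, neither of which is confirmed by the abstract as the authors' actual procedure; the function names and the three sample sentences are hypothetical and are not drawn from the corpus.

```python
from collections import Counter
import unicodedata


def word_frequencies(texts):
    """Count whitespace-separated tokens across an iterable of strings.

    Assumption: splitting on whitespace is a stand-in for whatever
    tokenization the corpus authors actually used.
    """
    counts = Counter()
    for text in texts:
        # NFC normalization so that canonically equivalent code point
        # sequences are counted as the same word.
        normalized = unicodedata.normalize("NFC", text)
        counts.update(normalized.split())
    return counts


def corpus_summary(counts):
    """Return (total words, unique words, unique/total ratio)."""
    total = sum(counts.values())   # every occurrence counts
    unique = len(counts)           # each distinct word counts once
    ratio = unique / total if total else 0.0
    return total, unique, ratio


# Hypothetical sample input, not taken from the published corpus.
sample_texts = ["मी घरी आहे", "तू कुठे आहे", "मी येतो"]
freqs = word_frequencies(sample_texts)
total, unique, ratio = corpus_summary(freqs)
print(total, unique, round(ratio, 2))
print(freqs.most_common(2))
```

Applied to the figures reported in the abstract, the same ratio is 184,257 / 1,484,059, or roughly 0.12 unique words per word typed.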