Corpus of Marathi Word Frequencies from Touch-Screen Devices Using Swarachakra Android Keyboard

Proceedings of the 6th Indian Conference on Human-Computer Interaction. Pub Date: 2014-12-07. DOI: 10.1145/2676702.2676705

Anirudha N. Joshi, G. Dalvi, Manjiri Joshi

Citations: 6

Abstract

We describe and publish online a corpus containing word frequencies of Marathi texts that were actually typed by 27,474 users using the Android version of the Swarachakra Marathi keyboard on their mobile devices between August 2013 and September 2014. The corpus has 1,484,059 total words and 184,257 unique words. The paper also provides a preliminary analysis of the word frequencies and some comparisons with two existing corpora. It also provides a qualitative review of the nature of errors that users have made while typing and some idiosyncrasies that they have exhibited. We hope and expect that this corpus will be useful for future researchers, particularly those involved in word completion and auto-correction of user errors.

