Corpus of Marathi Word Frequencies from Touch-Screen Devices Using Swarachakra Android Keyboard
Authors: Anirudha N. Joshi, G. Dalvi, Manjiri Joshi
Venue: Proceedings of the 6th Indian Conference on Human-Computer Interaction
Published: 2014-12-07
DOI: 10.1145/2676702.2676705 (https://doi.org/10.1145/2676702.2676705)
Citations: 6
Abstract
We describe and publish online a corpus containing word frequencies of Marathi texts that were actually typed by 27,474 users using the Android version of the Swarachakra Marathi keyboard on their mobile devices between August 2013 and September 2014. The corpus has 1,484,059 total words and 184,257 unique words. The paper also provides a preliminary analysis of the word frequencies and some comparisons with two existing corpora. It also provides a qualitative review of the nature of errors that users have made while typing and some idiosyncrasies that they have exhibited. We hope and expect that this corpus will be useful for future researchers, particularly those involved in word completion and auto-correction of user errors.
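The distinction between total words (tokens) and unique words (types), and the way a frequency list might feed word completion, can be illustrated with a minimal Python sketch. The toy word list, the function names `corpus_summary` and `complete`, and the prefix-matching strategy are all assumptions made for illustration; they are not the corpus's published format or the authors' method.

```python
from collections import Counter


def corpus_summary(freqs):
    """Return (total_words, unique_words, type_token_ratio) for a word -> count mapping."""
    total = sum(freqs.values())   # every typed occurrence counts once
    unique = len(freqs)           # each distinct word counts once
    ratio = unique / total if total else 0.0
    return total, unique, ratio


def complete(freqs, prefix, k=3):
    """Return up to k words that start with prefix, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    matches = [(word, count) for word, count in freqs.items() if word.startswith(prefix)]
    matches.sort(key=lambda wc: (-wc[1], wc[0]))
    return [word for word, _ in matches[:k]]


# Hypothetical toy data, not taken from the published corpus.
sample = Counter(["आहे", "आणि", "आहे", "मी", "आहेत", "आहे", "आणि"])

total, unique, ratio = corpus_summary(sample)
suggestions = complete(sample, "आ")

# The same ratio computed from the figures reported in the abstract.
reported_ratio = 184_257 / 1_484_059
```

With the figures in the abstract, the ratio of unique to total words comes to roughly 0.124, that is, about one distinct word for every eight words typed. A real completion system would likely use a trie or similar index rather than a linear scan, which is used here only to keep the sketch short.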