Corpus of Marathi Word Frequencies from Touch-Screen Devices Using Swarachakra Android Keyboard

Authors: Anirudha N. Joshi, G. Dalvi, Manjiri Joshi
Published in: Proceedings of the 6th Indian Conference on Human-Computer Interaction, 2014-12-07
DOI: https://doi.org/10.1145/2676702.2676705
We describe and publish online a corpus of word frequencies from Marathi text typed by 27,474 users with the Android version of the Swarachakra Marathi keyboard on their mobile devices between August 2013 and September 2014. The corpus contains 1,484,059 total words and 184,257 unique words. The paper also provides a preliminary analysis of the word frequencies and compares them with two existing corpora. In addition, it gives a qualitative review of the kinds of errors users made while typing and of some idiosyncrasies they exhibited. We hope and expect that this corpus will be useful to future researchers, particularly those working on word completion and auto-correction of user errors.
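
To illustrate how a word-frequency list of this kind might support word completion, here is a minimal Python sketch. It is not the authors' method and does not read the published corpus, whose file format is not described here. The tokens are toy data invented for the example, and the function names `build_frequencies` and `complete` are hypothetical.

```python
from collections import Counter

def build_frequencies(tokens):
    """Count how often each word occurs in a list of typed tokens."""
    return Counter(tokens)

def complete(prefix, freqs, k=3):
    """Return up to k words that start with prefix, most frequent first."""
    candidates = [(w, c) for w, c in freqs.items() if w.startswith(prefix)]
    # Sort by descending count, then by word for a stable order among ties.
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in candidates[:k]]

# Toy tokens, invented for illustration; NOT taken from the corpus.
tokens = ["आहे", "आणि", "आहे", "आहेत", "आता", "आहे", "आणि"]
freqs = build_frequencies(tokens)

total_words = sum(freqs.values())  # number of tokens typed
unique_words = len(freqs)          # number of distinct words

suggestions = complete("आ", freqs, k=2)
```

The same two summary figures the abstract reports for the corpus (total words and unique words) correspond here to `total_words` and `unique_words`. A practical completion system would likely also need Unicode normalization and handling of the typing errors the paper reviews, both of which this sketch omits.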