Manu Seth, Sourya Basu, Shivam Chaturvedi, R. Hegde
2016 Twenty Second National Conference on Communication (NCC), published 2016-03-01. DOI: 10.1109/NCC.2016.7561128 (https://doi.org/10.1109/NCC.2016.7561128)
Multi character frequency based encoding for efficient text messaging in Indian Languages
Short Message Service (SMS) via cell phones is a widely used mode of data communication. Currently employed encoding schemes allow the transmission of 160 characters per SMS in English. This drops to 70 characters per SMS if any Indian language, including Hindi, is used, because such messages are sent in Unicode format. Schemes proposed to improve the encoding efficiency of short text messaging generally encode one character at a time, and table-splitting schemes that reduce the average number of bits per character are commonly used in this context. In this paper, a novel multi-character frequency-based encoding scheme is proposed for efficient messaging of short text messages in four Indian languages. Both uni-gram and bi-gram modelling-based schemes are proposed. The efficiency of the proposed schemes is evaluated by conducting experiments on a large multilingual database of short text messages collected from Twitter using a dictionary learning approach. Performance evaluation shows that these encoding schemes allow the transmission of around 190 characters per SMS in English and more than 165 characters per SMS for the four Indian languages. Encoding efficiency is significantly improved compared with existing state-of-the-art table marker algorithms, and the gain is encouraging enough to motivate practical use for transmitting short text messages in Indian languages.
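The 160 and 70 character limits quoted above follow from simple arithmetic on the SMS payload size. The sketch below assumes the standard 140-octet (1120-bit) user-data payload, a 7-bit default alphabet for English, and 16-bit Unicode (UCS-2) code units; none of these numbers is stated in the abstract, and the function names are illustrative only. It also derives the average bits per character implied by the reported 190 and 165 character figures, without modelling the proposed encoding itself.

```python
# Assumption: one SMS carries 140 octets (1120 bits) of user data.
# This figure comes from standard GSM messaging, not from the abstract.
SMS_PAYLOAD_BITS = 140 * 8


def chars_per_sms(bits_per_char: float) -> int:
    """Whole characters that fit in one SMS at a fixed code width."""
    return int(SMS_PAYLOAD_BITS // bits_per_char)


def avg_bits_per_char(chars: int) -> float:
    """Average bits per character implied by fitting `chars` characters in one SMS."""
    return SMS_PAYLOAD_BITS / chars


gsm7 = chars_per_sms(7)    # 7-bit default alphabet (English)
ucs2 = chars_per_sms(16)   # 16-bit Unicode (UCS-2), used for Indian languages

print(f"7-bit alphabet: {gsm7} characters per SMS")
print(f"16-bit Unicode: {ucs2} characters per SMS")

# Bit budgets implied by the capacities reported in the abstract.
for label, chars in [("English, ~190 chars", 190), ("Indian languages, >165 chars", 165)]:
    print(f"{label}: about {avg_bits_per_char(chars):.2f} bits per character")
```

Under these assumptions, reaching 165 characters per SMS requires averaging under about 6.8 bits per character, compared with the 16 bits per character of plain Unicode messaging.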