On the design of an effective corpus for evaluation of Bengali Text Compression Schemes
R. Islam, S. Rajon. 2008 11th International Conference on Computer and Information Technology, December 2008. DOI: 10.1109/ICCITECHN.2008.4802992
In this paper, we propose an effective platform for evaluating Bengali text compression schemes. We methodically study approaches to formulating text corpora for data compression and present Ekushe-Khul, an effective corpus for evaluating Bengali text compression schemes and the first initiative of its kind for Bengali text compression. To design the corpus, we use the type-token ratio as the primary selection criterion, along with a number of secondary considerations. The paper also presents a mathematical analysis relating data compression performance to the structural aspects of corpora. The proposed corpus is effective for evaluating compression efficiency on small and medium-sized text files.
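The type-token ratio (TTR) is the number of distinct word types divided by the total number of word tokens. The following is a minimal Python sketch of how it might be computed for candidate Bengali files, assuming whitespace tokenization with basic punctuation stripping. The abstract does not state the authors' actual tokenizer or selection procedure. The names `tokenize`, `type_token_ratio`, `bits_per_char`, `rank_by_ttr`, and `PUNCT` are hypothetical. `zlib` serves only as a stand-in compressor, and bits per character is one common efficiency measure, not necessarily the one reported in the paper.

```python
import zlib

# Assumed punctuation set: Bengali danda/double danda plus common ASCII marks.
# The paper's actual tokenization rules are not given in the abstract.
PUNCT = "।॥,.;:!?\"'()[]{}"


def tokenize(text: str) -> list[str]:
    """Split on whitespace and strip surrounding punctuation (assumed scheme)."""
    tokens = (t.strip(PUNCT) for t in text.split())
    return [t for t in tokens if t]


def type_token_ratio(text: str) -> float:
    """Distinct word types divided by total word tokens."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    return len(set(tokens)) / len(tokens)


def bits_per_char(text: str, level: int = 9) -> float:
    """Compressed size in bits per Unicode character (zlib as a stand-in compressor)."""
    if not text:
        raise ValueError("text is empty")
    compressed = zlib.compress(text.encode("utf-8"), level)
    return 8 * len(compressed) / len(text)


def rank_by_ttr(docs: dict[str, str]) -> list[tuple[str, float]]:
    """Hypothetical helper: order candidate files from highest to lowest TTR."""
    scores = [(name, type_token_ratio(body)) for name, body in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = "আমি ভাত খাই। আমি জল খাই।"
    print(tokenize(sample))
    print(round(type_token_ratio(sample), 3))  # 6 tokens, 4 types -> 0.667
    print(round(bits_per_char(sample * 50), 3))
```

The sketch splits on whitespace instead of matching `\w+` with a regular expression. In Python, `\w` is defined through `str.isalnum()`, which is false for Bengali dependent vowel signs such as U+09BE. A `\w+` pattern would therefore break words apart at those signs.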