{"title":"使用最小上下文无关语法的相似性度量","authors":"D. Cerra, M. Datcu","doi":"10.1109/DCC.2010.37","DOIUrl":null,"url":null,"abstract":"This work presents a new approximation for the Kolmogorov complexity of strings based on compression with smallest Context Free Grammars (CFG). If, for a given string, a dictionary containing its relevant patterns may be regarded as a model, a Context-Free Grammar may represent a generative model, with all of its rules (and as a consequence its own size) being meaningful. Thus, we define a new complexity approximation which takes into account the size of the string model, in a representation similar to the Minimum Description Length. These considerations result in the definition of a new compression-based similarity measure: its novelty lies in the fact that the impact of complexity overestimations, due to the limits that a real compressor has, can be accounted for and decreased.","PeriodicalId":299459,"journal":{"name":"2010 Data Compression Conference","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"A Similarity Measure Using Smallest Context-Free Grammars\",\"authors\":\"D. Cerra, M. Datcu\",\"doi\":\"10.1109/DCC.2010.37\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This work presents a new approximation for the Kolmogorov complexity of strings based on compression with smallest Context Free Grammars (CFG). If, for a given string, a dictionary containing its relevant patterns may be regarded as a model, a Context-Free Grammar may represent a generative model, with all of its rules (and as a consequence its own size) being meaningful. Thus, we define a new complexity approximation which takes into account the size of the string model, in a representation similar to the Minimum Description Length. These considerations result in the definition of a new compression-based similarity measure: its novelty lies in the fact that the impact of complexity overestimations, due to the limits that a real compressor has, can be accounted for and decreased.\",\"PeriodicalId\":299459,\"journal\":{\"name\":\"2010 Data Compression Conference\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-03-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 Data Compression Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DCC.2010.37\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Data Compression Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.2010.37","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Similarity Measure Using Smallest Context-Free Grammars
This work presents a new approximation for the Kolmogorov complexity of strings based on compression with smallest Context-Free Grammars (CFGs). If a dictionary containing a string's relevant patterns can be regarded as a model of that string, then a Context-Free Grammar can be regarded as a generative model, with every rule (and consequently the grammar's overall size) being meaningful. Thus, we define a new complexity approximation that takes the size of the string's model into account, in a representation similar to the Minimum Description Length (MDL) principle. These considerations lead to the definition of a new compression-based similarity measure: its novelty lies in the fact that the impact of complexity overestimations, due to the limitations of any real compressor, can be accounted for and reduced.
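To illustrate the general idea of grammar-based complexity estimates, the sketch below is a minimal, hypothetical example rather than the authors' method: computing the truly smallest CFG is intractable, so it substitutes a greedy Re-Pair-style heuristic, uses the resulting grammar size as the complexity approximation, and plugs it into an NCD-style distance. The function names (`repair_grammar`, `grammar_size`, `grammar_similarity`) are illustrative only.

```python
# Illustrative sketch only: approximates a "smallest" CFG with a greedy
# Re-Pair-like heuristic and uses grammar size as a stand-in for compressed
# size in an NCD-style similarity measure. Not the measure defined in the paper.

from collections import Counter
from itertools import count


def repair_grammar(s):
    """Greedy grammar inference: repeatedly replace the most frequent
    adjacent pair of symbols with a fresh nonterminal."""
    seq = list(s)              # right-hand side of the start rule
    rules = {}                 # nonterminal -> (symbol, symbol)
    fresh = count()            # supplies ids for new nonterminals
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:           # nothing repeats anymore; stop factoring
            break
        nt = ("N", next(fresh))
        rules[nt] = pair
        # rewrite the sequence, replacing non-overlapping occurrences of the pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def grammar_size(s):
    """Complexity estimate: total number of symbols on all right-hand sides
    of the inferred grammar (start rule included)."""
    seq, rules = repair_grammar(s)
    return len(seq) + sum(len(rhs) for rhs in rules.values())


def grammar_similarity(x, y):
    """NCD-style distance with grammar size as the complexity approximation:
    near 0 for very similar strings, near 1 for unrelated ones."""
    cx, cy, cxy = grammar_size(x), grammar_size(y), grammar_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = "abracadabra" * 20
    b = "abracadabra" * 19 + "abracadabrX"
    c = "qwertyuiopas" * 20
    print(grammar_similarity(a, b))   # small: nearly identical strings
    print(grammar_similarity(a, c))   # larger: unrelated strings
```

The point of the sketch is that the grammar's rules are an explicit, interpretable model of the string, so its size can serve as a complexity estimate in an MDL-like two-part description; the specific correction for compressor-induced overestimation is the paper's contribution and is not reproduced here.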