Measuring the Similarity of Files by Data Compression
Hubert Schölnast
2023 Data Compression Conference (DCC), published 2023-03-01
DOI: 10.1109/DCC55655.2023.00063 (https://doi.org/10.1109/DCC55655.2023.00063)
Citation count: 0
Abstract
The two meta-algorithms Concat Compress and Cross Compress, which can be used to measure the similarity of files, were subjected to an extensive practical test in combination with the compression algorithms Re-Pair, gzip and bz2: five labeled datasets, together comprising 6533 entries and approximately 10 MB, were classified using these algorithms. The two meta-algorithms have been studied theoretically in the past [1], but their practical implementation is still in its infancy. The results of our experiments are promising and show the great potential of this approach.
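
The abstract does not define the two meta-algorithms, so the following is only an illustrative sketch of the general idea of measuring file similarity through compression. As a stand-in it uses the normalized compression distance, a well-known measure based on compressing the concatenation of two inputs; it should not be read as the paper's definition of Concat Compress, and Cross Compress and Re-Pair are not covered. The function names `compressed_size`, `ncd` and `classify`, and the choice of a 1-nearest-neighbor rule, are assumptions made for this example.

```python
import bz2
import gzip


def compressed_size(data: bytes, method: str = "gzip") -> int:
    """Return the number of bytes `data` occupies after compression."""
    if method == "gzip":
        return len(gzip.compress(data, compresslevel=9))
    if method == "bz2":
        return len(bz2.compress(data, compresslevel=9))
    raise ValueError(f"unknown compression method: {method}")


def ncd(x: bytes, y: bytes, method: str = "gzip") -> float:
    """Normalized compression distance (an assumed stand-in measure).

    Smaller values mean the compressor could reuse more of one input
    while encoding the other, i.e. the inputs are more similar.
    """
    cx = compressed_size(x, method)
    cy = compressed_size(y, method)
    cxy = compressed_size(x + y, method)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(sample: bytes, labeled: list[tuple[bytes, str]],
             method: str = "gzip") -> str:
    """Assign the label of the closest labeled entry (1-nearest neighbor)."""
    _, best_label = min(labeled, key=lambda item: ncd(sample, item[0], method))
    return best_label
```

Calling `classify(sample, labeled)` with a list of `(content, label)` pairs returns the label of the entry whose concatenation with `sample` compresses best relative to the individual sizes. Very short inputs are dominated by the fixed header overhead of the compressor, so distances between tiny files are less reliable.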