Accelerating similarity-based model matching using on-the-fly similarity preserving hashing
Xiaoyu He, Letian Tang, Yutong Li
Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems, 2022-10-23
DOI: 10.1145/3550355.3552406 (https://doi.org/10.1145/3550355.3552406)
Similarity-based model matching is the foundation of model versioning. It pairs model elements according to a distance metric (e.g., edit distance). Because computing the distance between two elements is expensive, similarity-based matchers usually suffer from performance problems as model size grows. This paper proposes a hash-based approach to accelerate similarity-based model matching. First, we design a novel similarity-preserving hash function that maps each model element to a 64-bit hash value, so that similar elements receive close hash values. Second, we propose a three-layer index structure and a query algorithm that use these hashes to quickly filter out impossible candidates for the element being matched. For the remaining candidates, we apply the classical similarity-based matching algorithm to determine the final matches. We have implemented the approach and integrated it into EMF Compare. The evaluation shows that our hash function effectively preserves the similarity between model elements and that our matching approach reduces time costs by 16% to 72% while producing matching results consistent with those of EMF Compare.
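The abstract does not spell out the hash function or the three-layer index, so the Python sketch below illustrates only the general filter-then-verify idea using standard substitutes, not the authors' design. It assumes Charikar-style SimHash (a well-known 64-bit similarity-preserving hash) over hypothetical character-trigram features of an element's attributes, a simple block-partition (pigeonhole) lookup table in place of the paper's index, and Levenshtein distance over a hypothetical string signature standing in for EMF Compare's distance metric. The element representation (a dict of attribute strings) and the names `CandidateIndex`, `features`, `signature`, and `MAX_HAMMING` are illustrative assumptions, not EMF Compare's API.

```python
import hashlib
from collections import defaultdict
from typing import Optional

BITS = 64
BLOCKS = 4                    # hypothetical choice: four 16-bit blocks
BLOCK_BITS = BITS // BLOCKS
MAX_HAMMING = BLOCKS - 1      # pigeonhole: <= 3 differing bits leaves some block identical


def token_hash(token: str) -> int:
    """Stable 64-bit hash of one feature (built-in hash() is salted per process)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def features(element: dict[str, str]) -> list[str]:
    """Hypothetical features: character trigrams of each attribute value, tagged by key."""
    feats = []
    for key, value in sorted(element.items()):
        padded = f"^{value}$"
        feats.extend(f"{key}:{padded[i:i + 3]}" for i in range(len(padded) - 2))
    return feats


def simhash(element: dict[str, str]) -> int:
    """SimHash: bit b is set if more than half of the feature hashes have bit b set."""
    votes = [0] * BITS
    for feat in features(element):
        h = token_hash(feat)
        for bit in range(BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if votes[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def blocks(h: int) -> list[tuple[int, int]]:
    """Split a 64-bit hash into (block index, block value) lookup keys."""
    mask = (1 << BLOCK_BITS) - 1
    return [(i, (h >> (i * BLOCK_BITS)) & mask) for i in range(BLOCKS)]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (the expensive step we want to avoid)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def signature(element: dict[str, str]) -> str:
    """Hypothetical string form of an element, compared by edit distance."""
    return "|".join(f"{k}={v}" for k, v in sorted(element.items()))


class CandidateIndex:
    """Filter-then-verify matcher; a stand-in, NOT the paper's three-layer index."""

    def __init__(self, elements: dict[str, dict[str, str]]):
        self.elements = elements
        self.hashes = {eid: simhash(el) for eid, el in elements.items()}
        self.table = defaultdict(list)  # (block index, block value) -> element ids
        for eid, h in self.hashes.items():
            for key in blocks(h):
                self.table[key].append(eid)

    def candidates(self, query: dict[str, str]) -> set[str]:
        """Cheap filter: elements whose hash is within MAX_HAMMING bits of the query's."""
        qh = simhash(query)
        # Any hash within MAX_HAMMING bits of qh shares at least one block with it.
        found = {eid for key in blocks(qh) for eid in self.table.get(key, [])}
        return {eid for eid in found if hamming(qh, self.hashes[eid]) <= MAX_HAMMING}

    def best_match(self, query: dict[str, str], max_distance: int = 3) -> Optional[str]:
        """Expensive step: edit distance computed only for the surviving candidates."""
        scored = [(levenshtein(signature(query), signature(self.elements[eid])), eid)
                  for eid in self.candidates(query)]
        scored = [(d, eid) for d, eid in scored if d <= max_distance]
        return min(scored)[1] if scored else None


model = {
    "e1": {"name": "Customer", "type": "EClass"},
    "e2": {"name": "Order", "type": "EClass"},
}
index = CandidateIndex(model)
print(index.best_match({"type": "EClass", "name": "Customer"}))  # e1
```

The query in the demo has the same attributes as `e1`, so its SimHash is identical and the lookup returns `e1`. The block table never misses a pair within `MAX_HAMMING` bits, because 3 flipped bits cannot touch all 4 blocks; however, whether a renamed element (say `Customers`) survives the filter depends on whether its SimHash lands within that threshold, which this sketch does not guarantee. The paper reports results consistent with EMF Compare, a property this simplified sketch does not attempt to reproduce.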