Accelerating similarity-based model matching using on-the-fly similarity preserving hashing

Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems. Pub Date: 2022-10-23. DOI: 10.1145/3550355.3552406

Xiaoyu He, Letian Tang, Yutong Li

Citations: 0

Abstract

Similarity-based model matching is the foundation of model versioning. It pairs model elements based on a distance metric (e.g., edit distance). Because calculating the distance between two elements is expensive, a similarity-based matcher usually suffers from performance issues as the model size increases. This paper proposes a hash-based approach to accelerate similarity-based model matching. First, we design a novel similarity-preserving hash function that maps a model element to a 64-bit hash value. If two elements are similar, their hashes are also very close. Second, we propose a 3-layer index structure and a query algorithm that use these hashes to quickly filter out impossible candidates for the element to be matched. For the remaining candidates, we employ the classical similarity-based matching algorithm to determine the final matches. Our approach has been implemented and integrated into EMF Compare. The evaluation results show that our hash function effectively preserves the similarity between model elements and that our matching approach reduces time costs by 16% to 72% while keeping the matching results consistent with those of EMF Compare.
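
The filter-then-verify idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the paper's hash function and 3-layer index are not described here, so the sketch substitutes classic SimHash over character trigrams for the 64-bit hash, a plain linear scan with a Hamming-distance threshold for the index, and Levenshtein distance for the expensive element metric. The names `simhash64`, `hamming`, `edit_distance`, `best_match`, and the `max_bits` threshold are all assumptions made for illustration, and model elements are represented as plain strings.

```python
import hashlib


def simhash64(text: str, n: int = 3) -> int:
    """Classic SimHash over character n-grams (a stand-in, NOT the paper's hash)."""
    counts = [0] * 64
    grams = [text[i:i + n] for i in range(max(1, len(text) - n + 1))]
    for gram in grams:
        # Stable 64-bit hash of the n-gram: first 8 bytes of its MD5 digest.
        h = int.from_bytes(hashlib.md5(gram.encode("utf-8")).digest()[:8], "big")
        for bit in range(64):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A bit of the fingerprint is set when most n-grams voted for it.
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, standing in for the expensive element metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def best_match(query: str, candidates: list[str], max_bits: int = 16):
    """Cheap hash filter first, expensive distance only on the survivors."""
    query_hash = simhash64(query)
    # Assumed threshold: drop candidates whose hash differs in too many bits.
    survivors = [c for c in candidates if hamming(query_hash, simhash64(c)) <= max_bits]
    if not survivors:
        return None
    return min(survivors, key=lambda c: edit_distance(query, c))
```

For example, `best_match("Person.name", ["Order.total", "Person.name"])` returns `"Person.name"`: the identical candidate always passes the filter (its hash distance is 0) and has edit distance 0. In this sketch the filter is approximate, so a poorly chosen `max_bits` could discard a true match; the paper reports results consistent with EMF Compare for its own hash and index, which this sketch does not reproduce.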
