Detection of Near Duplicates over Graph Datasets Using Pruning

P. Naveena, P. S. Rao
Published in: 2020 IEEE India Council International Subsections Conference (INDISCON), 2020-10-01
DOI: 10.1109/INDISCON50162.2020.00068 (https://doi.org/10.1109/INDISCON50162.2020.00068)
Citations: 0

Abstract

Graphs are a widely used formalism for modeling data in domains such as natural language processing, chemoinformatics, computer vision, information retrieval and software engineering. Finding similar graphs is essential for many applications in these domains. Graph isomorphism finds exact duplicate graphs. However, it fails to quantify similarity and it is computationally expensive. To overcome both of these bottlenecks, a number of graph similarity measures have been proposed. Graph Similarity Self-Join (GSSJ) is the problem of finding all pairs of graphs that have a similarity score above a predefined threshold. For a dataset of n graphs, the naive solution computes the similarity score for all C(n, 2) = n(n-1)/2 pairs of graphs. This problem is both compute and data intensive. Existing algorithms for this problem support only graph edit distance as the similarity measure. The overarching goal of this research is to develop algorithms for graph similarity self-join that support multiple graph similarity measures. The major contributions of this research will be better indexing mechanisms for graphs and tight bounds on graph similarity.
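To make the role of pruning concrete, the following is a minimal sketch of a self-join that skips the exact similarity computation whenever a cheap upper bound already falls below the threshold. It is not the authors' algorithm. Everything in it is an assumption for illustration: graphs are represented as sets of edges over uniquely labeled vertices, Jaccard similarity of edge sets stands in for the similarity measure, the bound min(|A|, |B|) / max(|A|, |B|) is a standard size filter for Jaccard, and a pair qualifies when its score is at least the threshold.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two edge sets (a stand-in similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def size_upper_bound(a: frozenset, b: frozenset) -> float:
    """Upper bound on Jaccard: |A & B| <= min(|A|, |B|), |A | B| >= max(|A|, |B|)."""
    if not a and not b:
        return 1.0
    return min(len(a), len(b)) / max(len(a), len(b))


def naive_self_join(graphs, threshold):
    """Score all n(n-1)/2 pairs and keep those at or above the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(graphs), 2)
        if jaccard(a, b) >= threshold
    ]


def pruned_self_join(graphs, threshold):
    """Same result as the naive join, but skip pairs whose bound is too low."""
    results, exact_calls = [], 0
    for (i, a), (j, b) in combinations(enumerate(graphs), 2):
        if size_upper_bound(a, b) < threshold:
            continue  # the exact score cannot reach the threshold
        exact_calls += 1
        if jaccard(a, b) >= threshold:
            results.append((i, j))
    return results, exact_calls


# Hypothetical toy dataset: each graph is a frozenset of (u, v) edges.
graphs = [
    frozenset({("a", "b"), ("b", "c"), ("c", "d")}),
    frozenset({("a", "b"), ("b", "c"), ("c", "e")}),
    frozenset({("a", "b")}),
    frozenset({("x", "y"), ("y", "z"), ("z", "w"), ("w", "x"),
               ("a", "b"), ("b", "c")}),
]
pairs, exact_calls = pruned_self_join(graphs, 0.5)
```

On this toy dataset the pruned join returns the same pairs as the naive join while computing the exact score for only 3 of the 6 candidate pairs. The tighter the bound, the more pairs are skipped, which is why the abstract names tight bounds on graph similarity as a contribution.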