Weighted Edge Sampling for Static Graphs

IF 0.5 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Data Mining Modelling and Management. Pub Date: 2023-01-01. DOI: 10.1504/ijdmmm.2023.10059714

Muhammad Irfan Yousuf, Raheel Anwar

Citations: 1

Abstract

Graph sampling offers an efficient and inexpensive way to analyze large graphs. The challenge in extracting a small representative subgraph from a large graph is to capture the properties of the original graph. Several sampling algorithms have been proposed in previous studies, but they fall short in extracting good samples. In this paper, we propose a new sampling method called Weighted Edge Sampling. In this method, all edges start with equal weight. During the sampling process, we sample an edge with probability proportional to its weight. When an edge is sampled, we increase the weights of its neighboring edges, which raises their probability of being sampled. Our method extracts the neighborhood of a sampled edge more efficiently than previous approaches. We evaluate the efficacy of our sampling approach empirically on several real-world data sets and compare it with some of the previous approaches. We find that our method produces samples that better match the original graphs. We also calculate the Root Mean Square Error and the Kolmogorov-Smirnov distance to compare the results quantitatively.
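
The sampling loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how much a neighbor's weight grows, whether sampled edges can be drawn again, or how "neighboring" is defined. Here the hypothetical parameter `boost` is added to each neighbor's weight, edges are drawn without replacement, and two edges count as neighbors when they share an endpoint. The helper `ks_distance` shows one common way to compute a Kolmogorov-Smirnov distance between two empirical distributions (for example degree sequences); which graph properties the paper compares is not stated in the abstract.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def weighted_edge_sample(edges, k, boost=1.0, seed=None):
    """Draw k distinct edges, favoring neighbors of already sampled edges.

    Assumptions (not from the paper): additive `boost`, sampling without
    replacement, and "neighbor" meaning "shares an endpoint".
    """
    edges = [tuple(e) for e in edges]
    if not 0 <= k <= len(edges):
        raise ValueError("k must be between 0 and the number of edges")
    rng = random.Random(seed)

    # Map each node to the indices of the edges incident to it.
    incident = defaultdict(set)
    for i, (u, v) in enumerate(edges):
        incident[u].add(i)
        incident[v].add(i)

    weight = [1.0] * len(edges)         # all edges start with equal weight
    remaining = set(range(len(edges)))  # indices of edges not yet sampled
    sampled = []
    while len(sampled) < k:
        candidates = sorted(remaining)
        # Pick one edge with probability proportional to its current weight.
        pick = rng.choices(candidates,
                           weights=[weight[i] for i in candidates], k=1)[0]
        sampled.append(edges[pick])
        remaining.discard(pick)
        # Raise the weight of every edge sharing an endpoint with the pick.
        u, v = edges[pick]
        for j in (incident[u] | incident[v]) - {pick}:
            weight[j] += boost
    return sampled


def degrees(edges):
    """Return the degree sequence of the graph induced by `edges`."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return list(deg.values())


def ks_distance(xs, ys):
    """Largest absolute gap between the empirical CDFs of two samples."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    return max(abs(bisect_right(xs, p) / len(xs) - bisect_right(ys, p) / len(ys))
               for p in points)
```

A call such as `ks_distance(degrees(graph_edges), degrees(weighted_edge_sample(graph_edges, k)))` would then compare the degree distribution of a sample with that of the original graph. The loop rescans all remaining edges on every draw, which keeps the sketch short but would be slow on large graphs.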

Source journal

International Journal of Data Mining Modelling and Management (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore

1.10

Self-citation rate

0.00%

About the journal: Facilitating transformation from data to information to knowledge is paramount for organisations. Companies are flooded with data and conflicting information, but have limited real usable knowledge. However, rarely should a process be looked at from limited angles or in parts. Isolated islands of data mining, modelling and management (DMMM) should be connected. IJDMMM highlights integration of DMMM, statistics/machine learning/databases, each element of data chain management, types of information, algorithms in software; from data pre-processing to post-processing; between theory and applications. Topics covered include:
- Artificial intelligence
- Biomedical science
- Business analytics/intelligence, process modelling
- Computer science, database management systems
- Data management, mining, modelling, warehousing
- Engineering
- Environmental science, environment (ecoinformatics)
- Information systems/technology, telecommunications/networking
- Management science, operations research, mathematics/statistics
- Social sciences
- Business/economics, (computational) finance
- Healthcare, medicine, pharmaceuticals
- (Computational) chemistry, biology (bioinformatics)
- Sustainable mobility systems, intelligent transportation systems
- National security