Weighted edge sampling for static graphs

Impact factor: 0.5 (JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

International Journal of Data Mining Modelling and Management. Publication date: 2023-01-01. DOI: 10.1504/ijdmmm.2023.134612

Muhammad Irfan Yousuf, Raheel Anwar

Abstract

Graph sampling offers an efficient and inexpensive way to analyse large graphs. Its purpose is to extract a small, representative subgraph from a large graph so that the sample can stand in for the original graph in study and analysis. In this paper, we propose a new sampling method called weighted edge sampling. All edges start with equal weight. During sampling, each edge is drawn with probability proportional to its weight; when an edge is sampled, the weights of its neighbouring edges are increased, which raises their probability of being sampled. Our method extracts the neighbourhood of a sampled edge more efficiently than previous approaches. We evaluate it empirically on several real-world datasets and find that it produces better samples than previous approaches: our samples estimate the degree and path length of the original graphs more accurately, although they are less effective at estimating the clustering coefficient.
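
The abstract describes the procedure only at a high level, so the following Python sketch shows one possible reading of it rather than the authors' implementation. Several details are assumptions not stated in the source: the graph is undirected and given as an edge list; "neighbouring edges" are taken to be edges that share an endpoint with the sampled edge; the weight increase is a hypothetical additive constant `boost`; edges are drawn without replacement; and sampling stops once a hypothetical target of `sample_size` edges is reached. The function name `weighted_edge_sample` is likewise illustrative.

```python
import random
from collections import defaultdict


def weighted_edge_sample(edges, sample_size, boost=1.0, seed=None):
    """Sketch of weighted edge sampling on an undirected static graph.

    Assumptions (not given in the abstract): neighbouring edges share an
    endpoint with the sampled edge, weights grow by an additive `boost`,
    and edges are sampled without replacement.
    """
    edges = [tuple(e) for e in edges]
    n = len(edges)
    if not 0 <= sample_size <= n:
        raise ValueError("sample_size must be between 0 and the number of edges")
    if boost < 0:
        raise ValueError("boost must be non-negative")
    rng = random.Random(seed)

    # Every edge starts with the same weight.
    weights = [1.0] * n

    # Map each node to the indices of the edges incident to it.
    incident = defaultdict(set)
    for idx, (u, v) in enumerate(edges):
        incident[u].add(idx)
        incident[v].add(idx)

    sample = []
    while len(sample) < sample_size:
        # Draw one edge index with probability proportional to its weight.
        # Sampled edges have weight 0, so they are never drawn again.
        idx = rng.choices(range(n), weights=weights, k=1)[0]
        u, v = edges[idx]
        sample.append((u, v))
        weights[idx] = 0.0

        # Raise the weight of every still-unsampled edge sharing an endpoint
        # with the newly sampled edge (hypothetical additive update).
        for j in incident[u] | incident[v]:
            if weights[j] > 0:
                weights[j] += boost
    return sample


if __name__ == "__main__":
    # Toy graph (hypothetical data): two triangles joined by the edge (2, 3).
    graph = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
    print(weighted_edge_sample(graph, sample_size=4, boost=2.0, seed=7))
```

Each draw rescans all edge weights, so this sketch costs O(|E|) per sampled edge. A larger graph would likely need a structure that supports weight updates and weighted draws in logarithmic time, such as a Fenwick (binary indexed) tree over the weights; that is our assumption, not a detail from the paper.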

Source journal

International Journal of Data Mining Modelling and Management (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 1.10

Self-citation rate: 0.00%

Journal description: Facilitating the transformation from data to information to knowledge is paramount for organisations. Companies are flooded with data and conflicting information but have limited usable knowledge. A process should rarely be examined from limited angles or in isolated parts; isolated islands of data mining, modelling and management (DMMM) should be connected. IJDMMM highlights the integration of DMMM; of statistics, machine learning and databases; of each element of data chain management; of types of information; and of algorithms in software, from data pre-processing to post-processing and between theory and applications. Topics covered include:

- Artificial intelligence
- Biomedical science
- Business analytics/intelligence, process modelling
- Computer science, database management systems
- Data management, mining, modelling, warehousing
- Engineering
- Environmental science, environment (ecoinformatics)
- Information systems/technology, telecommunications/networking
- Management science, operations research, mathematics/statistics
- Social sciences
- Business/economics, (computational) finance
- Healthcare, medicine, pharmaceuticals
- (Computational) chemistry, biology (bioinformatics)
- Sustainable mobility systems, intelligent transportation systems
- National security