{"title":"Weighted Edge Sampling for Static Graphs","authors":"Muhammad Irfan Yousuf, Raheel Anwar","doi":"10.1504/ijdmmm.2023.10059714","DOIUrl":null,"url":null,"abstract":"Graph Sampling provides an efficient yet inexpensive solution for analyzing large graphs. While extracting small representative subgraphs from large graphs, the challenge is to capture the properties of the original graph. Several sampling algorithms have been proposed in previous studies, but they lack in extracting good samples. In this paper, we propose a new sampling method called Weighted Edge Sampling. In this method, we give equal weight to all the edges in the beginning. During the sampling process, we sample an edge with the probability proportional to its weight. When an edge is sampled, we increase the weight of its neighboring edges and this increases their probability to be sampled. Our method extracts the neighborhood of a sampled edge more efficiently than previous approaches. We evaluate the efficacy of our sampling approach empirically using several real-world data sets and compare it with some of the previous approaches. We find that our method produces samples that better match the original graphs. We also calculate the Root Mean Square Error and Kolmogorov Smirnov distance to compare the results quantitatively.","PeriodicalId":43061,"journal":{"name":"International Journal of Data Mining Modelling and Management","volume":"6 1","pages":"0"},"PeriodicalIF":0.4000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Data Mining Modelling and Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1504/ijdmmm.2023.10059714","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 1
Abstract
Graph Sampling provides an efficient yet inexpensive solution for analyzing large graphs. While extracting small representative subgraphs from large graphs, the challenge is to capture the properties of the original graph. Several sampling algorithms have been proposed in previous studies, but they lack in extracting good samples. In this paper, we propose a new sampling method called Weighted Edge Sampling. In this method, we give equal weight to all the edges in the beginning. During the sampling process, we sample an edge with the probability proportional to its weight. When an edge is sampled, we increase the weight of its neighboring edges and this increases their probability to be sampled. Our method extracts the neighborhood of a sampled edge more efficiently than previous approaches. We evaluate the efficacy of our sampling approach empirically using several real-world data sets and compare it with some of the previous approaches. We find that our method produces samples that better match the original graphs. We also calculate the Root Mean Square Error and Kolmogorov Smirnov distance to compare the results quantitatively.
期刊介绍:
Facilitating transformation from data to information to knowledge is paramount for organisations. Companies are flooded with data and conflicting information, but with limited real usable knowledge. However, rarely should a process be looked at from limited angles or in parts. Isolated islands of data mining, modelling and management (DMMM) should be connected. IJDMMM highlightes integration of DMMM, statistics/machine learning/databases, each element of data chain management, types of information, algorithms in software; from data pre-processing to post-processing; between theory and applications. Topics covered include: -Artificial intelligence- Biomedical science- Business analytics/intelligence, process modelling- Computer science, database management systems- Data management, mining, modelling, warehousing- Engineering- Environmental science, environment (ecoinformatics)- Information systems/technology, telecommunications/networking- Management science, operations research, mathematics/statistics- Social sciences- Business/economics, (computational) finance- Healthcare, medicine, pharmaceuticals- (Computational) chemistry, biology (bioinformatics)- Sustainable mobility systems, intelligent transportation systems- National security