ActiveReach: an active learning framework for approximate reachability query answering in large-scale graphs.

IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Frontiers in Big Data Pub Date : 2024-11-19 eCollection Date: 2024-01-01 DOI:10.3389/fdata.2024.1427104

Zohreh Raghebi, Farnoush Banaei-Kashani

{"title":"ActiveReach: an active learning framework for approximate reachability query answering in large-scale graphs.","authors":"Zohreh Raghebi, Farnoush Banaei-Kashani","doi":"10.3389/fdata.2024.1427104","DOIUrl":null,"url":null,"abstract":"<p><p>With graph reachability query, one can answer whether there exists a path between two query vertices in a given graph. The existing reachability query processing solutions use traditional reachability index structures and can only compute exact answers, which may take a long time to resolve in large graphs. In contrast, with an approximate reachability query, one can offer a compromise by enabling users to strike a trade-off between query time and the accuracy of the query result. In this study, we propose a framework, dubbed ActiveReach, for learning index structures to answer approximate reachability query. ActiveReach is a two-phase framework that focuses on embedding nodes in a reachability space. In the first phase, we leverage node attributes and positional information to create reachability-aware embeddings for each node. These embeddings are then used as nodes' attributes in the second phase. In the second phase, we incorporate the new attributes and include reachability information as labels in the training data to generate embeddings in a reachability space. In addition, computing reachability for all training data may not be practical. Therefore, selecting a subset of data to compute reachability effectively and enhance reachability prediction performance is challenging. ActiveReach addresses this challenge by employing an active learning approach in the second phase to selectively compute reachability for a subset of node pairs, thus learning the approximate reachability for the entire graph. Our extensive experimental study with various real attributed large-scale graphs demonstrates the effectiveness of each component of our framework.</p>","PeriodicalId":52859,"journal":{"name":"Frontiers in Big Data","volume":"7 ","pages":"1427104"},"PeriodicalIF":2.4000,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11611874/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Big Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fdata.2024.1427104","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

With graph reachability query, one can answer whether there exists a path between two query vertices in a given graph. The existing reachability query processing solutions use traditional reachability index structures and can only compute exact answers, which may take a long time to resolve in large graphs. In contrast, with an approximate reachability query, one can offer a compromise by enabling users to strike a trade-off between query time and the accuracy of the query result. In this study, we propose a framework, dubbed ActiveReach, for learning index structures to answer approximate reachability query. ActiveReach is a two-phase framework that focuses on embedding nodes in a reachability space. In the first phase, we leverage node attributes and positional information to create reachability-aware embeddings for each node. These embeddings are then used as nodes' attributes in the second phase. In the second phase, we incorporate the new attributes and include reachability information as labels in the training data to generate embeddings in a reachability space. In addition, computing reachability for all training data may not be practical. Therefore, selecting a subset of data to compute reachability effectively and enhance reachability prediction performance is challenging. ActiveReach addresses this challenge by employing an active learning approach in the second phase to selectively compute reachability for a subset of node pairs, thus learning the approximate reachability for the entire graph. Our extensive experimental study with various real attributed large-scale graphs demonstrates the effectiveness of each component of our framework.

Abstract Image

查看原文本刊更多论文

ActiveReach：一个主动学习框架，用于大规模图中的近似可达性查询回答。

通过图可达性查询，可以回答给定图中两个查询顶点之间是否存在路径。现有的可达性查询处理方案使用传统的可达性索引结构，只能计算精确的答案，这可能需要很长时间才能在大型图中解析。相比之下，对于近似可达性查询，可以提供一种折衷方案，允许用户在查询时间和查询结果的准确性之间进行权衡。在本研究中，我们提出了一个名为ActiveReach的框架，用于学习索引结构来回答近似可达性查询。ActiveReach是一个两阶段框架，专注于在可达性空间中嵌入节点。在第一阶段，我们利用节点属性和位置信息为每个节点创建可达性感知嵌入。然后在第二阶段将这些嵌入用作节点的属性。在第二阶段，我们结合新的属性，并将可达性信息作为标签包含在训练数据中，以在可达性空间中生成嵌入。此外，计算所有训练数据的可达性可能不切实际。因此，选择一个数据子集来有效地计算可达性并提高可达性预测性能是一个挑战。ActiveReach通过在第二阶段采用主动学习方法来解决这一挑战，有选择地计算节点对子集的可达性，从而学习整个图的近似可达性。我们对各种真实属性大规模图进行了广泛的实验研究，证明了我们框架的每个组成部分的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊