G-thinkerQ: A General Subgraph Querying System With a Unified Task-Based Programming Model

Impact factor: 10.4 · CAS Zone 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)

IEEE Transactions on Knowledge and Data Engineering · Pub Date: 2025-03-01 · DOI: 10.1109/TKDE.2025.3537964

Lyuheng Yuan;Guimu Guo;Da Yan;Saugat Adhikari;Jalal Khalil;Cheng Long;Lei Zou

Volume 37, Issue 6, pp. 3429-3444 · Full text: https://ieeexplore.ieee.org/document/10981840/

Citations: 0

Abstract



Given a large graph $G$, a subgraph query $Q$ finds the set of all subgraphs of $G$ that satisfy certain conditions specified by $Q$. Examples of subgraph queries include finding a community containing designated members to organize an event, and subgraph matching. To overcome the weakness of existing graph-parallel systems, which underutilize CPU cores when finding subgraphs, our prior system, G-thinker, adopts a novel think-like-a-task (TLAT) parallel programming model. However, G-thinker targets offline analytics and cannot support interactive online querying, where users continually submit subgraph queries with different query contents. The challenges here are (i) how to maintain fairness, so that queries are answered in the order in which they are received: a later query is processed only if earlier queries cannot saturate the available computation resources; (ii) how to track the progress of active queries (each with many tasks under computation) so that users are notified as soon as a query completes; and (iii) how to maintain memory boundedness and high task concurrency as in G-thinker. In this article, we propose a novel TLAT programming framework, called G-thinkerQ, for answering online subgraph queries. G-thinkerQ inherits the memory boundedness and high task concurrency of G-thinker by organizing the tasks of each query using a “task capsule” structure, and it uses a novel task-capsule list to ensure fairness among queries. A novel lineage-based mechanism is also designed to keep track of when the last task of a query is completed. Parallel counterparts of the state-of-the-art algorithms for four recent advanced subgraph queries are implemented on G-thinkerQ to demonstrate its CPU scalability.
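The fairness rule and the completion-tracking requirement stated in the abstract can be illustrated with a toy, single-threaded sketch. This is not G-thinkerQ's implementation: the names `QueryScheduler` and `simulate` are hypothetical, each "capsule" is modeled as a plain queue of pending tasks, and a per-query counter of unfinished tasks is used as a simple stand-in for the lineage-based mechanism, whose design the abstract does not detail.

```python
from collections import OrderedDict, deque


class QueryScheduler:
    """Toy FIFO scheduler with one pending-task queue (a "capsule") per query."""

    def __init__(self):
        # Dict insertion order doubles as the arrival order of the queries.
        self.capsules = OrderedDict()  # query_id -> deque of pending tasks
        self.outstanding = {}          # query_id -> tasks created but not yet finished
        self.completed = []            # query ids in completion order

    def submit(self, query_id, root_tasks):
        tasks = deque(root_tasks)
        if not tasks:                  # nothing to compute: the query is done at once
            self.completed.append(query_id)
            return
        self.capsules[query_id] = tasks
        self.outstanding[query_id] = len(tasks)

    def next_task(self):
        """Return (query_id, task) from the earliest query with pending work, else None."""
        for query_id, pending in self.capsules.items():
            if pending:
                return query_id, pending.popleft()
        return None

    def finish(self, query_id, child_tasks=()):
        """Record that one task of `query_id` ended, possibly spawning child tasks."""
        children = list(child_tasks)
        self.capsules[query_id].extend(children)
        self.outstanding[query_id] += len(children) - 1
        if self.outstanding[query_id] == 0:  # the last task of the query just ended
            del self.capsules[query_id]
            del self.outstanding[query_id]
            self.completed.append(query_id)


def simulate(scheduler, cores, spawn):
    """Run rounds of at most `cores` tasks; `spawn(task)` returns the task's children."""
    rounds = []
    while scheduler.capsules:
        batch = []
        for _ in range(cores):
            item = scheduler.next_task()
            if item is None:
                break
            batch.append(item)
        for query_id, task in batch:
            scheduler.finish(query_id, spawn(task))
        rounds.append([query_id for query_id, _ in batch])
    return rounds


def spawn(depth):
    """Hypothetical task model: a task of depth d > 0 splits into two tasks of depth d - 1."""
    return [depth - 1] * 2 if depth > 0 else []


scheduler = QueryScheduler()
scheduler.submit("q1", [1])     # earlier query, starts with a single task
scheduler.submit("q2", [0, 0])  # later query
rounds = simulate(scheduler, cores=2, spawn=spawn)
print(rounds)               # [['q1', 'q2'], ['q1', 'q1'], ['q2']]
print(scheduler.completed)  # ['q1', 'q2']
```

In the first round `q1` has only one pending task, so the spare core goes to `q2`; once `q1` has spawned enough tasks to fill both cores, `q2` waits. A real system would run tasks concurrently across threads and machines, which this sketch does not attempt.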


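Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.70

Self-citation rate: 3.40%

Articles published: 515

Review time: 6 months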

About the journal: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.