Lyuheng Yuan;Guimu Guo;Da Yan;Saugat Adhikari;Jalal Khalil;Cheng Long;Lei Zou
{"title":"G-Thinkerq:一个基于统一任务规划模型的通用子图查询系统","authors":"Lyuheng Yuan;Guimu Guo;Da Yan;Saugat Adhikari;Jalal Khalil;Cheng Long;Lei Zou","doi":"10.1109/TKDE.2025.3537964","DOIUrl":null,"url":null,"abstract":"Given a large graph <inline-formula><tex-math>$G$</tex-math></inline-formula>, a subgraph query <inline-formula><tex-math>$Q$</tex-math></inline-formula> finds the set of all subgraphs of <inline-formula><tex-math>$G$</tex-math></inline-formula> that satisfy certain conditions specified by <inline-formula><tex-math>$Q$</tex-math></inline-formula>. Examples of subgraph queries including finding a community containing designated members to organize an event, and subgraph matching. To overcome the weakness of existing graph-parallel systems that underutilize CPU cores when finding subgraphs, our prior system, G-thinker, was proposed that adopts a novel think-like-a-task (TLAT) parallel programming model. However, G-thinker targets offline analytics and cannot support interactive online querying where users continually submit subgraph queries with different query contents. The challenges here are (i) how to maintain fairness that queries are answered in the order that they are received: a later query is processed only if earlier queries cannot saturate the available computation resources; (ii) how to track the progress of active queries (each with many tasks under computation) so that users can be timely notified as soon as a query completes; and (iii) how to maintain memory boundedness and high task concurrency as in G-thinker. In this article, we propose a novel TLAT programming framework, called G-thinkerQ, for answering online subgraph queries. G-thinkerQ inherits the memory boundedness and high task concurrency of G-thinker by organizing the tasks of each query using a “task capsule” structure, and designs a novel task-capsule list is to ensure fairness among queries. A novel lineage-based mechanism is also designed to keep track of when the last task of a query is completed. Parallel counterparts of the state-of-the-art algorithms for 4 recent advanced subgraph queries are implemented on G-thinkerQ to demonstrate its CPU-scalability.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3429-3444"},"PeriodicalIF":8.9000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"G-Thinkerq: A General Subgraph Querying System With a Unified Task-Based Programming Model\",\"authors\":\"Lyuheng Yuan;Guimu Guo;Da Yan;Saugat Adhikari;Jalal Khalil;Cheng Long;Lei Zou\",\"doi\":\"10.1109/TKDE.2025.3537964\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Given a large graph <inline-formula><tex-math>$G$</tex-math></inline-formula>, a subgraph query <inline-formula><tex-math>$Q$</tex-math></inline-formula> finds the set of all subgraphs of <inline-formula><tex-math>$G$</tex-math></inline-formula> that satisfy certain conditions specified by <inline-formula><tex-math>$Q$</tex-math></inline-formula>. Examples of subgraph queries including finding a community containing designated members to organize an event, and subgraph matching. To overcome the weakness of existing graph-parallel systems that underutilize CPU cores when finding subgraphs, our prior system, G-thinker, was proposed that adopts a novel think-like-a-task (TLAT) parallel programming model. However, G-thinker targets offline analytics and cannot support interactive online querying where users continually submit subgraph queries with different query contents. The challenges here are (i) how to maintain fairness that queries are answered in the order that they are received: a later query is processed only if earlier queries cannot saturate the available computation resources; (ii) how to track the progress of active queries (each with many tasks under computation) so that users can be timely notified as soon as a query completes; and (iii) how to maintain memory boundedness and high task concurrency as in G-thinker. In this article, we propose a novel TLAT programming framework, called G-thinkerQ, for answering online subgraph queries. G-thinkerQ inherits the memory boundedness and high task concurrency of G-thinker by organizing the tasks of each query using a “task capsule” structure, and designs a novel task-capsule list is to ensure fairness among queries. A novel lineage-based mechanism is also designed to keep track of when the last task of a query is completed. Parallel counterparts of the state-of-the-art algorithms for 4 recent advanced subgraph queries are implemented on G-thinkerQ to demonstrate its CPU-scalability.\",\"PeriodicalId\":13496,\"journal\":{\"name\":\"IEEE Transactions on Knowledge and Data Engineering\",\"volume\":\"37 6\",\"pages\":\"3429-3444\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Knowledge and Data Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10981840/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Knowledge and Data Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10981840/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
G-Thinkerq: A General Subgraph Querying System With a Unified Task-Based Programming Model
Given a large graph $G$, a subgraph query $Q$ finds the set of all subgraphs of $G$ that satisfy certain conditions specified by $Q$. Examples of subgraph queries including finding a community containing designated members to organize an event, and subgraph matching. To overcome the weakness of existing graph-parallel systems that underutilize CPU cores when finding subgraphs, our prior system, G-thinker, was proposed that adopts a novel think-like-a-task (TLAT) parallel programming model. However, G-thinker targets offline analytics and cannot support interactive online querying where users continually submit subgraph queries with different query contents. The challenges here are (i) how to maintain fairness that queries are answered in the order that they are received: a later query is processed only if earlier queries cannot saturate the available computation resources; (ii) how to track the progress of active queries (each with many tasks under computation) so that users can be timely notified as soon as a query completes; and (iii) how to maintain memory boundedness and high task concurrency as in G-thinker. In this article, we propose a novel TLAT programming framework, called G-thinkerQ, for answering online subgraph queries. G-thinkerQ inherits the memory boundedness and high task concurrency of G-thinker by organizing the tasks of each query using a “task capsule” structure, and designs a novel task-capsule list is to ensure fairness among queries. A novel lineage-based mechanism is also designed to keep track of when the last task of a query is completed. Parallel counterparts of the state-of-the-art algorithms for 4 recent advanced subgraph queries are implemented on G-thinkerQ to demonstrate its CPU-scalability.
期刊介绍:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.