Accelerating complex graph queries by summary-based hybrid partitioning for discovering vulnerabilities of distribution equipment

IF 6.2 | Zone 2, Computer Science | Q1, COMPUTER SCIENCE, THEORY & METHODS

Future Generation Computer Systems-The International Journal of Escience | Pub Date: 2025-02-13 | DOI: 10.1016/j.future.2025.107747

Qiong Wang, Wei He, Shang Yang, Ruoyu Zhao, Yinglong Ma

Volume 167, Article 107747

Full text: https://www.sciencedirect.com/science/article/pii/S0167739X25000421

Citations: 0

Abstract

With the high proportion of electrical and electronic devices in China’s power grids, massive graph data on power distribution equipment has accumulated to share knowledge across heterogeneous information sources, while vulnerabilities in power devices introduce new security risks to the grid. Swiftly and accurately discovering the intrinsic vulnerabilities of power devices from this massive power distribution graph data is crucial to the safe operation of the power grid. However, the diversity of complex queries makes it difficult to achieve consistent graph query performance over such data, and thus to discover vulnerabilities swiftly and accurately in a highly available and user-friendly manner. To address this problem, we present a query-oriented pipeline framework for power graphs that consistently accelerates complex graph queries over the massive graph data of power distribution equipment for efficient vulnerability discovery. First, we propose a lossless graph summarization method that produces a summary graph from the raw graph data. Second, unlike existing methods, we propose a two-stage hybrid partitioning, consisting of binary partitioning followed by ternary partitioning, which is performed on the summary graph instead of the raw graph to reduce the search scope and minimize the amount of data read by a query, thereby accelerating the query. Third, a complex graph query with multiple triple patterns is automatically translated into a Spark SQL statement for execution without user intervention, and accurate results are obtained by recovering them from the summary-based intermediate results. Finally, extensive experiments were conducted on four datasets against several state-of-the-art methods; the results show that our approach is highly competitive with these methods and achieves consistent performance in accelerating complex graph queries while returning accurate results.
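To make the third step more concrete, the sketch below shows one common way a query made of several triple patterns can be turned into a single SQL statement. It is an illustration only, not the paper's translator: the abstract does not describe the table layout, so the code assumes a hypothetical layout with one two-column table per predicate (columns `s` and `o`), and the names `translate`, `demo`, `hasComponent`, and `hasVulnerability` are invented for this example. The generated SQL is checked here with Python's built-in `sqlite3` module; running it through Spark SQL is not shown.

```python
import sqlite3


def is_var(term):
    """A term is a variable when it starts with '?', as in SPARQL."""
    return term.startswith("?")


def translate(patterns):
    """Translate triple patterns (s, p, o) into one SQL SELECT string.

    Assumption (not from the paper): each predicate p is stored in a
    table named p with columns s (subject) and o (object), and every
    predicate in the query is a constant that is a valid table name.
    """
    tables, conditions, first_seen = [], [], {}
    for i, (s, p, o) in enumerate(patterns):
        alias = f"t{i}"
        tables.append(f"{p} AS {alias}")
        for column, term in (("s", s), ("o", o)):
            ref = f"{alias}.{column}"
            if not is_var(term):
                # Constant term: filter on its value (quotes escaped).
                value = term.replace("'", "''")
                conditions.append(f"{ref} = '{value}'")
            elif term in first_seen:
                # Repeated variable: join with its first occurrence.
                conditions.append(f"{ref} = {first_seen[term]}")
            else:
                first_seen[term] = ref
    if not first_seen:
        raise ValueError("the query needs at least one variable")
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in first_seen.items())
    sql = f"SELECT {select} FROM {', '.join(tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Hypothetical query: which devices have a component with a vulnerability?
PATTERNS = [
    ("?d", "hasComponent", "?c"),
    ("?c", "hasVulnerability", "?v"),
]


def demo():
    """Run the translated query on a tiny in-memory dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hasComponent (s TEXT, o TEXT)")
    conn.execute("CREATE TABLE hasVulnerability (s TEXT, o TEXT)")
    conn.executemany(
        "INSERT INTO hasComponent VALUES (?, ?)",
        [("switch1", "relay1"), ("switch2", "relay2")],
    )
    conn.executemany(
        "INSERT INTO hasVulnerability VALUES (?, ?)",
        [("relay1", "vulnA")],
    )
    rows = conn.execute(translate(PATTERNS)).fetchall()
    conn.close()
    return rows
```

In this sketch each shared variable becomes an equality condition between two table aliases, so a query with more triple patterns produces more joins. That join cost is what motivates shrinking the data each pattern has to read, which the abstract attributes to partitioning over the summary graph.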


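Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 19.90

Self-citation rate: 2.70%

Articles published: 376

Review time: 10.6 months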

Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.