TgStore: An Efficient Storage System for Large Time-Evolving Graphs

Impact factor 7.5; Zone 3 (Computer Science); JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Yongli Cheng;Yan Ma;Hong Jiang;Lingfang Zeng;Fang Wang;Xianghao Xu;Yuhang Wu
IEEE Transactions on Big Data, vol. 10, no. 2, pp. 158-173. Journal article, published 2024-02-14.
DOI: 10.1109/TBDATA.2024.3366087
https://ieeexplore.ieee.org/document/10436166/
Citations: 0

Abstract

Existing graph systems focus mainly on the execution efficiency of graph analysis tasks and often overlook the importance and efficiency of storing time-evolving graphs. Yet the storage requirement of a time-evolving graph grows with the number of snapshots, so an efficient storage system is essential for exploiting such graphs' potential application value. Storage cost and snapshot access speed are the two most important performance indicators for a time-evolving graph storage system, and they are hard to satisfy together because they are conflicting goals. In this article, we address these challenges by proposing an efficient storage scheme for large time-evolving graphs. We first design a Snapshot-level Data Deduplication (SLDD) strategy to eliminate the large number of vertices and edges repeated across snapshots, and then a Structure-Changing Graph Representation (SCGR) to significantly improve snapshot access speed. Based on this scheme, we implement TgStore, an efficient storage system for large-scale time-evolving graphs designed to support time-evolving graph analysis tasks. Experimental results show that TgStore achieves a compression ratio of 43.03:1 when storing 100 snapshots of Twitter, together with an average snapshot access speedup of 16×. This efficient storage scheme enables TgStore to support time-evolving graph algorithms effectively. For example, when executing the PageRank algorithm on the time-evolving graph of Twitter, TgStore outperforms GraphOne, a state-of-the-art time-evolving graph storage system, by 15.9× in algorithm execution speed and 1.45× in memory usage.
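The deduplication idea in the abstract can be illustrated with a small sketch. The abstract does not describe SLDD's actual data layout, so what follows is an assumption-labeled illustration of snapshot-level deduplication in general, not TgStore's implementation: each distinct edge is stored once and tagged with a bitmask recording which snapshots contain it, and a snapshot is rebuilt by filtering on its bit. For brevity only edges are deduplicated. The class `DedupSnapshotStore`, its methods, and the bitmask encoding are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

# Hypothetical edge type: (source vertex ID, destination vertex ID).
Edge = Tuple[int, int]


class DedupSnapshotStore:
    """Illustrative snapshot-level deduplication (not TgStore's SLDD).

    Every distinct edge is kept once; bit k of its mask is set when the
    edge is present in snapshot k.
    """

    def __init__(self) -> None:
        self.edge_masks: DefaultDict[Edge, int] = defaultdict(int)
        self.num_snapshots = 0
        # Edge records a naive design would store (one full copy per snapshot).
        self.raw_edge_count = 0

    def add_snapshot(self, edges: Iterable[Edge]) -> int:
        """Record a snapshot and return its index."""
        k = self.num_snapshots
        bit = 1 << k
        for edge in set(edges):  # collapse duplicates within one snapshot
            self.edge_masks[edge] |= bit
            self.raw_edge_count += 1
        self.num_snapshots += 1
        return k

    def snapshot(self, k: int) -> Set[Edge]:
        """Rebuild snapshot k by scanning all deduplicated edges."""
        if not 0 <= k < self.num_snapshots:
            raise IndexError(f"snapshot {k} does not exist")
        bit = 1 << k
        return {edge for edge, mask in self.edge_masks.items() if mask & bit}

    def edge_ratio(self) -> float:
        """Naive edge records divided by deduplicated edge records."""
        return self.raw_edge_count / max(len(self.edge_masks), 1)


if __name__ == "__main__":
    store = DedupSnapshotStore()
    store.add_snapshot([(1, 2), (2, 3)])
    store.add_snapshot([(1, 2), (2, 3), (3, 4)])
    store.add_snapshot([(1, 2), (3, 4)])
    print(sorted(store.snapshot(2)))  # [(1, 2), (3, 4)]
    print(store.raw_edge_count, len(store.edge_masks))  # 7 3
```

Two caveats apply. The `edge_ratio` value counts edge records only and ignores the bitmask overhead, so it is not comparable to the byte-level 43.03:1 ratio reported for TgStore. And `snapshot` scans every deduplicated edge, which shows how deduplication alone can slow down snapshot access; the abstract attributes TgStore's access speedup to a separate component, SCGR, whose design is not described here.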
Source journal: IEEE Transactions on Big Data
CiteScore: 11.80
Self-citation rate: 2.80%
Articles published: 114
Journal description: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.