Delong Ma;Ye Yuan;Yanfeng Zhang;Chunze Cao;Yuliang Ma
IEEE Transactions on Big Data, vol. 11, no. 4, pp. 2008-2024. Published 2024-12-25. DOI: 10.1109/TBDATA.2024.3522816. URL: https://ieeexplore.ieee.org/document/10816294/. Impact factor: 5.7. JCR: Q1 (Computer Science, Information Systems).
Cost-Aware Triangle Counting Over Geo-Distributed Datacenters
Counting triangles is an important task in many practical applications, such as anomaly detection, community search, and recommendation systems. For triangle counting in large and dynamic graphs, recent work has focused on distributed streaming algorithms. These works assume that the graph is processed in a single location, while in reality the graph stream may be generated and processed in datacenters that are geographically distributed. This raises new challenges for existing triangle counting algorithms, due to the multi-level heterogeneities in network bandwidth and communication prices across geo-distributed datacenters. In this article, we propose a cost-aware framework named GeoTri, based on the Master-Worker-Aggregator architecture, which takes both cost and performance objectives into consideration for triangle counting in geo-distributed datacenters. The two core parts of this framework are the cost-aware node assignment strategy in the master, which is critical for determining each node's position and distributing edges so as to reduce the cost (i.e., time cost and monetary cost), and the cost-aware neighbor transfer strategy among workers, which further eliminates redundancy in data transfers. Additionally, we conduct extensive experiments on seven real-world graphs, and the results demonstrate that GeoTri significantly lowers both runtime and monetary cost while exhibiting good accuracy and scalability.
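For readers unfamiliar with the underlying problem, the following is a minimal single-machine sketch of exact triangle counting over an edge stream. It is not the GeoTri framework and models none of its master, worker, or aggregator roles or its cost-aware strategies; the function name `count_triangles_stream` and the input format (an iterable of undirected edge pairs) are assumptions made for illustration only.

```python
from collections import defaultdict


def count_triangles_stream(edges):
    """Count triangles in an undirected graph given as a stream of edges.

    Illustrative baseline only: for each newly arrived edge (u, v), every
    common neighbor w of u and v closes exactly one new triangle (u, v, w).
    """
    adj = defaultdict(set)  # adjacency sets built up as edges arrive
    triangles = 0
    for u, v in edges:
        # Ignore self-loops and edges that were already seen.
        if u == v or v in adj[u]:
            continue
        # Each shared neighbor forms a triangle with the new edge.
        triangles += len(adj[u] & adj[v])
        adj[u].add(v)
        adj[v].add(u)
    return triangles


if __name__ == "__main__":
    stream = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
    print(count_triangles_stream(stream))
```

In a distributed setting, the adjacency sets of `u` and `v` may reside on different machines, so computing their intersection requires transferring neighbor lists; that is the kind of data transfer whose cost the abstract refers to.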
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.