Dynamic Diverse Summarisation in Heterogeneous Graph Streams: a Comparison between Thesaurus/Ontology-based and Embeddings-based Approaches

International Journal of Graph Computing. Pub Date: 2020-03-01. DOI: 10.35708/gc1868-126724

Niki Pavlopoulou


Citations: 2

Abstract

Smart environments such as Smart Cities and the Internet of Things are attracting considerable attention. These environments generate data streams that can be represented as graphs and analysed in real time to satisfy user or application needs. The challenges these environments pose, namely the dynamism, heterogeneity, continuity, and high volume of real-world graph streams, create new requirements for graph processing algorithms. We propose a dynamic graph stream summarisation system that uses embeddings to provide expressive graphs while ensuring high usability and limited resource usage. In this paper, we compare the performance of our embeddings-based approach with that of an existing thesaurus/ontology-based approach (FACES), which we adapted to a dynamic environment using windows and data fusion. Both approaches use conceptual clustering and top-k scoring, which can produce expressive, dynamic graph summaries with limited resources. Evaluations show that, compared to sending all the available information without top-k scoring, sending top-k fused diverse summaries results in a 34% to 92% reduction in forwarded messages and in redundancy-awareness with an F-score ranging from 0.80 to 0.95, depending on k. The summaries' quality is also consistent with the ideal summaries agreed upon by human judges. The summarisation approaches come at the expense of reduced system performance. The thesaurus/ontology-based approach proved 6 times more latency-heavy and 3 times more memory-heavy than the most expensive embeddings-based approach and had lower throughput, but it provided slightly better quality summaries.
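To make the pipeline named in the abstract more concrete (windows, data fusion, conceptual clustering, top-k scoring), here is a minimal Python sketch. It is not the paper's implementation or FACES. The function names, the sample triples, and the `concepts` and `weights` tables are all hypothetical: they are hand-written stand-ins for what an embeddings model or a thesaurus/ontology lookup would supply. Fusion is reduced to dropping exact duplicates, and the message reduction is computed against the raw window size, which is an assumption about how such a figure could be measured.

```python
from collections import defaultdict
from itertools import islice


def windows(stream, size):
    """Yield consecutive count-based (tumbling) windows of `size` items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def diverse_top_k(triples, cluster_of, score, k):
    """Pick up to k triples, cycling through clusters so that no single
    concept dominates the summary."""
    # Fusion stand-in (assumption): drop exact duplicates, keep first-seen order.
    fused = list(dict.fromkeys(triples))
    clusters = defaultdict(list)
    for t in fused:
        clusters[cluster_of(t)].append(t)
    # Best-scored triple first inside every cluster.
    ranked = [sorted(c, key=score, reverse=True) for c in clusters.values()]
    # Visit clusters in order of their best member's score.
    ranked.sort(key=lambda c: score(c[0]), reverse=True)
    summary, depth = [], 0
    while len(summary) < k and any(depth < len(c) for c in ranked):
        for c in ranked:
            if depth < len(c) and len(summary) < k:
                summary.append(c[depth])
        depth += 1
    return summary


# Hypothetical sensor stream of (subject, predicate, object) triples.
stream = [
    ("sensor1", "temperature", 21.5),
    ("sensor1", "temperature", 21.5),  # redundant reading
    ("sensor1", "humidity", 40),
    ("sensor1", "heat", 22.0),
    ("sensor1", "location", "room A"),
    ("sensor1", "moisture", 41),
]
# Hypothetical concept assignment and importance weights per predicate.
concepts = {"temperature": "thermal", "heat": "thermal",
            "humidity": "wetness", "moisture": "wetness",
            "location": "place"}
weights = {"temperature": 0.9, "heat": 0.5, "humidity": 0.8,
           "moisture": 0.4, "location": 0.6}

window = next(windows(stream, size=6))
summary = diverse_top_k(
    window,
    cluster_of=lambda t: concepts[t[1]],
    score=lambda t: weights[t[1]],
    k=3,
)
# Share of messages not forwarded, relative to the raw window (assumption).
reduction = 1 - len(summary) / len(window)
```

With these made-up inputs the summary takes one triple from each of the three concepts rather than the three highest-weighted triples overall, which is the sense in which the selection is diverse. Raising `k` trades a smaller reduction in forwarded messages for fuller coverage.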



