Dynamic Diverse Summarisation in Heterogeneous Graph Streams: a Comparison between Thesaurus/Ontology-based and Embeddings-based Approaches
Niki Pavlopoulou
International Journal of Graph Computing, published 2020-03-01
DOI: https://doi.org/10.35708/gc1868-126724
Citations: 2
Abstract
Smart environments, such as Smart Cities and the Internet of Things, are attracting considerable attention. These environments generate data streams that can be represented as graphs and analysed in real time to satisfy user or application needs. The challenges these environments pose, namely the dynamism, heterogeneity, continuity, and high volume of real-world graph streams, create new requirements for graph processing algorithms. We propose a dynamic graph stream summarisation system that uses embeddings to provide expressive graphs while ensuring high usability and limited resource usage. In this paper, we compare the performance of our embeddings-based approach with that of an existing thesaurus/ontology-based approach (FACES), which we adapted to a dynamic environment using windows and data fusion. Both approaches use conceptual clustering and top-k scoring, which can yield expressive, dynamic graph summaries with limited resources. Evaluations show that, compared to sending all available information without top-k scoring, sending top-k fused diverse summaries results in a 34% to 92% reduction in forwarded messages, and in redundancy-awareness with an F-score ranging from 0.80 to 0.95, depending on k. The quality of the summaries is also in line with the ideal summaries agreed upon by human judges. The summarisation approaches come at the expense of reduced system performance. The thesaurus/ontology-based approach proved 6 times more latency-heavy and 3 times more memory-heavy than the most expensive embeddings-based approach and had lower throughput, but it provided slightly better-quality summaries.
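
To make the pipeline named in the abstract more concrete (windows, data fusion, conceptual clustering, top-k scoring), the following is a minimal Python sketch under stated assumptions, not the paper's implementation. The function `summarise_window`, the `CONCEPT_OF` mapping, the tuple layout, and the scores are all hypothetical. The hand-written mapping stands in for the clustering step, which in the paper is either thesaurus/ontology-based or embeddings-based, and the round-robin selection is just one simple way to favour diversity.

```python
from collections import defaultdict

# Hypothetical predicate-to-concept mapping. It stands in for the
# conceptual clustering step (thesaurus/ontology- or embeddings-based).
CONCEPT_OF = {
    "temperature": "climate",
    "humidity": "climate",
    "noiseLevel": "acoustics",
    "trafficFlow": "mobility",
}


def summarise_window(window, k):
    """Return up to k diverse facts per entity for one window.

    window: iterable of (entity, predicate, value, score) tuples
    (an assumed layout, chosen only for this illustration).
    """
    # Data fusion: merge all facts about the same entity and keep the
    # best-scored value for each (entity, predicate) pair.
    fused = defaultdict(dict)
    for entity, predicate, value, score in window:
        best = fused[entity].get(predicate)
        if best is None or score > best[1]:
            fused[entity][predicate] = (value, score)

    summaries = {}
    for entity, facts in fused.items():
        # Group the fused facts by concept; sort each group by score.
        clusters = defaultdict(list)
        for predicate, (value, score) in facts.items():
            concept = CONCEPT_OF.get(predicate, predicate)
            clusters[concept].append((score, predicate, value))
        for members in clusters.values():
            members.sort(reverse=True)

        # Visit clusters round-robin, best cluster first, so the top-k
        # covers as many concepts as possible before repeating one.
        ordered = sorted(clusters.values(), key=lambda m: m[0], reverse=True)
        picked = []
        rank = 0
        while len(picked) < k and any(rank < len(m) for m in ordered):
            for members in ordered:
                if rank < len(members) and len(picked) < k:
                    _, predicate, value = members[rank]
                    picked.append((predicate, value))
            rank += 1
        summaries[entity] = picked
    return summaries


# One window of made-up readings for a single entity.
window = [
    ("sensor1", "temperature", 21.5, 0.9),
    ("sensor1", "humidity", 40, 0.8),
    ("sensor1", "noiseLevel", 55, 0.7),
    ("sensor1", "trafficFlow", 120, 0.6),
    ("sensor1", "temperature", 21.0, 0.5),  # duplicate, fused away
]
print(summarise_window(window, k=2))
# {'sensor1': [('temperature', 21.5), ('noiseLevel', 55)]}
```

With k=2 the sketch forwards two facts instead of the five raw readings, and it prefers `noiseLevel` over the higher-scored `humidity` because `humidity` shares a concept with `temperature`. This illustrates the trade-off the abstract describes between fewer forwarded messages and the extra processing needed to build the summary.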