Graph construction on complex spatiotemporal data for enhancing graph neural network-based approaches

IF 2.8 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Data Science and Analytics Pub Date : 2023-09-25 DOI:10.1007/s41060-023-00452-2

Stefan Bloemheuvel, Jurgen van den Hoogen, Martin Atzmueller

{"title":"Graph construction on complex spatiotemporal data for enhancing graph neural network-based approaches","authors":"Stefan Bloemheuvel, Jurgen van den Hoogen, Martin Atzmueller","doi":"10.1007/s41060-023-00452-2","DOIUrl":null,"url":null,"abstract":"Abstract Graph neural networks (GNNs) haven proven to be an indispensable approach in modeling complex data, in particular spatial temporal data, e.g., relating to sensor data given as time series with according spatial information. Although GNNs provide powerful modeling capabilities on such kind of data, they require adequate input data in terms of both signal and the underlying graph structures. However, typically the according graphs are not automatically available or even predefined, such that typically an ad hoc graph representation needs to be constructed. However, often the construction of the underlying graph structure is given insufficient attention. Therefore, this paper performs an in-depth analysis of several methods for constructing graphs from a set of sensors attributed with spatial information, i.e., geographical coordinates, or using their respective attached signal data. We apply a diverse set of standard methods for estimating groups and similarities between graph nodes as location-based as well as signal-driven approaches on multiple benchmark datasets for evaluation and assessment. Here, for both areas, we specifically include distance-based, clustering-based, as well as correlation-based approaches for estimating the relationships between nodes for subsequent graph construction. In addition, we consider two different GNN approaches, i.e., regression and forecasting in order to enable a broader experimental assessment. Typically, no predefined graph is given, such that (ad hoc) graph creation is necessary. Here, our results indicate the criticality of factoring in the crucial step of graph construction into GNN-based research on spatial temporal data. Overall, in our experimentation no single approach for graph construction emerged as a clear winner. However, in our analysis we are able to provide specific indications based on the obtained results, for a specific class of methods. Collectively, the findings highlight the need for researchers to carefully consider graph construction when employing GNNs in the analysis of spatial temporal data.","PeriodicalId":45667,"journal":{"name":"International Journal of Data Science and Analytics","volume":"8 1","pages":"0"},"PeriodicalIF":2.8000,"publicationDate":"2023-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Data Science and Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s41060-023-00452-2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Abstract Graph neural networks (GNNs) haven proven to be an indispensable approach in modeling complex data, in particular spatial temporal data, e.g., relating to sensor data given as time series with according spatial information. Although GNNs provide powerful modeling capabilities on such kind of data, they require adequate input data in terms of both signal and the underlying graph structures. However, typically the according graphs are not automatically available or even predefined, such that typically an ad hoc graph representation needs to be constructed. However, often the construction of the underlying graph structure is given insufficient attention. Therefore, this paper performs an in-depth analysis of several methods for constructing graphs from a set of sensors attributed with spatial information, i.e., geographical coordinates, or using their respective attached signal data. We apply a diverse set of standard methods for estimating groups and similarities between graph nodes as location-based as well as signal-driven approaches on multiple benchmark datasets for evaluation and assessment. Here, for both areas, we specifically include distance-based, clustering-based, as well as correlation-based approaches for estimating the relationships between nodes for subsequent graph construction. In addition, we consider two different GNN approaches, i.e., regression and forecasting in order to enable a broader experimental assessment. Typically, no predefined graph is given, such that (ad hoc) graph creation is necessary. Here, our results indicate the criticality of factoring in the crucial step of graph construction into GNN-based research on spatial temporal data. Overall, in our experimentation no single approach for graph construction emerged as a clear winner. However, in our analysis we are able to provide specific indications based on the obtained results, for a specific class of methods. Collectively, the findings highlight the need for researchers to carefully consider graph construction when employing GNNs in the analysis of spatial temporal data.

查看原文本刊更多论文

基于图神经网络的复杂时空数据图构建方法

图神经网络(gnn)已被证明是复杂数据建模中不可或缺的方法，特别是时空数据，例如，与传感器数据相关的时间序列具有相应的空间信息。尽管gnn在这类数据上提供了强大的建模能力，但它们在信号和底层图结构方面都需要足够的输入数据。然而，通常情况下，相应的图不是自动可用的，甚至不是预定义的，因此通常需要构造一个特别的图表示。然而，底层图结构的构造往往没有得到足够的重视。因此，本文深入分析了从一组具有空间信息(即地理坐标)的传感器或使用其各自附加的信号数据构建图形的几种方法。我们应用了一套不同的标准方法来估计图节点之间的组和相似性，作为基于位置的方法和信号驱动的方法，用于多个基准数据集的评估和评估。在这里，对于这两个领域，我们特别包括基于距离的，基于聚类的，以及基于相关性的方法来估计节点之间的关系，以便后续的图构建。此外，我们考虑了两种不同的GNN方法，即回归和预测，以便进行更广泛的实验评估。通常，没有给出预定义的图，因此(特别的)图创建是必要的。在这里，我们的研究结果表明，在基于gnn的时空数据研究中，将图构建的关键步骤纳入其中是至关重要的。总的来说，在我们的实验中，没有一种图构建方法是明显的赢家。然而，在我们的分析中，我们能够根据获得的结果，为特定类别的方法提供特定的适应症。总的来说，这些发现强调了研究人员在使用gnn分析时空数据时需要仔细考虑图的构建。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

International Journal of Data Science and Analytics Multiple-

CiteScore

6.40

自引率

8.30%

发文量

期刊介绍： Data Science has been established as an important emergent scientific field and paradigm driving research evolution in such disciplines as statistics, computing science and intelligence science, and practical transformation in such domains as science, engineering, the public sector, business, social science, and lifestyle. The field encompasses the larger areas of artificial intelligence, data analytics, machine learning, pattern recognition, natural language understanding, and big data manipulation. It also tackles related new scientific challenges, ranging from data capture, creation, storage, retrieval, sharing, analysis, optimization, and visualization, to integrative analysis across heterogeneous and interdependent complex resources for better decision-making, collaboration, and, ultimately, value creation.The International Journal of Data Science and Analytics (JDSA) brings together thought leaders, researchers, industry practitioners, and potential users of data science and analytics, to develop the field, discuss new trends and opportunities, exchange ideas and practices, and promote transdisciplinary and cross-domain collaborations. The journal is composed of three streams: Regular, to communicate original and reproducible theoretical and experimental findings on data science and analytics; Applications, to report the significant data science applications to real-life situations; and Trends, to report expert opinion and comprehensive surveys and reviews of relevant areas and topics in data science and analytics.Topics of relevance include all aspects of the trends, scientific foundations, techniques, and applications of data science and analytics, with a primary focus on:statistical and mathematical foundations for data science and analytics;understanding and analytics of complex data, human, domain, network, organizational, social, behavior, and system characteristics, complexities and intelligences;creation and extraction, processing, representation and modelling, learning and discovery, fusion and integration, presentation and visualization of complex data, behavior, knowledge and intelligence;data analytics, pattern recognition, knowledge discovery, machine learning, deep analytics and deep learning, and intelligent processing of various data (including transaction, text, image, video, graph and network), behaviors and systems;active, real-time, personalized, actionable and automated analytics, learning, computation, optimization, presentation and recommendation; big data architecture, infrastructure, computing, matching, indexing, query processing, mapping, search, retrieval, interoperability, exchange, and recommendation;in-memory, distributed, parallel, scalable and high-performance computing, analytics and optimization for big data;review, surveys, trends, prospects and opportunities of data science research, innovation and applications;data science applications, intelligent devices and services in scientific, business, governmental, cultural, behavioral, social and economic, health and medical, human, natural and artificial (including online/Web, cloud, IoT, mobile and social media) domains; andethics, quality, privacy, safety and security, trust, and risk of data science and analytics