Missing data completion in wastewater network databases: the added-value of Graph Convolutional Neural Networks.

The EGU General Assembly. Pub Date: 2021-03-03. DOI: 10.5194/egusphere-egu21-8350

Yassine Bel-Ghaddar, C. Delenne, N. Chahinian, Ahlame Begdouri, Abderrahmane Seriai


Abstract

Wastewater networks are essential to urbanization. Their management, which includes repair and expansion operations, requires precise information about their underground components, mainly pipes. For hydraulic modelling purposes, the characteristics of the nodes and pipes in the model must be fully known through specific, complete and consistent attribute tables. However, after years of service and interventions by different actors, information about the attributes and characteristics of the various objects constituting a network is not properly tracked and reported. As a result, databases related to wastewater networks, when available, still suffer from a large amount of missing data.

A wastewater network can be represented as a graph composed of nodes and edges. Nodes represent manholes, equipment, repairs, etc., while edges represent pipes. Each node and edge carries a set of properties in the form of attributes, such as pipe diameter. In this work, we seek to complete the missing attributes of wastewater networks using machine learning techniques. The main goal is to exploit the graph structure in the learning process, taking into account the topology and the relationships between components (nodes and edges) to predict missing attribute values.
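As a rough illustration of the graph representation described above, the following minimal Python sketch stores pipes as edges between manhole nodes and marks a missing diameter with `None`. The `Pipe` class, the node names, the diameter values and the helper functions are all hypothetical and are not taken from the Montpellier or Angers datasets. The sketch also assumes an undirected graph, which the abstract does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pipe:
    """An edge of the network graph (hypothetical schema)."""
    start: str                          # identifier of the upstream node
    end: str                            # identifier of the downstream node
    diameter_mm: Optional[int] = None   # None marks a missing attribute


def build_adjacency(pipes):
    """Map each node to the set of nodes it shares a pipe with (undirected)."""
    adjacency = defaultdict(set)
    for pipe in pipes:
        adjacency[pipe.start].add(pipe.end)
        adjacency[pipe.end].add(pipe.start)
    return dict(adjacency)


def neighbouring_diameters(target, pipes):
    """Known diameters of the pipes that share a node with `target`."""
    nodes = {target.start, target.end}
    return sorted(
        pipe.diameter_mm
        for pipe in pipes
        if pipe is not target
        and pipe.diameter_mm is not None
        and nodes & {pipe.start, pipe.end}
    )


# Illustrative toy network: four pipes in a line, one diameter missing.
pipes = [
    Pipe("M1", "M2", 200),
    Pipe("M2", "M3"),          # diameter unknown
    Pipe("M3", "M4", 300),
    Pipe("M4", "M5", 300),
]

adjacency = build_adjacency(pipes)
missing = [pipe for pipe in pipes if pipe.diameter_mm is None]
context = neighbouring_diameters(missing[0], pipes)
```

Here `context` collects the diameters of the pipes adjacent to the incomplete one. This is the kind of topological information that a graph-based model can draw on and that a model treating each pipe in isolation cannot.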

Graph Convolutional Network (GCN) models have gained a lot of attention in recent years and have achieved state-of-the-art results in many application domains, such as chemistry. These models are applied directly to graphs to perform diverse machine learning tasks.

We present here the use of GCN models such as ChebConv to complete the missing attribute values of two datasets (1239 and 754 elements) extracted from the wastewater networks of the Montpellier and Angers metropolitan areas in France. To emphasize the importance of the graph structure in the learning process, and thus for the quality of the predictions, the GCN results are benchmarked against non-topological neural networks. The application to diameter value completion indicates that using the structure of the wastewater network in the learning process has a significant impact on the prediction results, especially for minority classes. Indeed, the diameter classes are highly imbalanced in terms of number of elements, with one dominant majority class and several classes with few elements. Non-topological neural networks always fail to predict these minority classes and assign the majority class value to every missing diameter, yielding a perfect precision for this class but a null one for all the others. In contrast, the ChebConv model's precision is slightly lower (0.93) for the majority class but much higher (increases from 0.3 to 0.81) for the other classes, using only the structure of the graphs. Using other available information in the learning process may further improve these results.
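To give a feel for how a Chebyshev graph convolution such as ChebConv uses topology, here is a minimal, dependency-free Python sketch of the underlying filter, sum over k of T_k(L~) X W_k, where the terms are built with the Chebyshev recurrence T_k = 2 L~ T_(k-1) - T_(k-2). It is not the authors' implementation. The toy graph, the feature values and the weights are hypothetical. It also assumes that the largest eigenvalue of the normalized Laplacian is approximated by 2, its upper bound, so that the rescaled Laplacian reduces to L~ = -D^(-1/2) A D^(-1/2).

```python
import math


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scaled_laplacian(adj):
    """Return L~ = -D^(-1/2) A D^(-1/2), assuming lambda_max = 2."""
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(adj)
    return [[-inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def cheb_conv(x, l_tilde, weights):
    """Apply the Chebyshev filter; weights[k] has shape (F_in, F_out)."""
    t_prev = x                                   # T_0(L~) X = X
    out = matmul(t_prev, weights[0])
    if len(weights) == 1:
        return out
    t_curr = matmul(l_tilde, x)                  # T_1(L~) X = L~ X
    out = add(out, matmul(t_curr, weights[1]))
    for w_k in weights[2:]:
        propagated = matmul(l_tilde, t_curr)
        # Recurrence: T_k = 2 L~ T_(k-1) - T_(k-2)
        t_next = [[2.0 * p - q for p, q in zip(rp, rq)]
                  for rp, rq in zip(propagated, t_prev)]
        out = add(out, matmul(t_next, w_k))
        t_prev, t_curr = t_curr, t_next
    return out


# Toy path graph 0 - 1 - 2 with a single feature set on node 0 only.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
x = [[1.0], [0.0], [0.0]]
weights = [[[1.0]], [[1.0]], [[1.0]]]            # three terms, orders 0 to 2

out = cheb_conv(x, scaled_laplacian(adj), weights)
# out is roughly [[1.0], [-0.707], [1.0]]
```

With three terms, the signal placed on node 0 reaches node 2, which is two hops away. A filter of order K - 1 therefore mixes information from neighbours up to K - 1 hops apart, which is how the graph structure enters the prediction. In practice the weights are learned and a tensor library would replace the list-based helpers.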
