GCN-ST-MDIR: Graph Convolutional Network-Based Spatial-Temporal Missing Air Pollution Data Pattern Identification and Recovery

IF 7.5 3区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Yangwen Yu;Victor O. K. Li;Jacqueline C. K. Lam;Kelvin Chan
{"title":"GCN-ST-MDIR: Graph Convolutional Network-Based Spatial-Temporal Missing Air Pollution Data Pattern Identification and Recovery","authors":"Yangwen Yu;Victor O. K. Li;Jacqueline C. K. Lam;Kelvin Chan","doi":"10.1109/TBDATA.2023.3277710","DOIUrl":null,"url":null,"abstract":"Missing data pattern identification and recovery (MDIR) is vital for accurate air pollution monitoring. To recover the missing air pollution data, GCN-ST-MDIR, a Graph Convolutional Network (GCN)-based MDIR framework, is proposed to identify daily missing data patterns and automatically select the best recovery method. GCN-ST-MDIR presents four novelties: (1) A new graph construction is developed to improve GCN data representation for MDIR using S-T similarity matrix and domain-specific knowledge (e.g., weekend/weekday). (2) A TL component is used to pre-train LSCE and ILSCE models. (3) A GCN structure outputs a selection indicator to determine the dominant missing pattern for daily input. The pre-trained data recovery model's accuracy is incorporated into the GCN loss function to penalize the wrong indicator. (4) The output of the GCN structure is used as a score to combine LSCE and ILSCE. Results show that the domain-specific S-T regularity and irregularity can be used as the prior information for both GCN and ILSCE/LSCE to enhance feature extraction. Our model considerably improves the recovery performance as compared to the baselines. GCN-ST-MDIR has achieved an accuracy of 88.48% for general missing data recovery with consecutively and sporadically missing data. GCN-ST-MDIR can be extended to many other S-T MDIR challenges.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 5","pages":"1347-1364"},"PeriodicalIF":7.5000,"publicationDate":"2023-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Big Data","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10129038/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Missing data pattern identification and recovery (MDIR) is vital for accurate air pollution monitoring. To recover the missing air pollution data, GCN-ST-MDIR, a Graph Convolutional Network (GCN)-based MDIR framework, is proposed to identify daily missing data patterns and automatically select the best recovery method. GCN-ST-MDIR presents four novelties: (1) A new graph construction is developed to improve GCN data representation for MDIR using S-T similarity matrix and domain-specific knowledge (e.g., weekend/weekday). (2) A TL component is used to pre-train LSCE and ILSCE models. (3) A GCN structure outputs a selection indicator to determine the dominant missing pattern for daily input. The pre-trained data recovery model's accuracy is incorporated into the GCN loss function to penalize the wrong indicator. (4) The output of the GCN structure is used as a score to combine LSCE and ILSCE. Results show that the domain-specific S-T regularity and irregularity can be used as the prior information for both GCN and ILSCE/LSCE to enhance feature extraction. Our model considerably improves the recovery performance as compared to the baselines. GCN-ST-MDIR has achieved an accuracy of 88.48% for general missing data recovery with consecutively and sporadically missing data. GCN-ST-MDIR can be extended to many other S-T MDIR challenges.
GCN-ST-MDIR:基于图卷积网络的时空缺失空气污染数据模式识别与恢复
缺失数据模式识别和恢复(MDIR)对于准确监测空气污染至关重要。为了恢复缺失的空气污染数据,提出了基于图卷积网络(GCN)的MDIR框架GCN- st -MDIR,用于识别每日缺失的数据模式并自动选择最佳恢复方法。GCN- st -MDIR提出了四个新颖之处:(1)利用S-T相似矩阵和领域特定知识(例如,周末/工作日),开发了一种新的图结构,以改进MDIR的GCN数据表示。(2)使用TL组件对LSCE和ILSCE模型进行预训练。(3) GCN结构输出一个选择指标,以确定日常输入的主要缺失模式。将预训练的数据恢复模型的准确性纳入GCN损失函数中,对错误指标进行惩罚。(4)以GCN结构的输出作为评分,将LSCE和ILSCE结合起来。结果表明,域特有的S-T正则性和不规则性可以作为GCN和ILSCE/LSCE的先验信息,以增强特征提取。与基线相比,我们的模型大大提高了恢复性能。GCN-ST-MDIR对于连续和零星缺失数据的一般缺失数据恢复准确率达到了88.48%。GCN-ST-MDIR可以扩展到许多其他S-T MDIR挑战。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
11.80
自引率
2.80%
发文量
114
期刊介绍: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信