Yangwen Yu;Victor O. K. Li;Jacqueline C. K. Lam;Kelvin Chan;Qi Zhang
IEEE Transactions on Big Data, vol. 11, no. 5, pp. 2443-2456 (JCR Q1, Computer Science, Information Systems; impact factor 5.7)
Published: 2025-01-27
Publication type: Journal Article
DOI: 10.1109/TBDATA.2025.3533882
URL: https://ieeexplore.ieee.org/document/10854914/
Citations: 0
Abstract
CTDI: CNN-Transformer-Based Spatial-Temporal Missing Air Pollution Data Imputation
Accurate and comprehensive air pollution data is essential for understanding and addressing environmental challenges. Missing data can impair accurate analysis and decision-making. This study presents a novel approach, named CNN-Transformer-based Spatial-Temporal Data Imputation (CTDI), for imputing missing air pollution data. Data pre-processing incorporates observed air pollution data and related urban data to produce 24-hour period tensors as input samples. 1-by-1 CNN layers capture the interaction between different types of input data. Deep learning transformer architecture is employed in a spatial-temporal (S-T) transformer module to capture long-range dependencies and extract complex relationships in both spatial and temporal dimensions. Hong Kong air pollution data is statistically analyzed and used to evaluate CTDI in its recovery of generated and actual patterns of missing data. Experimental results show that CTDI consistently outperforms existing imputation methods across all evaluated scenarios, including cases with higher rates of missing data, thereby demonstrating its robustness and effectiveness in enhancing air quality monitoring. Additionally, ablation experiments reveal that each component significantly contributes to the model's performance, with the temporal transformer proving particularly crucial under varying rates of missing data.
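The data flow described in the abstract (a 24-hour tensor, 1-by-1 convolutions that mix input types, then attention along the temporal and spatial axes) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The channel layout (masked value, observation mask, one urban feature), the temporal-then-spatial ordering, the single attention head with identity projections, and every function name are assumptions made for illustration. The weights are random and untrained, so the filled values are meaningless; only the shapes and the flow of data are the point.

```python
import math
import random

def conv1x1(x, weight):
    """1-by-1 convolution: mix channels independently at each (station, hour).

    x has shape [stations][hours][c_in]; weight has shape [c_out][c_in].
    """
    return [[[sum(w * c for w, c in zip(row, cell)) for row in weight]
             for cell in station] for station in x]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the inputs themselves
    (identity projections), which keeps the sketch short.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) for j in range(d)])
    return out

def st_block(x):
    """Attend over hours within each station, then over stations within each hour."""
    temporal = [self_attention(station) for station in x]
    n_stations, n_hours = len(temporal), len(temporal[0])
    spatial = [[None] * n_hours for _ in range(n_stations)]
    for t in range(n_hours):
        attended = self_attention([temporal[s][t] for s in range(n_stations)])
        for s in range(n_stations):
            spatial[s][t] = attended[s]
    return spatial

def impute(values, mask, urban, w_in, w_out):
    """Fill entries where mask == 0; observed entries are returned unchanged."""
    # Hypothetical channel layout: [masked value, observation mask, urban feature].
    x = [[[v * m, float(m), u] for v, m, u in zip(vs, ms, us)]
         for vs, ms, us in zip(values, mask, urban)]
    hidden = st_block(conv1x1(x, w_in))
    pred = conv1x1(hidden, w_out)  # project back to one output channel
    return [[v if m else p[0] for v, m, p in zip(vs, ms, ps)]
            for vs, ms, ps in zip(values, mask, pred)]

# Toy 24-hour sample: 3 stations, roughly 20% of readings marked missing.
random.seed(0)
S, T, C_IN, C_HID = 3, 24, 3, 4
values = [[random.random() for _ in range(T)] for _ in range(S)]
mask = [[0 if random.random() < 0.2 else 1 for _ in range(T)] for _ in range(S)]
urban = [[random.random() for _ in range(T)] for _ in range(S)]
w_in = [[random.gauss(0, 0.5) for _ in range(C_IN)] for _ in range(C_HID)]
w_out = [[random.gauss(0, 0.5) for _ in range(C_HID)]]
filled = impute(values, mask, urban, w_in, w_out)
```

In a trained model the weights would be learned by hiding known readings and penalizing the reconstruction error on them; the abstract does not give the training objective, so that step is left out here.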
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.