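Application of Autoencoders Artificial Neural Network and Principal Component Analysis for Pattern Extraction and Spatial Regionalization of Global Temperature Data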

Chibuike Chiedozie Ibebuchi, O. Obarein, Itohan-Osa Abu
Journal: Machine Learning: Science and Technology
DOI: 10.1088/2632-2153/ad1c34 (https://doi.org/10.1088/2632-2153/ad1c34)
Published: 2024-01-08
Publication type: Journal Article
Citations: 0

Abstract

Spatial regionalization is instrumental in simplifying the spatial complexity of the climate system. To identify regions of significant climate variability, pattern extraction is often required before spatial regionalization with a clustering algorithm. In this study, an autoencoder artificial neural network (AE) was applied to extract the inherent patterns of global temperature data from 1901 to 2021. Fuzzy C-means clustering was then applied to the extracted patterns to classify the global temperature regions. We compared AE-based and principal component analysis (PCA)-based clustering results to assess their consistency. We determined the number of clusters by examining the average percentage decrease in the Fuzzy Partition Coefficient and its 95% confidence interval, seeking a balance between a high Fuzzy Partition Coefficient and avoiding over-segmentation. This approach suggested that, for a more general model, four clusters are reasonable. The Adjusted Rand Index between the AE-based and PCA-based clusters is 0.75, indicating considerable overlap between the two. The difference between the AE-based and PCA-based clusters is suggested to be associated with the AE's capability to learn and extract complex non-linear patterns. This attribute, for example, enabled the clustering algorithm to accurately detect the Himalayas region as the “third pole”, with temperature characteristics similar to those of the polar regions. Finally, when the analysis period is divided into two (1901-1960 and 1961-2021), the Adjusted Rand Index between the two sets of clusters is 0.96, which suggests that historical climate change has not significantly affected the defined temperature regions over the two periods. In essence, this study both indicates the AE's potential to enhance our understanding of climate variability and reveals the stability of the historical temperature regions.
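
The abstract names three quantities without giving implementation details: fuzzy C-means memberships, the Fuzzy Partition Coefficient, and the Adjusted Rand Index. The following is a minimal sketch of their standard textbook definitions, not the authors' code. The function names, the fuzzifier value m = 2, and the toy two-dimensional points (which stand in for AE- or PCA-extracted patterns) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means. Returns (centers, memberships)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Each center is the mean of all points weighted by membership**m.
        centers = []
        for k in range(n_clusters):
            w = [row[k] ** m for row in u]
            w_sum = sum(w)
            centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / w_sum
                 for d in range(dim)]
            )
        # Memberships are proportional to distance**(-2 / (m - 1)).
        shift = 0.0
        for i, p in enumerate(points):
            inv = [max(math.dist(p, c), 1e-12) ** (-2.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


def fuzzy_partition_coefficient(u):
    """Mean of squared memberships; ranges from 1/c (fuzzy) to 1 (crisp)."""
    return sum(v * v for row in u for v in row) / len(u)


def adjusted_rand_index(a, b):
    """Chance-corrected agreement of two labelings (needs >= 2 samples)."""
    same_both = sum(math.comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(math.comb(c, 2) for c in Counter(a).values())
    same_b = sum(math.comb(c, 2) for c in Counter(b).values())
    expected = same_a * same_b / math.comb(len(a), 2)
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        return 1.0
    return (same_both - expected) / (max_index - expected)


# Toy data (an assumption): two well-separated groups of 2-D "patterns".
cold = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2), (-0.2, -0.2), (0.0, -0.1)]
warm = [(x + 10.0, y + 10.0) for x, y in cold]
points = cold + warm
truth = [0] * len(cold) + [1] * len(warm)

centers, memberships = fuzzy_c_means(points, n_clusters=2)
labels = [row.index(max(row)) for row in memberships]  # harden memberships
fpc = fuzzy_partition_coefficient(memberships)
ari = adjusted_rand_index(truth, labels)
print(f"FPC = {fpc:.3f}, ARI = {ari:.2f}")
```

In the same spirit as the abstract, one could rerun `fuzzy_c_means` for several candidate cluster counts and compare how the Fuzzy Partition Coefficient changes, or pass two hardened labelings (for example AE-based and PCA-based) to `adjusted_rand_index` to quantify their overlap. The Adjusted Rand Index is invariant to how cluster labels are numbered, which is why it suits comparing independent clustering runs.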