Geo-Hgan: Unsupervised anomaly detection in geochemical data via latent space learning

IF 4.4, Zone 2 (Earth Sciences), JCR Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computers & Geosciences, Pub Date: 2024-08-14, DOI: 10.1016/j.cageo.2024.105703

Liang Ding, Bainian Chen, Yuelong Zhu, Hai Dong, Guiyang Chan, Pengcheng Zhang

Volume 192, Article 105703

Citations: 0

Abstract

Reconstructing geochemical data with Generative Adversarial Networks (GANs) has become a prevalent approach to identifying geochemical anomalies. However, the random noise injected into GANs can destabilize the model. To mitigate this issue, we propose Geo-Hgan, a novel anomaly detection model that integrates a dual adversarial network architecture with a Latent Space Adversarial Module (LSAM) to learn the distribution of latent variables from arbitrary data and optimize the sample reconstruction process, thereby alleviating instability during GAN training. In addition, an encoder guided by the LSAM-pretrained GAN extracts variational features, so that samples can be mapped quickly and effectively into the latent space defined by LSAM. Experimental results show that, under unsupervised conditions, Geo-Hgan achieves an Area Under the Curve (AUC) score of 85% across three geochemical datasets, outperforming comparable models in accuracy and reconstruction capability. To assess its versatility and generalization ability, we extend Geo-Hgan to anomaly detection in computer vision, where it achieves an average AUC of 98.7% on the MvtecAD dataset, a new state-of-the-art result in that domain. Furthermore, we propose AnomFilter, a method for setting anomaly thresholds based on the clustering hypothesis. AnomFilter selects the high-confidence anomalous samples detected by Geo-Hgan in the source domain and iteratively transfers them to the target domain. Combined with a small number of known positive samples in the target domain, these high-confidence samples improve the accuracy of supervised geochemical anomaly detection there, which reaches an AUC of 94%. Using anomaly detection models for sample transfer learning offers a new perspective for future work.
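Reconstruction-based GAN detectors of the kind the abstract describes typically score a sample by how poorly the trained model reconstructs it (encode the sample into the latent space, decode it back, measure the error), and the reported AUC measures how well those scores rank known anomalies above background samples. The minimal Python sketch below illustrates only that scoring and evaluation step, assuming Euclidean reconstruction error as the anomaly score; it is not Geo-Hgan. The helper `make_mean_reconstructor` is a hypothetical stand-in for a trained encoder and generator, and all function names and data are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def reconstruction_score(x: Sequence[float],
                         reconstruct: Callable[[Sequence[float]], Vector]) -> float:
    """Anomaly score: Euclidean distance between a sample and its reconstruction."""
    x_hat = reconstruct(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random anomaly (label 1) outscores a random
    background sample (label 0); ties count one half (Mann-Whitney U form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def make_mean_reconstructor(
        background: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], Vector]:
    """HYPOTHETICAL stand-in for decoder(encoder(x)): every sample is
    'reconstructed' as the per-element mean of the background data.
    A trained encoder/generator pair would replace this."""
    dims = len(background[0])
    mean = [sum(row[d] for row in background) / len(background) for d in range(dims)]
    return lambda x: list(mean)


# Illustrative data only: two element concentrations per sample.
background = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.0]]
samples = [[1.0, 2.0], [1.05, 1.95], [3.0, 5.0], [2.5, 0.2]]
labels = [0, 0, 1, 1]

recon = make_mean_reconstructor(background)
scores = [reconstruction_score(x, recon) for x in samples]
print(round(roc_auc(scores, labels), 3))  # prints 1.0
```

Because AUC depends only on how the scores are ranked, computing it requires no threshold; choosing a threshold is a separate step, which is the problem AnomFilter addresses.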
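The abstract does not specify how AnomFilter applies the clustering hypothesis. As a hypothetical illustration only, the sketch below assumes the hypothesis means that anomaly scores fall into two groups (background and anomalous), separates them with one-dimensional 2-means (Lloyd's algorithm with k = 2), sets the threshold at the midpoint between the two centroids, and treats samples scoring at or above the anomalous centroid as high-confidence. The function names and the confidence rule are assumptions, not the authors' method.

```python
from typing import List, Sequence, Tuple


def two_means_1d(scores: Sequence[float], iters: int = 100) -> Tuple[float, float]:
    """Lloyd's algorithm with k=2 on 1-D scores.
    Returns (low_centroid, high_centroid)."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        return lo, hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        # Both groups stay non-empty: min <= mid < max at every iteration.
        low_group = [s for s in scores if s <= mid]
        high_group = [s for s in scores if s > mid]
        new_lo = sum(low_group) / len(low_group)
        new_hi = sum(high_group) / len(high_group)
        if (new_lo, new_hi) == (lo, hi):
            break  # assignments no longer change
        lo, hi = new_lo, new_hi
    return lo, hi


def select_high_confidence(scores: Sequence[float]) -> Tuple[float, List[int]]:
    """HYPOTHETICAL AnomFilter-style step: threshold = midpoint of the two
    centroids; high-confidence anomalies = indices whose score is at or
    above the anomalous cluster's centroid."""
    low_c, high_c = two_means_1d(scores)
    threshold = (low_c + high_c) / 2
    if low_c == high_c:
        return threshold, []  # no separation, so nothing is high-confidence
    confident = [i for i, s in enumerate(scores) if s >= high_c]
    return threshold, confident


# Illustrative anomaly scores, e.g. from a reconstruction-based detector.
scores = [0.1, 0.15, 0.2, 0.12, 0.9, 1.1, 1.0, 0.18]
threshold, confident = select_high_confidence(scores)
print(round(threshold, 3), confident)  # prints 0.575 [5, 6]
```

In a transfer setting, the selected indices would be the candidates carried over to the target domain; the iterative transfer itself is not shown.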

Source journal

Computers & Geosciences (Geosciences, Multidisciplinary)

CiteScore: 9.30

Self-citation rate: 6.80%

Articles published: 164

Review time: 3.4 months

Journal description: Computers & Geosciences publishes high-impact, original research at the interface between computer science and the geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.