Unsupervised Machine Learning Clustering of Seismic and Infrasound Data Quality Metrics
Juliann R. Coffey, Alex J. C. Witsil, Kenneth A. Macpherson, David Fee
Seismological Research Letters, published 2023-12-19. DOI: 10.1785/0220230177 (https://doi.org/10.1785/0220230177)
Abstract
Developing better quality control (QC) schemes to catch seismic and infrasound data defects remains an area of active research. Selecting universal thresholds to automate data quality (DQ) checks is an efficient way to find QC issues, but such thresholds may not transfer well across stations with differing DQ characteristics. In addition, they may miss subtle changes in DQ parameters that still indicate problems. Machine learning offers an alternative way of diagnosing QC issues. K-means clustering, an unsupervised machine learning algorithm, has been used effectively in the past for geophysical pattern exploration. This study extends k-means to DQ analysis by clustering DQ metrics derived from day-long segments of nuclear explosion monitoring data. Applied to broadband seismometer DQ metrics, our k-means implementation placed mass recenters, calibrations lasting at least one hour, and days without either into separate clusters. Applying the technique to infrasound DQ metrics revealed clusters related to physical issues at the stations, such as missing back volume screws and the flooding of ported pipe inlets. Both are examples of QC issues that are difficult to diagnose or detect by thresholding metrics or by inspecting waveforms and spectra. Our results show that k-means clustering can be a useful QC tool for exploring DQ patterns to assist analyst review of station operation and maintenance. The knowledge gained from this exploration can then inform how a thresholding workflow is tailored to individual stations, or the k-means model could classify data directly.
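
The workflow described in the abstract (cluster day-long DQ metric vectors with k-means, then inspect which days group together) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The two metric names, the synthetic values, the choice of k = 3, the z-score scaling, and the deterministic farthest-point initialization are all assumptions made for this example; the abstract does not specify any of them.

```python
import random
from statistics import mean, pstdev


def make_synthetic_days(seed=0):
    """Return (rows, groups): one row of two made-up DQ metrics per day.

    The metrics (hypothetical names: rms_amplitude, low_freq_power) and
    the three groups are invented for illustration only.
    """
    rng = random.Random(seed)
    # (group name, number of days, centre of the two metrics)
    spec = [("normal", 30, (0.0, 0.0)),
            ("mass_recenter", 5, (10.0, 0.0)),
            ("calibration", 5, (0.0, 10.0))]
    rows, groups = [], []
    for name, n_days, (cx, cy) in spec:
        for _ in range(n_days):
            rows.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
            groups.append(name)
    return rows, groups


def standardize(rows):
    """Z-score each metric column so no single metric dominates distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two metric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    # Assumption: start from the first day, then repeatedly add the day
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each day joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    rows, groups = make_synthetic_days()
    labels, _ = kmeans(standardize(rows), k=3)
    for name in ("normal", "mass_recenter", "calibration"):
        found = sorted({lab for lab, g in zip(labels, groups) if g == name})
        print(f"{name}: cluster label(s) {found}")
```

In an exploratory QC setting, an analyst would then check which dates fall into the small clusters against station maintenance logs. A library implementation such as scikit-learn's `KMeans` would typically replace the hand-written loop in practice.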