DisperPy: A machine learning based tool to automatically pick group velocity dispersion curves from earthquakes

IF 4.4 · Zone 2, Earth Sciences · Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computers & Geosciences, Volume 205, Article 106015 · Pub Date: 2025-07-16 · DOI: 10.1016/j.cageo.2025.106015

André V.S. Nascimento, Carlos A.M. Chaves, Susanne T.R. Maciel, George S. França, Giuliano S. Marotta



Abstract

Seismology has made significant progress in high-resolution Earth imaging, largely driven by the increasing volume of freely available data. As a result, automated tools and machine learning algorithms are becoming essential for processing this vast amount of information. We present DisperPy, an open-source Python library developed to automatically extract group velocity dispersion curves from earthquake data. The analysis framework of DisperPy is structured around two primary tasks: (1) assessing the quality of waveforms to determine whether dispersion extraction is feasible, and (2) measuring the group velocity dispersion curve for suitable waveforms. To address the first task, DisperPy uses a convolutional neural network trained on dispersion spectrograms to classify waveform quality. The model, based on the ResNet-34 architecture, is initialized with ImageNet-pretrained weights and fine-tuned using the fastai deep learning library. On the test set, the network achieves an accuracy of 92% in distinguishing between high- and low-quality dispersion images. For the second task, DisperPy employs unsupervised learning techniques, starting with a Gaussian mixture model to separate dispersion energy from background noise, followed by k-means to separate the dispersion energy into clusters, which makes it easier to track amplitude maxima and then construct initial dispersion curves. Finally, the initial dispersion curves are refined using both the density-based spatial clustering of applications with noise (DBSCAN) algorithm and data quality criteria to remove possible outliers. To further test DisperPy, we conduct a surface wave tomography experiment across the contiguous United States using freely available vertical-component broadband waveforms. After processing the data with DisperPy and removing low-quality waveforms, the final dataset consists of 194,325 unique dispersion curves. Consistent with previous studies, our maps reveal a prominent velocity dichotomy, with low velocities in the tectonically active western US and high velocities in the stable central and eastern US.
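
The second task described in the abstract (Gaussian mixture model, then k-means, then amplitude-maximum tracking, then DBSCAN) can be illustrated with a minimal sketch. This is not DisperPy's actual API or implementation: it uses scikit-learn on a synthetic period-velocity energy image, and every function name, grid size, and parameter value (`n_clusters`, `eps`, `min_samples`, the injected spurious blob) is an assumption chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.mixture import GaussianMixture


def synthetic_spectrogram(periods, velocities, seed=0):
    """Toy energy image (hypothetical): one smooth ridge, noise, one spurious blob."""
    rng = np.random.default_rng(seed)
    # Assumed "true" group velocity rising from 3.0 to 3.8 km/s with period.
    true_u = 3.0 + 0.8 * (periods - periods.min()) / np.ptp(periods)
    ridge = np.exp(-0.5 * ((velocities[None, :] - true_u[:, None]) / 0.08) ** 2)
    energy = ridge + 0.05 * rng.random(ridge.shape)
    energy[0:3, 127:134] += 0.6  # spurious energy far from the ridge
    return energy, true_u


def pick_dispersion(energy, n_clusters=4, eps=0.08, min_samples=3, seed=0):
    """Return (period indices, velocity indices) of the picked curve."""
    n_p, n_v = energy.shape

    # Step 1: a two-component GMM on amplitudes separates energy from background.
    amp = energy.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=seed).fit(amp)
    signal = int(np.argmax(gmm.means_.ravel()))
    mask = (gmm.predict(amp) == signal).reshape(energy.shape)
    ip, iv = np.nonzero(mask)

    # Step 2: k-means groups the signal pixels in normalised (period, velocity) space.
    xy = np.column_stack([ip / (n_p - 1), iv / (n_v - 1)])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(xy)

    # Step 3: within each cluster, keep the amplitude maximum at every period.
    picks = []
    for k in range(n_clusters):
        for p in np.unique(ip[labels == k]):
            v = iv[(labels == k) & (ip == p)]
            picks.append((p, v[np.argmax(energy[p, v])]))
    picks = np.array(picks)

    # Step 4: DBSCAN on the picks; keep the largest cluster and drop the rest.
    # (Assumes at least one cluster is found, which holds for this toy image.)
    feats = picks / np.array([n_p - 1, n_v - 1])
    db = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
    kept = picks[db == np.bincount(db[db >= 0]).argmax()]

    # Where several picks share a period, keep the strongest one.
    curve = {}
    for p, v in kept:
        if p not in curve or energy[p, v] > energy[p, curve[p]]:
            curve[p] = v
    idx = np.array(sorted(curve.items()))
    return idx[:, 0], idx[:, 1]


if __name__ == "__main__":
    periods = np.linspace(10.0, 100.0, 40)     # s
    velocities = np.linspace(2.0, 5.0, 151)    # km/s
    energy, true_u = synthetic_spectrogram(periods, velocities)
    ip, iv = pick_dispersion(energy)
    for p, v in zip(ip[:5], iv[:5]):
        print(f"T = {periods[p]:6.1f} s   U = {velocities[v]:.2f} km/s")
```

Keeping only the largest DBSCAN cluster is a simplification for this toy image. The abstract states that the library combines DBSCAN with additional data quality criteria, which are not reproduced here.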




Source journal

Computers & Geosciences (Geosciences: Geosciences, Multidisciplinary)

CiteScore: 9.30

Self-citation rate: 6.80%

Articles published: 164

Review time: 3.4 months

Journal description: Computers & Geosciences publishes high impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.