André V.S. Nascimento, Carlos A.M. Chaves, Susanne T.R. Maciel, George S. França, Giuliano S. Marotta
DisperPy: A machine learning based tool to automatically pick group velocity dispersion curves from earthquakes
Computers & Geosciences, volume 205, Article 106015. Published 2025-07-16.
DOI: 10.1016/j.cageo.2025.106015
URL: https://www.sciencedirect.com/science/article/pii/S0098300425001657
Abstract
Seismology has made significant progress in high-resolution Earth imaging, largely driven by the increasing volume of freely available data. As a result, automated tools and machine learning algorithms are becoming essential for processing this volume of information. We present DisperPy, an open-source Python library that automatically extracts group velocity dispersion curves from earthquake data. DisperPy's analysis framework is structured around two primary tasks: (1) assessing waveform quality to determine whether dispersion extraction is feasible, and (2) measuring the group velocity dispersion curve for suitable waveforms. For the first task, DisperPy uses a convolutional neural network trained on dispersion spectrograms to classify waveform quality. The model, based on the ResNet-34 architecture, is initialized with ImageNet-pretrained weights and fine-tuned using the fastai deep learning library. On the test set, the network achieves an accuracy of 92% in distinguishing between high- and low-quality dispersion images. For the second task, DisperPy employs unsupervised learning techniques: a Gaussian mixture model first separates dispersion energy from background noise, and k-means then divides the dispersion energy into clusters, which makes it easier to track amplitude maxima and construct initial dispersion curves. Finally, the initial dispersion curves are refined using both the density-based spatial clustering of applications with noise (DBSCAN) algorithm and data quality criteria to remove possible outliers. To further test DisperPy, we conduct a surface wave tomography experiment across the contiguous United States using freely available vertical-component broadband waveforms. After processing the data with DisperPy and removing low-quality waveforms, the final dataset consists of 194,325 unique dispersion curves. Consistent with previous studies, our maps reveal a prominent velocity dichotomy, with low velocities in the tectonically active western US and high velocities in the stable central and eastern US.
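The outlier-removal idea in the final refinement step can be illustrated with a small, self-contained sketch. This is not DisperPy's code. The synthetic dispersion image, the linear group velocity curve, the rescaling of the period and velocity axes, and the `eps` and `min_samples` values are all assumptions chosen for illustration. The Gaussian mixture and k-means stages and the data quality criteria are skipped. The sketch picks the amplitude maximum at each period and then applies a hand-written density-based clustering (DBSCAN) routine to flag isolated picks as outliers.

```python
import math


def pick_maxima(image, velocities):
    """Return, for each period (one row of the image), the velocity of the largest amplitude."""
    return [velocities[max(range(len(row)), key=row.__getitem__)] for row in image]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise (outliers)."""
    n = len(points)
    # Neighbourhoods include the point itself, as in the usual DBSCAN definition.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels


# Synthetic example. Every number below is an illustrative assumption.
periods = [10.0 * (i + 1) for i in range(15)]    # 10 s to 150 s
velocities = [2.0 + 0.1 * j for j in range(31)]  # 2.0 km/s to 5.0 km/s


def true_velocity(period):
    """Hypothetical smooth group velocity curve used to build the test image."""
    return 3.0 + 0.01 * period


# Dispersion image: a Gaussian ridge of energy centred on the true curve.
image = [
    [math.exp(-((v - true_velocity(p)) / 0.2) ** 2) for v in velocities]
    for p in periods
]
# Two spurious energy spikes that mislead a plain per-period maximum search.
image[4][2] = 1.5    # period 50 s, velocity 2.2 km/s
image[9][28] = 1.5   # period 100 s, velocity 4.8 km/s

picks = pick_maxima(image, velocities)

# Rescale so that one period step and one velocity step both equal 1 unit (assumption).
points = [(p / 10.0, v / 0.1) for p, v in zip(periods, picks)]
labels = dbscan(points, eps=3.0, min_samples=3)

# Keep only picks that DBSCAN assigns to a cluster.
refined = [(p, v) for p, v, lab in zip(periods, picks, labels) if lab != -1]
print(len(picks), len(refined))  # 15 13
```

In this toy case the two spiked periods are dropped and the remaining 13 picks follow the synthetic curve. In practice a library implementation such as `sklearn.cluster.DBSCAN` would likely replace the hand-written function, and the axis scaling and neighbourhood parameters would need tuning for real dispersion images.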
About the journal:
Computers & Geosciences publishes high impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.