DiTing: A large-scale Chinese seismic benchmark dataset for artificial intelligence in seismology

IF 1.2 | Zone 4 (Earth Sciences) | Q3 (Earth and Planetary Sciences)
Ming Zhao, Zhuowei Xiao, Shi Chen, Lihua Fang
Earthquake Science, Volume 36, Issue 2, Pages 84-94. Published 2023-04-01.
DOI: 10.1016/j.eqs.2022.01.022
URL: https://www.sciencedirect.com/science/article/pii/S1674451922000222
Citations: 14

Abstract

In recent years, artificial intelligence technology has exhibited great potential in seismic signal recognition, setting off a new wave of research. Vast amounts of high-quality labeled data are required to develop and apply artificial intelligence in seismology research. In this study, based on the 2013–2020 seismic cataloging reports of the China Earthquake Networks Center, we constructed an artificial intelligence seismological training dataset (“DiTing”) with the largest known total time length. Data were recorded using broadband and short-period seismometers. The obtained dataset included 2,734,748 three-component waveform traces from 787,010 regional seismic events, the corresponding P- and S-phase arrival time labels, and 641,025 P-wave first-motion polarity labels. All waveforms were sampled at 50 Hz and cut to a time length of 180 s starting from a random number of seconds before the occurrence of an earthquake. Each three-component waveform contained a considerable amount of descriptive information, such as the epicentral distance, back azimuth, and signal-to-noise ratios. The magnitudes of seismic events, epicentral distance, signal-to-noise ratio of P-wave data, and signal-to-noise ratio of S-wave data ranged from 0 to 7.7, 0 to 330 km, –0.05 to 5.31 dB, and –0.05 to 4.73 dB, respectively. The dataset compiled in this study can serve as a high-quality benchmark for machine learning model development and data-driven seismological research on earthquake detection, seismic phase picking, first-motion polarity determination, earthquake magnitude prediction, early warning systems, and strong ground-motion prediction. Such research will further promote the development and application of artificial intelligence in seismology.
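
The windowing described in the abstract (50 Hz sampling, 180 s windows that begin a random number of seconds before the earthquake) can be illustrated with a minimal sketch. It is not the authors' processing code. The function name `cut_window`, the `max_lead_seconds` bound on the random lead time, and the synthetic record are all assumptions made for illustration; only the sampling rate and window length come from the abstract.

```python
import random

SAMPLING_RATE_HZ = 50   # stated in the abstract
WINDOW_SECONDS = 180    # stated in the abstract
WINDOW_SAMPLES = SAMPLING_RATE_HZ * WINDOW_SECONDS  # 9000 samples per component


def cut_window(trace, origin_index, p_index, s_index,
               max_lead_seconds=30, rng=None):
    """Cut a fixed-length window starting a random time before the origin.

    trace: samples of one component of a (hypothetical) continuous record.
    origin_index, p_index, s_index: sample indices in that record.
    max_lead_seconds: assumed upper bound on the lead time (not from the paper).
    Returns (window, p_label, s_label), labels relative to the window start.
    """
    rng = rng or random.Random()
    # Random lead time, expressed in samples.
    lead = rng.randint(0, max_lead_seconds * SAMPLING_RATE_HZ)
    start = max(0, origin_index - lead)
    window = trace[start:start + WINDOW_SAMPLES]
    if len(window) < WINDOW_SAMPLES:
        raise ValueError("record too short for a full window")
    # Shift the phase picks so they index into the cut window.
    return window, p_index - start, s_index - start


# Usage on a synthetic 600 s record of zeros (illustrative only).
record = [0.0] * (600 * SAMPLING_RATE_HZ)
window, p_label, s_label = cut_window(
    record, origin_index=5000, p_index=5250, s_index=5600,
    rng=random.Random(0),
)
```

Because the window start varies from trace to trace, the P and S labels fall at different positions in each window, which keeps a model from simply learning a fixed pick position.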

Source journal: Earthquake Science (Geochemistry & Geophysics)
CiteScore: 1.10
Self-citation rate: 8.30%
Articles published: 42
Review time: 3 months
About the journal: Earthquake Science (EQS) aims to publish high-quality, original, peer-reviewed articles on earthquake-related research subjects. It is an English-language international journal sponsored by the Seismological Society of China and the Institute of Geophysics, China Earthquake Administration.

Topics include, but are not limited to, the following:
● Seismic sources of all kinds.
● Earth structure at all scales.
● Seismotectonics.
● New methods and theoretical seismology.
● Strong ground motion.
● Seismic phenomena of all kinds.
● Seismic hazards, earthquake forecasting and prediction.
● Seismic instrumentation.
● Significant recent or past seismic events.
● Documentation of recent seismic events or important observations.
● Descriptions of field deployments, new methods, and available software tools.

Manuscript types are listed below. There is no length requirement, except for Short Notes.
【Articles】 Original contributions that have not been published elsewhere.
【Short Notes】 Short papers on recent events or topics that warrant rapid peer review and publication. Limited to 4 publication pages.
【Rapid Communications】 Significant contributions that warrant rapid peer review and publication.
【Review Articles】 Review articles are by invitation only. Please contact the editorial office and editors about possible proposals.
【Toolboxes】 Descriptions of novel numerical methods and associated computer codes.
【Data Products】 Documentation of datasets of various kinds that are of interest to the community and available for open access (field data, processed data, synthetic data, or models).
【Opinions】 Views on important topics and future directions in earthquake science.
【Comments and Replies】 Commentaries on recently published EQS papers are welcome. The authors of the paper commented on will be invited to reply. Both the Comment and the Reply are subject to peer review.