Acoustic Pre-training with Contrastive Learning for Gunshot Recognition

Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things. Pub Date: 2023-05-26. DOI: 10.1145/3603781.3603908

Xianjie Shen, Saimin Ma, Linlin Yang, Yubo Jiang, Zhifeng Xiao, Shuren Xu


Citations: 0

Abstract

Gun control has become a serious social and political issue in some countries. Automatic, accurate, and fast gunshot recognition can help police identify gun caliber, which in turn helps track suspects and speeds up criminal investigations. Recent developments in deep learning have brought new opportunities to speech and acoustic recognition. However, the lack of sufficient training examples remains a challenge for training a robust model. In this paper, we propose an acoustic pre-training method with contrastive learning to capture gunshot-like sounds in a rich collection of urban sounds. Specifically, we develop an encoder-decoder model that uses more typical samples from external datasets to mine semantic acoustic features in a self-supervised manner. The pre-trained network is then fine-tuned on the downstream task of gunshot recognition. Extensive experiments demonstrate the superiority of our method over existing machine learning methods.
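The abstract does not state which contrastive objective is used for pre-training. As an illustration only, the sketch below shows one common choice, the NT-Xent (InfoNCE-style) loss, computed with NumPy on a batch of paired embeddings. The function name `nt_xent_loss`, the `temperature` value, and the idea that each pair comes from two augmented views of the same audio clip are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for a batch of paired embeddings (illustrative sketch).

    Assumption: z_a[i] and z_b[i] embed two views of the same audio clip
    (a positive pair); all other embeddings in the batch act as negatives.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # shape (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize rows
    sim = (z @ z.T) / temperature                      # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # drop self-similarity

    # Row i's positive sits N positions away (a_i <-> b_i).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    return -log_prob[np.arange(2 * n), pos].mean()


# Hypothetical usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b = view_a + 0.01 * rng.normal(size=view_a.shape)
loss = nt_xent_loss(view_a, view_b)
```

Minimizing a loss of this kind pulls the two views of a clip together in embedding space and pushes other clips away, which is the general mechanism by which contrastive pre-training can learn acoustic features without labels. The encoder trained this way would then be fine-tuned with labeled gunshot data, as the abstract describes.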

