Acoustic Pre-training with Contrastive Learning for Gunshot Recognition
Xianjie Shen, Saimin Ma, Linlin Yang, Yubo Jiang, Zhifeng Xiao, Shuren Xu
Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things, 2023-05-26
DOI: 10.1145/3603781.3603908 (https://doi.org/10.1145/3603781.3603908)
Gun control has become a serious social and political issue in some countries. Automatic, accurate, and fast gunshot recognition can assist police in identifying gun caliber, which helps track suspects and speeds up criminal investigation. Recent developments in deep learning have brought new opportunities in speech and acoustic recognition. However, the lack of sufficient training examples remains an obstacle to training a robust model. In this paper, we propose an acoustic pre-training method with contrastive learning to capture gunshot-like sounds in a rich collection of urban sounds. Specifically, we develop an encoder-decoder model that uses more typical samples from external datasets to mine semantic acoustic features in a self-supervised manner. The pre-trained network is then fine-tuned on the downstream gunshot recognition task. Extensive experiments show that our method outperforms existing machine learning methods.
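The abstract does not say which contrastive objective the authors use. As a rough illustration of the general idea only, the sketch below assumes an InfoNCE-style loss, a common choice in contrastive pre-training, computed over embeddings of two views of each audio clip. The names `cosine` and `info_nce_loss` and the `temperature` value are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (assumed objective, not the paper's).

    anchors[i] and positives[i] are embeddings of two views of clip i.
    For anchors[i], every positives[j] with j != i serves as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching view.
        total += log_denom - logits[i]
    return total / len(anchors)


# Toy 2-D embeddings: matching views aligned vs. deliberately mismatched.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

In a pipeline of the kind the abstract describes, a loss like this would be minimized over unlabeled urban-sound clips during pre-training, after which the encoder would be fine-tuned with labeled gunshot recordings. The encoder architecture and the way views are generated are left out here because the abstract does not specify them.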