{"title":"Efficient Moving Target Detection Using Resource-Constrained Neural Networks","authors":"Dimitris Milioris","doi":"10.1109/ICASSPW59220.2023.10193347","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193347","url":null,"abstract":"In recent years, the widespread use of autonomous vehicles, such as aerial and automotive, has enhanced our abilities to perform target tracking, dispensing our over-reliance on visual features. With the development of computer vision and deep learning techniques, vision-based classification and recognition have recently received special attention in the scientific community. Moreover, recent advances in the field of neural networks with quantized weights and activations down to single bit precision have allowed the development of models that can be deployed in resource-constrained settings, where a trade-off between task performance and efficiency is accepted. In this work we design an efficient single stage object detector based on CenterNet containing a combination of full precision and binary layers. Our model is easy to train and achieves comparable results with a full precision network trained from scratch while requiring an order of magnitude less FLOP. This opens the possibility of deploying an object detector in applications where time is of the essence and a graphical processing unit (GPU) is absent. We train our model and evaluate its performance by comparing with state-of-the-art techniques, obtaining higher accurate results and provide an insight into the design process of resource constrained neural networks involving trade-offs.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129663958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Light-Weight Visualvoice: Neural Network Quantization On Audio Visual Speech Separation","authors":"Yifei Wu, Chenda Li, Y. Qian","doi":"10.1109/ICASSPW59220.2023.10193263","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193263","url":null,"abstract":"As multi-modal systems show superior performance on more tasks, the huge amount of computational resources they need becomes one of the critical problems to be solved. In this work, we explore neural network quantization methods to compress the resource requirement of VisualVoice, a state-of-the-art audio-visual speech separation system. The model is firstly fine-tuned by an ADMM-based quantization-aware training approach to produce the fixed-precision quantized version. Then three strategies, including manual selection, Hessian trace-based selection and KL divergence-based greedy search are explored to find the optimal mixed-precision setting of the model. The result shows that by applying the optimal strategy, we obtain a satisfying trade-off between space, speed and performance for the final system. The KL divergence-based strategy reaches 7.2 dB in SDR at 3-bit equivalent setup, which outperforms the fixed-precision setup and the other two mixed-precision strategies. More-over, we also discuss the influence caused by quantizing different parts of the multi-modal system.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127508569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"List of ICASSP’23 Satellite Workshops:","authors":"","doi":"10.1109/icasspw59220.2023.10192944","DOIUrl":"https://doi.org/10.1109/icasspw59220.2023.10192944","url":null,"abstract":"","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129050152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secure Integrated Sensing and Communication Downlink Beamforming: A Semidefinite Relaxation Approach With Tightness Guaranteed","authors":"Wai-Yiu Keung, Hoi-To Wai, Wing-Kin Ma","doi":"10.1109/ICASSPW59220.2023.10193088","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193088","url":null,"abstract":"Integrated sensing and communication (ISAC) is considered as a key solution toward spectrum congestion in future generations of wireless system. On the other hand, physical-layer security has recently regained attentions as network-layer encryption is becoming more challenging in 5G and beyond. This paper studies a multiuser MIMO beamforming design for ISAC with physical-layer security. Specifically, we consider a power minimization problem with signal-to-interference-plus-noise ratio constraints and with a Cramér-Rao-based sensing performance constraint. The problem is non-convex, but can in principle be approximated by semidefinite relaxation (SDR) which is a convex optimization-based scheme. The main contribution of this paper lies in showing that, with a nearly harmless modification, the problem can be exactly solved by SDR. Prior works showed that the same ISAC problem without physical-layer security can be solved by SDR, but the proof method therein appears to be inapplicable to our secure ISAC problem. Numerical results are presented to illustrate the efficiency of our SDR for solving the secure ISAC problem.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129276297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assisted Labeling Visualizer (ALVI): A Semi-Automatic Labeling System For Time-Series Data","authors":"Lee B. Hinkle, Tristan Pedro, Tyler Lynn, G. Atkinson, V. Metsis","doi":"10.1109/ICASSPW59220.2023.10193169","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193169","url":null,"abstract":"Machine learning applications can significantly benefit from large amounts of labeled data, although the task of labeling data is notoriously challenging and time-consuming. This is particularly evident in domains involving human subjects, where labeling time-series signals often necessitates trained professionals. In this work, we introduce the Assisted Labeling Visualizer (ALVI), a system that simplifies the process of labeling data by offering an interactive user interface that visualizes synchronized video, feature-map representations, and raw time-series signals. ALVI also leverages deep learning and self-supervised learning techniques to facilitate the semi-automatic labeling of large amounts of unlabeled data. We demonstrate the capabilities of ALVI on a human activity recognition dataset to showcase its potential for enhancing the labeling process of time-series sensor data.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125555477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Based UAV Payload Recognition","authors":"L. Sommer, Raphael Spraul","doi":"10.1109/ICASSPW59220.2023.10193235","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193235","url":null,"abstract":"Due to the increased availability of unmanned aerial vehicles (UAVs), the demand for automated counter-UAV systems to protect facilities or areas from misused or threatening UAVs is growing. Fundamental for these systems are fast and accurate detection as well as identification of potential threats to initiate countermeasures. Criteria to classify the potential threat are UAV type and payload. Though thermal or electro optical (EO) imagery have been widely applied for the detection task, other sensor modalities, i.e. acoustic, radar and radio frequency, are predominately used for UAV type and payload classification. In this work, we examine the potential of UAV payload classification in EO imagery, which facilitates direct interpretability by human operators. For this, we compare conventional CNN-based architectures and recent architectures exploiting self-attention mechanisms such as Vision Transformers. The different architectures are trained and evaluated on a novel dataset composed of own recordings of UAVs with and without payload, imagery crawled from the Internet and imagery taken from publicly available UAV datasets.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"182 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123111933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Estimation Of Change Points Of Physiological Arousal During Driving","authors":"Kleanthis Avramidis, Tiantian Feng, Digbalay Bose, Shrikanth S. Narayanan","doi":"10.1109/ICASSPW59220.2023.10193718","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193718","url":null,"abstract":"Detecting unsafe driving states, such as stress, drowsiness, and fatigue, is an important component of ensuring driving safety and an essential prerequisite for automatic intervention systems in vehicles. These concerning conditions are primarily connected to the driver’s low or high arousal levels. In this study, we describe a framework for processing multimodal physiological time-series from wearable sensors during driving and locating points of prominent change in drivers’ physiological arousal. These points of change could potentially indicate events that require just-in-time intervention. We apply time-series segmentation on heart rate and breathing rate measurements and quantify their robustness in capturing change points in electrodermal activity, treated as a reference index for arousal, as well as on self-reported stress ratings, using three public datasets. Our experiments demonstrate that physiological measures are veritable indicators of change points of arousal.11Code and results available at https://github.com/usc-sail/ggs driving","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"242 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121156101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"State-of-the-Art in Nudity Classification: A Comparative Analysis","authors":"F. C. Akyon, A. Temi̇zel","doi":"10.1109/ICASSPW59220.2023.10193621","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193621","url":null,"abstract":"This paper presents a comparative analysis of existing nudity classification techniques for classifying images based on the presence of nudity, with a focus on their application in content moderation. The evaluation focuses on CNN-based models, vision transformer, and popular open-source safety checkers from Stable Diffusion and Large-scale Artificial Intelligence Open Network (LAION). The study identifies the limitations of current evaluation datasets and highlights the need for more diverse and challenging datasets. The paper discusses the potential implications of these findings for developing more accurate and effective image classification systems on online platforms. Overall, the study emphasizes the importance of continually improving image classification models to ensure the safety and well-being of platform users. The project page, including the demonstrations and results is publicly available at https://github.com/fcakyon/contentmoderation-deep-learning.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114261562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gloss Alignment using Word Embeddings","authors":"Harry Walsh, Ozge Mercanoglu Sincan, Ben Saunders, R. Bowden","doi":"10.1109/ICASSPW59220.2023.10193013","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193013","url":null,"abstract":"Capturing and annotating Sign language datasets is a time consuming and costly process. Current datasets are orders of magnitude too small to successfully train unconstrained Sign Language Translation (SLT) models. As a result, research has turned to TV broadcast content as a source of large-scale training data, consisting of both the sign language interpreter and the associated audio subtitle. However, lack of sign language annotation limits the usability of this data and has led to the development of automatic annotation techniques such as sign spotting. These spottings are aligned to the video rather than the subtitle, which often results in a misalignment between the subtitle and spotted signs. In this paper we propose a method for aligning spottings with their corresponding subtitles using large spoken language models. Using a single modality means our method is computationally inexpensive and can be utilized in conjunction with existing alignment techniques. We quantitatively demonstrate the effectiveness of our method on the Meine DGS-Annotated (MeineDGS) and BBC-Oxford British Sign Language (BOBSL) datasets, recovering up to a 33.22 BLEU-1 score in word alignment.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122526889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation Of A Marine Mesoscale Events Classifier","authors":"M. Reggiannini, O. Papini, G. Pieri","doi":"10.1109/ICASSPW59220.2023.10193234","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193234","url":null,"abstract":"Marine mesoscale phenomena are relevant oceanographic processes that impact on fishery, biodiversity and climate variation. In previous literature, their analysis has been tackled by processing instantaneous remote sensing observations and returning a classification of the observed event. Indeed, these phenomena occur within an extended time range, thus an analysis including time dependence is desirable. Mesoscale Events Classifier (MEC) is an algorithm devoted to the classification of marine mesoscale events in sea surface temperature imagery. By processing time series of satellite temperature observations MEC recognizes the considered area of interest as the domain of one out of a given number of possible events and returns the corresponding label. Objective of this work is to discuss the performance of the MEC pipeline in terms of its capability of correctly capturing the nature of the observed mesoscale process. The evaluation process exploited satellite remote sensing data collected in front of the Portuguese coast.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132547593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}