{"title":"Efficient Moving Target Detection Using Resource-Constrained Neural Networks","authors":"Dimitris Milioris","doi":"10.1109/ICASSPW59220.2023.10193347","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193347","url":null,"abstract":"In recent years, the widespread use of autonomous vehicles, such as aerial and automotive, has enhanced our abilities to perform target tracking, dispensing our over-reliance on visual features. With the development of computer vision and deep learning techniques, vision-based classification and recognition have recently received special attention in the scientific community. Moreover, recent advances in the field of neural networks with quantized weights and activations down to single bit precision have allowed the development of models that can be deployed in resource-constrained settings, where a trade-off between task performance and efficiency is accepted. In this work we design an efficient single stage object detector based on CenterNet containing a combination of full precision and binary layers. Our model is easy to train and achieves comparable results with a full precision network trained from scratch while requiring an order of magnitude less FLOP. This opens the possibility of deploying an object detector in applications where time is of the essence and a graphical processing unit (GPU) is absent. We train our model and evaluate its performance by comparing with state-of-the-art techniques, obtaining higher accurate results and provide an insight into the design process of resource constrained neural networks involving trade-offs.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129663958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Light-Weight Visualvoice: Neural Network Quantization On Audio Visual Speech Separation","authors":"Yifei Wu, Chenda Li, Y. Qian","doi":"10.1109/ICASSPW59220.2023.10193263","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193263","url":null,"abstract":"As multi-modal systems show superior performance on more tasks, the huge amount of computational resources they need becomes one of the critical problems to be solved. In this work, we explore neural network quantization methods to compress the resource requirement of VisualVoice, a state-of-the-art audio-visual speech separation system. The model is firstly fine-tuned by an ADMM-based quantization-aware training approach to produce the fixed-precision quantized version. Then three strategies, including manual selection, Hessian trace-based selection and KL divergence-based greedy search are explored to find the optimal mixed-precision setting of the model. The result shows that by applying the optimal strategy, we obtain a satisfying trade-off between space, speed and performance for the final system. The KL divergence-based strategy reaches 7.2 dB in SDR at 3-bit equivalent setup, which outperforms the fixed-precision setup and the other two mixed-precision strategies. More-over, we also discuss the influence caused by quantizing different parts of the multi-modal system.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127508569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"List of ICASSP’23 Satellite Workshops:","authors":"","doi":"10.1109/icasspw59220.2023.10192944","DOIUrl":"https://doi.org/10.1109/icasspw59220.2023.10192944","url":null,"abstract":"","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129050152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secure Integrated Sensing and Communication Downlink Beamforming: A Semidefinite Relaxation Approach With Tightness Guaranteed","authors":"Wai-Yiu Keung, Hoi-To Wai, Wing-Kin Ma","doi":"10.1109/ICASSPW59220.2023.10193088","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193088","url":null,"abstract":"Integrated sensing and communication (ISAC) is considered as a key solution toward spectrum congestion in future generations of wireless system. On the other hand, physical-layer security has recently regained attentions as network-layer encryption is becoming more challenging in 5G and beyond. This paper studies a multiuser MIMO beamforming design for ISAC with physical-layer security. Specifically, we consider a power minimization problem with signal-to-interference-plus-noise ratio constraints and with a Cramér-Rao-based sensing performance constraint. The problem is non-convex, but can in principle be approximated by semidefinite relaxation (SDR) which is a convex optimization-based scheme. The main contribution of this paper lies in showing that, with a nearly harmless modification, the problem can be exactly solved by SDR. Prior works showed that the same ISAC problem without physical-layer security can be solved by SDR, but the proof method therein appears to be inapplicable to our secure ISAC problem. Numerical results are presented to illustrate the efficiency of our SDR for solving the secure ISAC problem.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129276297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assisted Labeling Visualizer (ALVI): A Semi-Automatic Labeling System For Time-Series Data","authors":"Lee B. Hinkle, Tristan Pedro, Tyler Lynn, G. Atkinson, V. Metsis","doi":"10.1109/ICASSPW59220.2023.10193169","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193169","url":null,"abstract":"Machine learning applications can significantly benefit from large amounts of labeled data, although the task of labeling data is notoriously challenging and time-consuming. This is particularly evident in domains involving human subjects, where labeling time-series signals often necessitates trained professionals. In this work, we introduce the Assisted Labeling Visualizer (ALVI), a system that simplifies the process of labeling data by offering an interactive user interface that visualizes synchronized video, feature-map representations, and raw time-series signals. ALVI also leverages deep learning and self-supervised learning techniques to facilitate the semi-automatic labeling of large amounts of unlabeled data. We demonstrate the capabilities of ALVI on a human activity recognition dataset to showcase its potential for enhancing the labeling process of time-series sensor data.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125555477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning Based UAV Payload Recognition","authors":"L. Sommer, Raphael Spraul","doi":"10.1109/ICASSPW59220.2023.10193235","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193235","url":null,"abstract":"Due to the increased availability of unmanned aerial vehicles (UAVs), the demand for automated counter-UAV systems to protect facilities or areas from misused or threatening UAVs is growing. Fundamental for these systems are fast and accurate detection as well as identification of potential threats to initiate countermeasures. Criteria to classify the potential threat are UAV type and payload. Though thermal or electro optical (EO) imagery have been widely applied for the detection task, other sensor modalities, i.e. acoustic, radar and radio frequency, are predominately used for UAV type and payload classification. In this work, we examine the potential of UAV payload classification in EO imagery, which facilitates direct interpretability by human operators. For this, we compare conventional CNN-based architectures and recent architectures exploiting self-attention mechanisms such as Vision Transformers. The different architectures are trained and evaluated on a novel dataset composed of own recordings of UAVs with and without payload, imagery crawled from the Internet and imagery taken from publicly available UAV datasets.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"182 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123111933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Estimation Of Change Points Of Physiological Arousal During Driving","authors":"Kleanthis Avramidis, Tiantian Feng, Digbalay Bose, Shrikanth S. Narayanan","doi":"10.1109/ICASSPW59220.2023.10193718","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193718","url":null,"abstract":"Detecting unsafe driving states, such as stress, drowsiness, and fatigue, is an important component of ensuring driving safety and an essential prerequisite for automatic intervention systems in vehicles. These concerning conditions are primarily connected to the driver’s low or high arousal levels. In this study, we describe a framework for processing multimodal physiological time-series from wearable sensors during driving and locating points of prominent change in drivers’ physiological arousal. These points of change could potentially indicate events that require just-in-time intervention. We apply time-series segmentation on heart rate and breathing rate measurements and quantify their robustness in capturing change points in electrodermal activity, treated as a reference index for arousal, as well as on self-reported stress ratings, using three public datasets. Our experiments demonstrate that physiological measures are veritable indicators of change points of arousal.11Code and results available at https://github.com/usc-sail/ggs driving","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"242 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121156101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"State-of-the-Art in Nudity Classification: A Comparative Analysis","authors":"F. C. Akyon, A. Temi̇zel","doi":"10.1109/ICASSPW59220.2023.10193621","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193621","url":null,"abstract":"This paper presents a comparative analysis of existing nudity classification techniques for classifying images based on the presence of nudity, with a focus on their application in content moderation. The evaluation focuses on CNN-based models, vision transformer, and popular open-source safety checkers from Stable Diffusion and Large-scale Artificial Intelligence Open Network (LAION). The study identifies the limitations of current evaluation datasets and highlights the need for more diverse and challenging datasets. The paper discusses the potential implications of these findings for developing more accurate and effective image classification systems on online platforms. Overall, the study emphasizes the importance of continually improving image classification models to ensure the safety and well-being of platform users. The project page, including the demonstrations and results is publicly available at https://github.com/fcakyon/contentmoderation-deep-learning.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114261562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gloss Alignment using Word Embeddings","authors":"Harry Walsh, Ozge Mercanoglu Sincan, Ben Saunders, R. Bowden","doi":"10.1109/ICASSPW59220.2023.10193013","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193013","url":null,"abstract":"Capturing and annotating Sign language datasets is a time consuming and costly process. Current datasets are orders of magnitude too small to successfully train unconstrained Sign Language Translation (SLT) models. As a result, research has turned to TV broadcast content as a source of large-scale training data, consisting of both the sign language interpreter and the associated audio subtitle. However, lack of sign language annotation limits the usability of this data and has led to the development of automatic annotation techniques such as sign spotting. These spottings are aligned to the video rather than the subtitle, which often results in a misalignment between the subtitle and spotted signs. In this paper we propose a method for aligning spottings with their corresponding subtitles using large spoken language models. Using a single modality means our method is computationally inexpensive and can be utilized in conjunction with existing alignment techniques. We quantitatively demonstrate the effectiveness of our method on the Meine DGS-Annotated (MeineDGS) and BBC-Oxford British Sign Language (BOBSL) datasets, recovering up to a 33.22 BLEU-1 score in word alignment.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122526889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation Of A Marine Mesoscale Events Classifier","authors":"M. Reggiannini, O. Papini, G. Pieri","doi":"10.1109/ICASSPW59220.2023.10193234","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193234","url":null,"abstract":"Marine mesoscale phenomena are relevant oceanographic processes that impact on fishery, biodiversity and climate variation. In previous literature, their analysis has been tackled by processing instantaneous remote sensing observations and returning a classification of the observed event. Indeed, these phenomena occur within an extended time range, thus an analysis including time dependence is desirable. Mesoscale Events Classifier (MEC) is an algorithm devoted to the classification of marine mesoscale events in sea surface temperature imagery. By processing time series of satellite temperature observations MEC recognizes the considered area of interest as the domain of one out of a given number of possible events and returns the corresponding label. Objective of this work is to discuss the performance of the MEC pipeline in terms of its capability of correctly capturing the nature of the observed mesoscale process. The evaluation process exploited satellite remote sensing data collected in front of the Portuguese coast.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132547593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}