2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW): Latest Publications

Efficient Moving Target Detection Using Resource-Constrained Neural Networks
Dimitris Milioris
{"title":"Efficient Moving Target Detection Using Resource-Constrained Neural Networks","authors":"Dimitris Milioris","doi":"10.1109/ICASSPW59220.2023.10193347","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193347","url":null,"abstract":"In recent years, the widespread use of autonomous vehicles, such as aerial and automotive, has enhanced our abilities to perform target tracking, dispensing our over-reliance on visual features. With the development of computer vision and deep learning techniques, vision-based classification and recognition have recently received special attention in the scientific community. Moreover, recent advances in the field of neural networks with quantized weights and activations down to single bit precision have allowed the development of models that can be deployed in resource-constrained settings, where a trade-off between task performance and efficiency is accepted. In this work we design an efficient single stage object detector based on CenterNet containing a combination of full precision and binary layers. Our model is easy to train and achieves comparable results with a full precision network trained from scratch while requiring an order of magnitude less FLOP. This opens the possibility of deploying an object detector in applications where time is of the essence and a graphical processing unit (GPU) is absent. 
We train our model and evaluate its performance against state-of-the-art techniques, obtaining more accurate results, and provide insight into the trade-offs involved in designing resource-constrained neural networks.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129663958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Light-Weight Visualvoice: Neural Network Quantization On Audio Visual Speech Separation
Yifei Wu, Chenda Li, Y. Qian
{"title":"Light-Weight Visualvoice: Neural Network Quantization On Audio Visual Speech Separation","authors":"Yifei Wu, Chenda Li, Y. Qian","doi":"10.1109/ICASSPW59220.2023.10193263","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193263","url":null,"abstract":"As multi-modal systems show superior performance on more tasks, the huge amount of computational resources they need becomes one of the critical problems to be solved. In this work, we explore neural network quantization methods to compress the resource requirement of VisualVoice, a state-of-the-art audio-visual speech separation system. The model is firstly fine-tuned by an ADMM-based quantization-aware training approach to produce the fixed-precision quantized version. Then three strategies, including manual selection, Hessian trace-based selection and KL divergence-based greedy search are explored to find the optimal mixed-precision setting of the model. The result shows that by applying the optimal strategy, we obtain a satisfying trade-off between space, speed and performance for the final system. The KL divergence-based strategy reaches 7.2 dB in SDR at 3-bit equivalent setup, which outperforms the fixed-precision setup and the other two mixed-precision strategies. 
Moreover, we also discuss the influence caused by quantizing different parts of the multi-modal system.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127508569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
List of ICASSP’23 Satellite Workshops:
{"title":"List of ICASSP’23 Satellite Workshops:","authors":"","doi":"10.1109/icasspw59220.2023.10192944","DOIUrl":"https://doi.org/10.1109/icasspw59220.2023.10192944","url":null,"abstract":"","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129050152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Secure Integrated Sensing and Communication Downlink Beamforming: A Semidefinite Relaxation Approach With Tightness Guaranteed
Wai-Yiu Keung, Hoi-To Wai, Wing-Kin Ma
{"title":"Secure Integrated Sensing and Communication Downlink Beamforming: A Semidefinite Relaxation Approach With Tightness Guaranteed","authors":"Wai-Yiu Keung, Hoi-To Wai, Wing-Kin Ma","doi":"10.1109/ICASSPW59220.2023.10193088","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193088","url":null,"abstract":"Integrated sensing and communication (ISAC) is considered as a key solution toward spectrum congestion in future generations of wireless system. On the other hand, physical-layer security has recently regained attentions as network-layer encryption is becoming more challenging in 5G and beyond. This paper studies a multiuser MIMO beamforming design for ISAC with physical-layer security. Specifically, we consider a power minimization problem with signal-to-interference-plus-noise ratio constraints and with a Cramér-Rao-based sensing performance constraint. The problem is non-convex, but can in principle be approximated by semidefinite relaxation (SDR) which is a convex optimization-based scheme. The main contribution of this paper lies in showing that, with a nearly harmless modification, the problem can be exactly solved by SDR. Prior works showed that the same ISAC problem without physical-layer security can be solved by SDR, but the proof method therein appears to be inapplicable to our secure ISAC problem. 
Numerical results are presented to illustrate the efficiency of our SDR for solving the secure ISAC problem.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129276297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Assisted Labeling Visualizer (ALVI): A Semi-Automatic Labeling System For Time-Series Data
Lee B. Hinkle, Tristan Pedro, Tyler Lynn, G. Atkinson, V. Metsis
{"title":"Assisted Labeling Visualizer (ALVI): A Semi-Automatic Labeling System For Time-Series Data","authors":"Lee B. Hinkle, Tristan Pedro, Tyler Lynn, G. Atkinson, V. Metsis","doi":"10.1109/ICASSPW59220.2023.10193169","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193169","url":null,"abstract":"Machine learning applications can significantly benefit from large amounts of labeled data, although the task of labeling data is notoriously challenging and time-consuming. This is particularly evident in domains involving human subjects, where labeling time-series signals often necessitates trained professionals. In this work, we introduce the Assisted Labeling Visualizer (ALVI), a system that simplifies the process of labeling data by offering an interactive user interface that visualizes synchronized video, feature-map representations, and raw time-series signals. ALVI also leverages deep learning and self-supervised learning techniques to facilitate the semi-automatic labeling of large amounts of unlabeled data. We demonstrate the capabilities of ALVI on a human activity recognition dataset to showcase its potential for enhancing the labeling process of time-series sensor data.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125555477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep Learning Based UAV Payload Recognition
L. Sommer, Raphael Spraul
{"title":"Deep Learning Based UAV Payload Recognition","authors":"L. Sommer, Raphael Spraul","doi":"10.1109/ICASSPW59220.2023.10193235","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193235","url":null,"abstract":"Due to the increased availability of unmanned aerial vehicles (UAVs), the demand for automated counter-UAV systems to protect facilities or areas from misused or threatening UAVs is growing. Fundamental for these systems are fast and accurate detection as well as identification of potential threats to initiate countermeasures. Criteria to classify the potential threat are UAV type and payload. Though thermal or electro optical (EO) imagery have been widely applied for the detection task, other sensor modalities, i.e. acoustic, radar and radio frequency, are predominately used for UAV type and payload classification. In this work, we examine the potential of UAV payload classification in EO imagery, which facilitates direct interpretability by human operators. For this, we compare conventional CNN-based architectures and recent architectures exploiting self-attention mechanisms such as Vision Transformers. 
The different architectures are trained and evaluated on a novel dataset composed of own recordings of UAVs with and without payload, imagery crawled from the Internet and imagery taken from publicly available UAV datasets.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"182 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123111933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal Estimation Of Change Points Of Physiological Arousal During Driving
Kleanthis Avramidis, Tiantian Feng, Digbalay Bose, Shrikanth S. Narayanan
{"title":"Multimodal Estimation Of Change Points Of Physiological Arousal During Driving","authors":"Kleanthis Avramidis, Tiantian Feng, Digbalay Bose, Shrikanth S. Narayanan","doi":"10.1109/ICASSPW59220.2023.10193718","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193718","url":null,"abstract":"Detecting unsafe driving states, such as stress, drowsiness, and fatigue, is an important component of ensuring driving safety and an essential prerequisite for automatic intervention systems in vehicles. These concerning conditions are primarily connected to the driver’s low or high arousal levels. In this study, we describe a framework for processing multimodal physiological time-series from wearable sensors during driving and locating points of prominent change in drivers’ physiological arousal. These points of change could potentially indicate events that require just-in-time intervention. We apply time-series segmentation on heart rate and breathing rate measurements and quantify their robustness in capturing change points in electrodermal activity, treated as a reference index for arousal, as well as on self-reported stress ratings, using three public datasets. 
Our experiments demonstrate that physiological measures are veritable indicators of change points of arousal. Code and results available at https://github.com/usc-sail/ggs driving","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"242 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121156101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
State-of-the-Art in Nudity Classification: A Comparative Analysis
F. C. Akyon, A. Temizel
{"title":"State-of-the-Art in Nudity Classification: A Comparative Analysis","authors":"F. C. Akyon, A. Temizel","doi":"10.1109/ICASSPW59220.2023.10193621","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193621","url":null,"abstract":"This paper presents a comparative analysis of existing nudity classification techniques for classifying images based on the presence of nudity, with a focus on their application in content moderation. The evaluation focuses on CNN-based models, vision transformer, and popular open-source safety checkers from Stable Diffusion and Large-scale Artificial Intelligence Open Network (LAION). The study identifies the limitations of current evaluation datasets and highlights the need for more diverse and challenging datasets. The paper discusses the potential implications of these findings for developing more accurate and effective image classification systems on online platforms. Overall, the study emphasizes the importance of continually improving image classification models to ensure the safety and well-being of platform users. The project page, including the demonstrations and results is publicly available at https://github.com/fcakyon/contentmoderation-deep-learning.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114261562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Gloss Alignment using Word Embeddings
Harry Walsh, Ozge Mercanoglu Sincan, Ben Saunders, R. Bowden
{"title":"Gloss Alignment using Word Embeddings","authors":"Harry Walsh, Ozge Mercanoglu Sincan, Ben Saunders, R. Bowden","doi":"10.1109/ICASSPW59220.2023.10193013","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193013","url":null,"abstract":"Capturing and annotating Sign language datasets is a time consuming and costly process. Current datasets are orders of magnitude too small to successfully train unconstrained Sign Language Translation (SLT) models. As a result, research has turned to TV broadcast content as a source of large-scale training data, consisting of both the sign language interpreter and the associated audio subtitle. However, lack of sign language annotation limits the usability of this data and has led to the development of automatic annotation techniques such as sign spotting. These spottings are aligned to the video rather than the subtitle, which often results in a misalignment between the subtitle and spotted signs. In this paper we propose a method for aligning spottings with their corresponding subtitles using large spoken language models. Using a single modality means our method is computationally inexpensive and can be utilized in conjunction with existing alignment techniques. 
We quantitatively demonstrate the effectiveness of our method on the Meine DGS-Annotated (MeineDGS) and BBC-Oxford British Sign Language (BOBSL) datasets, recovering up to a 33.22 BLEU-1 score in word alignment.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122526889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Evaluation Of A Marine Mesoscale Events Classifier
M. Reggiannini, O. Papini, G. Pieri
{"title":"Evaluation Of A Marine Mesoscale Events Classifier","authors":"M. Reggiannini, O. Papini, G. Pieri","doi":"10.1109/ICASSPW59220.2023.10193234","DOIUrl":"https://doi.org/10.1109/ICASSPW59220.2023.10193234","url":null,"abstract":"Marine mesoscale phenomena are relevant oceanographic processes that impact on fishery, biodiversity and climate variation. In previous literature, their analysis has been tackled by processing instantaneous remote sensing observations and returning a classification of the observed event. Indeed, these phenomena occur within an extended time range, thus an analysis including time dependence is desirable. Mesoscale Events Classifier (MEC) is an algorithm devoted to the classification of marine mesoscale events in sea surface temperature imagery. By processing time series of satellite temperature observations MEC recognizes the considered area of interest as the domain of one out of a given number of possible events and returns the corresponding label. Objective of this work is to discuss the performance of the MEC pipeline in terms of its capability of correctly capturing the nature of the observed mesoscale process. The evaluation process exploited satellite remote sensing data collected in front of the Portuguese coast.","PeriodicalId":158726,"journal":{"name":"2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132547593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0