{"title":"Phase Unwrapping in Correlated Noise for FMCW Lidar Depth Estimation","authors":"A. Ulvog, Joshua Rapp, T. Koike-Akino, Hassan Mansour, P. Boufounos, K. Parsons","doi":"10.1109/ICASSP49357.2023.10095456","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095456","url":null,"abstract":"In frequency-modulated continuous-wave (FMCW) lidar, the distance to an illuminated target is proportional to the beat frequency of the interference signal. Laser phase noise often limits the range accuracy of FMCW lidar, and existing frequency estimation methods make overly simplistic assumptions about the noise model. In this work, we propose an algorithm that performs frequency estimation via phase unwrapping by explicitly accounting for correlations in the phase noise. Given a candidate frequency, we approximately recover the maximum likelihood unwrapping sequence using the Viterbi algorithm and the phase noise statistics. The algorithm then alternates between unwrapping and frequency estimate refinement until convergence. Compared to state-of-the-art alternatives, our algorithm consistently achieves superior performance at long range or with large-linewidth lasers when the signal-to-noise ratio is sufficiently high.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128118269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diffusion-Based Sound Source Localization Using Networks of Planar Microphone Arrays","authors":"Davide Albertini, Gioele Greco, A. Bernardini, A. Sarti","doi":"10.1109/ICASSP49357.2023.10095405","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095405","url":null,"abstract":"In this work, we propose a novel approach for distributed 3D sound source localization and tracking based on networks of planar microphone arrays, each of which estimates a 2D Direction Of Arrival (DOA). The proposed method is computationally distributed and eliminates the need for a specialized node to collect and process all information. Sound source localization is achieved by considering the task as a distributed optimization problem approached using the Adapt Then Combine (ATC) diffusion technique. This approach also allows the development of cooperation strategies between sensor nodes (i.e., microphone arrays). We propose the use of a cooperation strategy that improves the localization accuracy by exploiting the estimated error statistics of each sensor node and penalizing the noisy arrays. We then evaluate the proposed approach in terms of localization accuracy and robustness to noisy sensor measurements.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125603892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
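The Adapt Then Combine (ATC) diffusion step named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the per-node cost (squared distance from the estimate to the line defined by a node's position and its DOA bearing), the uniform combination weights, and the names `unit`, `off_line`, and `atc_localize` are all assumptions made for illustration, and the error-statistics-based cooperation strategy is not modelled.

```python
import math

def unit(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def off_line(x, p, u):
    """Part of (x - p) orthogonal to the unit bearing u, i.e. (I - u u^T)(x - p)."""
    d = [xi - pi for xi, pi in zip(x, p)]
    along = sum(di * ui for di, ui in zip(d, u))
    return [di - along * ui for di, ui in zip(d, u)]

def atc_localize(positions, bearings, neighbors, mu=0.2, iters=500):
    """Hypothetical ATC diffusion loop; returns one source estimate per node.

    positions: node positions, bearings: unit DOA vectors,
    neighbors[k]: indices node k averages over (including k itself).
    """
    n = len(positions)
    w = [[0.0, 0.0, 0.0] for _ in range(n)]
    for _ in range(iters):
        # Adapt: each node takes a gradient step on its own local cost
        # J_k(x) = ||(I - u_k u_k^T)(x - p_k)||^2, whose gradient is 2 * off_line.
        psi = []
        for k in range(n):
            r = off_line(w[k], positions[k], bearings[k])
            psi.append([wi - mu * 2.0 * ri for wi, ri in zip(w[k], r)])
        # Combine: each node averages the intermediate estimates of its
        # neighbourhood (uniform weights are an assumption of this sketch).
        w = [[sum(psi[l][i] for l in neighbors[k]) / len(neighbors[k])
              for i in range(3)] for k in range(n)]
    return w
```

With noise-free bearings that all point at the same source, every node's estimate approaches that source without any central node collecting the data, which is the property the abstract emphasizes.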
{"title":"Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages","authors":"Simon Durand, D. Stoller, Sebastian Ewert","doi":"10.1109/ICASSP49357.2023.10096725","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096725","url":null,"abstract":"Lyrics alignment gained considerable attention in recent years. State-of-the-art systems either re-use established speech recognition toolkits, or design end-to-end solutions involving a Connectionist Temporal Classification (CTC) loss. However, both approaches suffer from specific weaknesses: toolkits are known for their complexity, and CTC systems use a loss designed for transcription which can limit alignment accuracy. In this paper, we use instead a contrastive learning procedure that derives cross-modal embeddings linking the audio and text domains. This way, we obtain a novel system that is simple to train end-to-end, can make use of weakly annotated training data, jointly learns a powerful text model, and is tailored to alignment. The system is not only the first to yield an average absolute error below 0.2 seconds on the standard Jamendo dataset but it is also robust to other languages, even when trained on English data only. Finally, we release word-level alignments for the JamendoLyrics Multi-Lang dataset.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127901792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
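As a rough illustration of the contrastive idea in the abstract above (cross-modal embeddings linking audio and text), the following sketch computes an InfoNCE-style loss over paired embeddings. The abstract does not state the exact loss, so the InfoNCE form, the cosine similarity, the temperature value, and the function names are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce(audio, text, temperature=0.1):
    """Mean InfoNCE loss; audio[i] and text[i] are assumed to be a matching pair.

    Every other text[j] in the batch serves as a negative for audio[i].
    """
    n = len(audio)
    total = 0.0
    for i in range(n):
        logits = [cosine(audio[i], text[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n
```

The loss is small when each audio embedding is closest to its own text embedding and large when the pairing is scrambled, which is what pushes matching audio and text together during training.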
{"title":"High-Speed Drone Detection Based On Yolo-V8","authors":"Jun-Hwa Kim, Namho Kim, C. Won","doi":"10.1109/ICASSP49357.2023.10095516","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095516","url":null,"abstract":"Detecting drones in a video is a challenging problem due to their dynamic movements and varying range of scales. Moreover, since drone detection is often required for security, it should be as fast as possible. In this paper, we modify the state-of-the-art YOLO-V8 to achieve fast and reliable drone detection. Specifically, we add Multi-Scale Image Fusion and P2 Layer to the medium-size model (M-model) of YOLO-V8. Our model was evaluated in the 6th WOSDETC challenge.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127977906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Adaptive Reasoning on Sub-Questions for Multi-Hop Question Answering","authors":"Zekai Li, Wei Peng","doi":"10.1109/ICASSP49357.2023.10097206","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10097206","url":null,"abstract":"In this paper, we present the Self-Adapting Reasoning Model (SAR) for solving multi-hop question answering (MHQA) tasks, where the QA system is supposed to find the correct answer within the given multiple documents and a multi-hop question. One feasible track on MHQA is question decomposition, based on the idea that a multi-hop question is usually made from several single-hop questions, which are much easier to answer. However, ignoring the inner connection between sub-questions, existing works usually train additional single-hop question-answering models and answer sub-questions separately. To tackle this problem, we design an end-to-end self-adaptive multi-hop reasoning model. Specifically, given a multi-hop question, a question decomposer first decomposes it into two simple questions and identifies the question type. Then, based on the question type, different reasoning strategies are applied for reasoning. This enables our model to be self-adapting and more explainable regarding different types of questions. Experiments are carried out to demonstrate the effectiveness of our model, and SAR achieves remarkable results on the HotpotQA dataset.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"298 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121454225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instance-Aware Hierarchical Structured Policy for Prompt Learning in Vision-Language Models","authors":"Xun Wu, Guolong Wang, Zhaoyuan Liu, Xuan Dang, Zheng Qin","doi":"10.1109/ICASSP49357.2023.10095231","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095231","url":null,"abstract":"In recent years, learnable prompts have emerged as a major prompt learning paradigm, enhancing the performance of large-scale vision-language pre-trained models in few-shot image classification. However, enhancing methods are often time-consuming and inflexible because 1) class-specific prompts are inefficient in certain situations; 2) instance-specific prompts are put in a fixed position. To address these issues, inspired by the coarse-to-fine decision-making paradigm of human, we propose an Instance-Aware Hierarchical-Structured Policy (IAHSP) that integrates instance-specific prompt selection and appropriate position selection using a reinforcement learning fashion. Specifically, IAHSP consists of two sub-policies: 1) the root policy selects the most suitable prompt from the prompts pool, and 2) the leaf policy identifies the optimal position for inserting the selected prompt. We train these two policies iteratively with rewards constraining the prompts while maintaining their diversity. Extensive experiments on 11 public benchmarks demonstrate that our IAHSP significantly boosts the few-shot image classification performance of vision-language pre-trained models, while also exhibiting superior generalization performance.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121718726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive DFE Using Light-Pattern-Protection Algorithm in 12 NM CMOS Technology","authors":"Shi Xing, Changlong Lin, Yuchen Li, Huandong Wang","doi":"10.1109/ICASSP49357.2023.10097007","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10097007","url":null,"abstract":"The sign-sign least-mean-squares (SSLMS) algorithm has been widely used in decision feedback equalizer (DFE) adaptation. However, the convergence direction of DFE tap coefficients in the training process is closely related to the data flow. In the case of extreme data flow, the coefficients may converge to inaccurate values, resulting in DFE sampling errors. This article proposes a novel light-pattern-protection (LPP) algorithm to achieve robustness. The LPP guarantees the convergence direction in extreme data flow and brings no loss of convergence rate in a balanced situation. Another advantage of LPP is good scalability, which can be demonstrated in two points. One point is that the convergence time does not increase as the number of DFE taps. The other is that extending the algorithm to the traditional SSLMS scheme requires insignificant hardware and power consumption.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121751321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
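The baseline the abstract above starts from, sign-sign LMS adaptation of DFE taps, can be sketched as follows. This is plain SSLMS only, not the proposed LPP algorithm; the two-tap channel, the step size, and the names `sign` and `sslms_dfe` are assumptions chosen for illustration.

```python
import random

def sign(v):
    """Return -1, 0, or +1."""
    return (v > 0) - (v < 0)

def sslms_dfe(symbols, post_cursors, mu=0.002):
    """Adapt DFE taps with sign-sign LMS over a stream of +/-1 symbols.

    post_cursors: hypothetical channel post-cursor ISI coefficients
    (the main cursor is assumed to be 1). Returns the final tap values.
    """
    n_taps = len(post_cursors)
    taps = [0.0] * n_taps
    past_tx = [0] * n_taps   # previously transmitted symbols, newest first
    past_dec = [0] * n_taps  # previous slicer decisions, newest first
    for x in symbols:
        # Received sample: current symbol plus ISI from earlier symbols.
        y = x + sum(h * s for h, s in zip(post_cursors, past_tx))
        # Feedback equalization followed by the slicer decision.
        z = y - sum(c * d for c, d in zip(taps, past_dec))
        d = 1 if z >= 0 else -1
        e = z - d  # residual error against the decided level
        # Sign-sign update: only the signs of the error and the data are used.
        for i in range(n_taps):
            taps[i] += mu * sign(e) * sign(past_dec[i])
        past_tx = [x] + past_tx[:-1]
        past_dec = [d] + past_dec[:-1]
    return taps
```

On balanced random data the taps settle near the post-cursor values, whereas a strictly alternating pattern drives the residual error to zero with taps that do not match the channel. That is one concrete instance of the data-dependent convergence the abstract describes.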
{"title":"Augmentation Robust Self-Supervised Learning for Human Activity Recognition","authors":"Cong Xu, Yuhang Li, Dae Lee, Dae Hoon Park, Hongda Mao, Huyen Do, Jonathan Chung, Dinesh Nair","doi":"10.1109/ICASSP49357.2023.10096151","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096151","url":null,"abstract":"Human Activity Recognition (HAR) is widely applied on wearable devices in our daily lives. However, acquiring high-quality wearable sensor data set with ground-truths is challenging due to the high cost in collecting data and necessity of domain experts. In order to achieve generalization from limited data, we study augmentation-based Self-Supervised Learning (SSL) for data from wearable devices. However, there is an issue in one of the most popular SSL approaches, contrastive learning: it is sensitive to the choice of data augmentations. To resolve this, we first propose to combine contrastive learning with generative learning, which is robust to augmentations. Second, we propose an automatic augmentation policy search method to discover the most promising augmentation policy. We empirically verify our approaches on three public HAR datasets. Experimental results show that our proposed SSL approach is robust to augmentations, and delivers higher accuracy than contrastive learning. Additionally, with the searched augmentation policy we are able to further improve the accuracy of HAR task.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121753144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brain Network Features Differentiate Intentions from Different Emotional Expressions of the Same Text","authors":"Zhongjie Li, Bin Zhao, Gaoyan Zhang, J. Dang","doi":"10.1109/ICASSP49357.2023.10095376","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095376","url":null,"abstract":"Intent differentiation in speech communication relies not only on linguistic information but also on paralinguistic information. The same textual content, when pronounced with different prosodies and emotions, may express totally different intentions. The true intentions in this condition can be easily grasped by our brain. Therefore, combining text, speech, and electroencephalography (EEG) for intent discrimination on the same text may be an effective approach. Before fusing speech and text modalities, the current study focused on exploring effective EEG-based features for Chinese intent recognition as no previous research has utilized EEG signals for this purpose. To tackle this issue, we first created a Chinese multimodal spoken language intention understanding (CMSLIU) dataset, in which the same texts were pronounced with varying prosodies to express different intents. To identify effective brain features that were most relevant to intent recognition improvement, we compared the event-related spectral perturbation and effective brain connectivity patterns on two intent conditions (praise vs. irony). It was found that the praise expression tended to elicit stronger high-frequency brain activities while the irony expression involved a more suppressive network connection in the right hemisphere. These features were trained on the CMSLIU dataset and achieved an intention classification accuracy of 78.66%, which indicated a great potential of the EEG features in intent discrimination on the same text.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131386922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Robust Principle Component Analysis Using Gauss-Newton Iterations","authors":"William Chettleburgh, Zhishen Huang, Ming-Hsuan Yang","doi":"10.1109/ICASSP49357.2023.10096269","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096269","url":null,"abstract":"Robust Principal Component Analysis (RPCA) is an optimization problem that decomposes a data matrix into a low-rank and a sparse matrix. However, solving this problem using alternating procedures requires sequentially computing singular value decompositions (SVDs) of large matrices, which is computationally expensive. In this work, we propose a computation protocol that leverages Gauss-Newton iterations to speed up the sequential computation of SVDs and accelerate the entire RPCA process. Our method is validated on synthetic and video data, benchmarked against established RPCA algorithms, and analyzed for stability with respect to hyperparameters. Our proposed protocol can also be applied to problems that require repeated computation of the proximal of functions that solely depend on singular values of the input matrix.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131395333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}