{"title":"A Note on Totally Symmetric Equi-Isoclinic Tight Fusion Frames","authors":"M. Fickus, Joseph W. Iverson, J. Jasper, D. Mixon","doi":"10.1109/icassp43922.2022.9746835","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746835","url":null,"abstract":"Consider the fundamental problem of arranging r-dimensional subspaces of Rd in such a way that maximizes the minimum distance between unit vectors in different subspaces. It is well known that equi-isoclinic tight fusion frames (EITFFs) are optimal for this packing problem, but such ensembles are notoriously hard to construct. In this paper, we present a novel construction of EITFFs that are totally symmetric: any permutation of the subspaces can be realized by an orthogonal transformation of ℝd.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125718361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognition Of Silently Spoken Word From Eeg Signals Using Dense Attention Network (DAN)","authors":"Sahil Datta, A. Aondoakaa, Jorunn Jo Holmberg, E. Antonova","doi":"10.1109/icassp43922.2022.9746241","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746241","url":null,"abstract":"In this paper, we propose a method for recognizing silently spoken words from electroencephalogram (EEG) signals using a Dense Attention Network (DAN). The proposed network learns features from the EEG data by applying the self-attention mechanism on temporal, spectral, and spatial (electrodes) dimensions. We examined the effectiveness of the proposed network in extracting spatio-spectro-temporal in-formation from EEG signals and provide a network for recognition of silently spoken words. The DAN achieved a recognition rate of 80.7% in leave-trials-out (LTO) and 75.1% in leave-subject-out (LSO) cross validation methods. In a direct comparison with other methods, the DAN outperformed other existing techniques in recognition of silently spoken words.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127918229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Frame Interpolation via Local Lightweight Bidirectional Encoding with Channel Attention Cascade","authors":"Xiangling Ding, Pu Huang, Dengyong Zhang, Xianfeng Zhao","doi":"10.1109/icassp43922.2022.9747182","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747182","url":null,"abstract":"Deep Neural Networks based video frame interpolation, synthesizing in-between frames given two consecutive neighboring frames, typically depends on heavy model architectures, preventing them from being deployed on small terminals. When directly adopting the lightweight network architecture from these models, the synthesized frames may suffer from poor visual appearance. In this paper, a lightweight-driven video frame interpolation network (L2BEC2) is proposed. Concretely, we first improve the visual appearance by introducing the bidirectional encoding structure with channel attention cascade to better characterize the motion information; then we further adopt the local network lightweight idea into the aforementioned structure to significantly eliminate its redundant parts of the model parameters. As a result, our L2BEC2 performs favorably at the cost of only one third of the parameters compared with the state-of-the-art methods on public datasets. Our source code is available at https://github.com/Pumpkin123709/LBEC.git.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121439858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SDETR: Attention-Guided Salient Object Detection with Transformer","authors":"Guanze Liu, Bo Xu, Han Huang, Cheng Lu, Yandong Guo","doi":"10.1109/icassp43922.2022.9746367","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746367","url":null,"abstract":"Most existing CNN-based salient object detection methods can identify fine-grained segmentation details like hair and animal fur, but often mispredict the salient object due to lack of global contextual information caused by locality convolution layers. The limited training data of the current SOD task adds additional difficulty to capture the saliency information. In this paper, we propose a two-stage predict-refine SDETR model to leverage both benefits of transformer and CNN layers that can produce results with accurate saliency prediction and fine-grained local details. We also propose a novel pre-train dataset annotation COCO SOD to erase the overfitting problem caused by insufficient training data. Comprehensive experiments on five benchmark datasets demonstrate that the SDETR outperforms state-of-the-art approaches on four evaluation metrics, and our COCO SOD can largely improve the model performance on DUTS, ECSSD, DUT, PASCAL-S datasets.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115916725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sensors to Sign Language: A Natural Approach to Equitable Communication","authors":"T. Fouts, Ali Hindy, C. Tanner","doi":"10.1109/icassp43922.2022.9747385","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747385","url":null,"abstract":"Sign Language Recognition (SLR) aims to improve the equity of communication with the hearing impaired. However, SLR typically relies on having recorded videos of the signer. We develop a more natural solution by fitting a signer with arm sensors and classifying the sensor signals directly into language. We refer to this task as Sensors-to-Sign-Language (STSL). While existing STSL systems demonstrate effectiveness with small vocabularies of fewer than 100 words, we aim to determine if STSL can scale to larger, more realistic lexicons. For this purpose, we introduce a new dataset, SignBank, which consists of exactly 6,000 signs, spans 558 distinct words from 15 different novice signers, and constitutes the largest such dataset. By using a simple but effective model for STSL, we demonstrate a strong baseline performance on SignBank. Notably, despite our model having trained on only four signings of each word, it is able to correctly classify new signings with 95.1% accuracy (out of 558 candidate words). 
This work enables and motivates further development of lightweight, wearable hardware and real-time modelling for SLR.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113961361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ORCA-PARTY: An Automatic Killer Whale Sound Type Separation Toolkit Using Deep Learning","authors":"Christian Bergler, M. Schmitt, A. Maier, R. Cheng, Volker Barth, E. Nöth","doi":"10.1109/icassp43922.2022.9746623","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746623","url":null,"abstract":"Data-driven and machine-based analysis of massive bioacoustic data collections, in particular acoustic regions containing a substantial number of vocalizations events, is essential and extremely valuable to identify recurring vocal paradigms. However, these acoustic sections are usually characterized by a strong incidence of overlapping vocalization events, a major problem severely affecting subsequent human-/machine-based analysis and interpretation. Robust machine-driven signal separation of species-specific call types is extremely challenging due to missing ground truth data, speaker/source-relevant information, limited knowledge about inter- and intra-call type variations, next to diverse recording conditions. The current study is the first introducing a fully-automated deep signal separation approach for overlapping orca vocalizations, addressing all of the previously mentioned challenges, together with one of the largest bioacoustic data archives recorded on killer whales (Orcinus Orca). Incorporating ORCA-PARTY as additional data enhancement step for downstream call type classification demonstrated to be extremely valuable. Besides the proof of cross-domain applicability and consistently promising results on non-overlapping signals, significant improvements were achieved when processing acoustic orca segments comprising a multitude of vocal activities. 
Apart from auspicious visual inspections, a final numerical evaluation on an unseen dataset proved that about 30 % more known sound patterns could be identified.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131392259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disentangled Feature-Guided Multi-Exposure High Dynamic Range Imaging","authors":"Keun-Ohk Lee, Y. Jang, N. Cho","doi":"10.1109/icassp43922.2022.9747329","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747329","url":null,"abstract":"Multi-exposure high dynamic range (HDR) imaging aims to generate an HDR image from multiple differently exposed low dynamic range (LDR) images. It is a challenging task due to two major problems: (1) there are usually misalignments among the input LDR images, and (2) LDR images often have incomplete information due to under-/over-exposure. In this paper, we propose a disentangled feature-guided HDR network (DFGNet) to alleviate the above-stated problems. Specifically, we first extract and disentangle exposure features and spatial features of input LDR images. Then, we process these features through the proposed DFG modules, which produce a high-quality HDR image. Experiments show that the proposed DFGNet achieves outstanding performance on a benchmark dataset. Our code and more results are available at https://github.com/KeuntekLee/DFGNet.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132216010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harmonic and Percussive Sound Separation Based on Mixed Partial Derivative of Phase Spectrogram","authors":"Natsuki Akaishi, K. Yatabe, Yasuhiro Oikawa","doi":"10.1109/icassp43922.2022.9747057","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9747057","url":null,"abstract":"Harmonic and percussive sound separation (HPSS) is a widely applied pre-processing tool that extracts distinct (harmonic and percussive) components of a signal. In the previous methods, HPSS has been performed based on the structural properties of magnitude (or power) spectrograms. However, such approach does not take advantage of phase that contains useful information of the waveform. In this paper, we propose a novel HPSS method named MipDroP that relies only on phase and does not use information of magnitude spectrograms. The proposed MipDroP algorithm effectively examines phase through its mixed partial derivative and constructs a pair of masks for the separation. Our experiments showed that MipDroP can extract percussive components better than the other methods.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130080514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression Assisted Matrix Completion for Reconstructing a Propagation Field with Application to Source Localization","authors":"Hao Sun, Junting Chen","doi":"10.1109/icassp43922.2022.9746415","DOIUrl":"https://doi.org/10.1109/icassp43922.2022.9746415","url":null,"abstract":"This paper develops a regression assisted matrix completion method to reconstruct the propagation field for received signal strength (RSS) based source localization without prior knowledge of the propagation model. Existing matrix completion methods did not exploit the fact that the uncertainty of each observed entry is different due to the reality that the sensor density may vary across different locations. This paper proposes to employ local polynomial regression to increase the accuracy of matrix completion. First, the values of selected entries of a matrix are estimated via interpolation from local measurements, and the interpolation error is analyzed. Then, a matrix completion problem that is aware of the different uncertainty of observed entries is formulated and solved. It is demonstrated that the proposed method significantly improves the performance of matrix completion, and as a result, increases the localization accuracy from the numerical results.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133831969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Secure Are The Adversarial Examples Themselves?","authors":"Hui Zeng, Kang Deng, Biwei Chen, Anjie Peng","doi":"10.1109/ICASSP43922.2022.9747206","DOIUrl":"https://doi.org/10.1109/ICASSP43922.2022.9747206","url":null,"abstract":"Existing adversarial example generation algorithms mainly consider the success rate of spoofing target model, but pay little attention to its own security. In this paper, we propose the concept of adversarial example security as how unlikely themselves can be detected. A two-step test is proposed to deal with the adversarial attacks of different strengths. Game theory is introduced to model the interplay between the attacker and the investigator. By solving Nash equilibrium, the optimal strategies of both parties are obtained, and the security of the attacks is evaluated. Five typical attacks are compared on the ImageNet. The results show that a rational attacker tends to use a relatively weak strength. By comparing the ROC curves under Nash equilibrium, it is observed that the constrained perturbation attacks are more secure than the optimized perturbation attacks in face of the two-step test. The proposed framework can be used to evaluate the security of various potential attacks and further the research of adversarial example generation/detection.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134428895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}