ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

FINT: Field-Aware Interaction Neural Network for Click-Through Rate Prediction
Zhishan Zhao, Sen Yang, Guohui Liu, Dawei Feng, Kele Xu
Abstract: As a critical component for online advertising and marketing, click-through rate (CTR) prediction has drawn lots of attention from both industry and academia. Recently, deep learning has become the mainstream methodological choice for CTR. Despite sustainable efforts have been made, existing approaches still pose several challenges. On the one hand, high-order interaction between the features is under-explored. On the other hand, high-order interactions may neglect the semantic information from the low-order fields. In this paper, we proposed a novel prediction method, named FINT, that employs the Field-aware INTeraction layer which explicitly captures high-order feature interactions while retaining the low-order field information. To empirically investigate the effectiveness and robustness of the FINT, we perform extensive experiments on the three realistic databases: KDD2012, Criteo and Avazu. The obtained results demonstrate that the FINT can significantly improve the performance compared to the existing methods, without increasing the amount of computation required. Moreover, the proposed method brought about 2.72% increase to the advertising revenue of iQIYI, a big online video app through A/B testing. To better promote the research in CTR field, we released our code as well as reference implementation at: https://github.com/zhishan01/FINT.
DOI: https://doi.org/10.1109/ICASSP43922.2022.9747247
Published: 2022-05-23
Citations: 2
Unlimited Sampling with Local Averages
Dorian Florescu, A. Bhandari
Abstract: Signal saturation or clipping is a fundamental bottleneck that limits the capability of analog-to-digital converters (ADCs). The problem arises when the input signal dynamic range is larger than ADC’s dynamic range. To overcome this issue, an alternative acquisition protocol called the Unlimited Sensing Framework (USF) was recently proposed. This non-linear sensing scheme incorporates signal folding (via modulo non-linearity) before sampling. Reconstruction then entails "unfolding" of the high dynamic range input. Taking an end-to-end approach to the USF, a hardware validation called US-ADC was recently presented. US-ADC experiments show that, in some scenarios, the samples can be more accurately modelled as local averages than ideal, pointwise measurements. In particular, this happens when the input signal frequency is much larger than the operational bandwidth of the US-ADC. Pushing such hardware limits using computational approaches motivates the study of modulo sampling and reconstruction via local averages. By incorporating a modulo-hysteresis model, both in theory and in hardware, we present a guaranteed recovery algorithm for input reconstruction. We also explore a practical method suited for low sampling rates. Our approach is validated via simulations and experiments on hardware, thus enabling a step closer to practice.
DOI: https://doi.org/10.1109/ICASSP43922.2022.9747127
Published: 2022-05-23
Citations: 6
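The "Unlimited Sampling with Local Averages" abstract above describes folding a signal with a modulo non-linearity before sampling and then "unfolding" it. The sketch below is only an illustration of that general idea under simplifying assumptions, not the recovery algorithm from the paper: it uses ideal pointwise samples (no local averages, no hysteresis), assumes consecutive samples differ by less than the threshold, and assumes the first sample is not folded. The names `fold`, `unfold_first_order`, and `lam` are hypothetical.

```python
import math


def fold(x, lam):
    """Centered modulo: map x into the interval [-lam, lam)."""
    return (x + lam) % (2 * lam) - lam


def unfold_first_order(y, lam):
    """Rebuild the input from folded samples y.

    Assumes |x[n+1] - x[n]| < lam (dense sampling) and |x[0]| < lam,
    so folding each first difference of y recovers the true difference.
    """
    x_hat = [y[0]]
    for prev, cur in zip(y, y[1:]):
        x_hat.append(x_hat[-1] + fold(cur - prev, lam))
    return x_hat


lam = 1.0   # hypothetical ADC threshold
n = 2000    # number of samples over one second
# Test input with amplitude 3, i.e. three times the threshold.
x = [3.0 * math.sin(2 * math.pi * 2 * k / (n - 1)) for k in range(n)]
y = [fold(v, lam) for v in x]          # what a modulo ADC would record
x_hat = unfold_first_order(y, lam)     # reconstruction from folded data
max_err = max(abs(a - b) for a, b in zip(x, x_hat))
```

The folded samples `y` stay within the threshold even though `x` exceeds it, and summing the folded first differences restores `x` up to rounding error. When the sampling is too sparse for the difference assumption to hold, this simple scheme fails, which is the regime where more careful recovery methods are needed.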
An Efficient Framework for Detection and Recognition of Numerical Traffic Signs
Zhishan Li, Mingmu Chen, Yifan He, Lei Xie, H. Su
Abstract: Due to the variety of categories and uneven distribution of available samples, automatic traffic sign detection and recognition is still a challenging task. For those categories with less training data, existing deep learning methods cannot achieve desirable performance, and the overall detection effect is not satisfactory as well. In this letter, we fully explore the relationship between different traffic signs with digital characters and transform the category objects into multi-level classes to alleviate the uneven distribution of samples. We design a lightweight two-stage object detection framework with high real-time performance. The first stage network is proposed to obtain the category groups of traffic signs, and then we construct another object detection network to identify the digital characters of the detected traffic signs. To make the prediction in the first stage more accurate, we put forward a boxes fusion algorithm in the post-processing process and a refine module to improve the recognition performance. Experimental results show that our approach possesses significantly improved performance compared with the latest object detection networks and other traffic sign detectors. Even some traffic signs that only exist in testset can also be recognized accurately by our method.
DOI: https://doi.org/10.1109/ICASSP43922.2022.9747406
Published: 2022-05-23
Citations: 2
DeepGBASS: Deep Guided Boundary-Aware Semantic Segmentation
Qingfeng Liu, Hai Su, Mostafa El-Khamy, Kee-Bong Song
Abstract: Image semantic segmentation is ubiquitously used in scene understanding applications, such as AI Camera, which require high accuracy and efficiency. Deep learning has significantly advanced the state-of-the-art in semantic segmentation. However, many of recent semantic segmentation works only consider class accuracy and ignore the accuracies at the boundaries between semantic classes. To improve the semantic boundary accuracy, we propose low complexity Deep Guided Decoder (DGD) networks, trained with a novel Semantic Boundary-Aware Learning (SBAL) strategy. Our ablation studies on Cityscapes and the ADE20K-32 confirm the effectiveness of our approach with network of different complexities. We show that our DeepGBASS approach significantly improves the mIoU by up to 11% relative gain and the mean boundary F1-score (mBF) by up to 39.4% when training MobileNetEdgeTPU DeepLab on ADE20K-32 dataset.
DOI: https://doi.org/10.1109/ICASSP43922.2022.9747892
Published: 2022-05-23
Citations: 0
Hierarchical Deep Learning Model with Inertial and Physiological Sensors Fusion for Wearable-Based Human Activity Recognition
Dae Yon Hwang, Pai Chet Ng, Yuanhao Yu, Yang Wang, P. Spachos, D. Hatzinakos, K. Plataniotis
Abstract: This paper presents a human activity recognition (HAR) system with wearable devices. While various approaches have been suggested for HAR, most of them focus on either 1) the inertial sensors to capture the physical movement or 2) subject-dependent evaluations that are less practical to real world cases. To this end, our work integrates sensing inputs from physiological sensors to compensate the limitation of inertial sensors in capturing the human activities with less physical movements. Physiological sensors can capture physiological responses reflecting human behaviors in executing daily activities. To simulate a realistic application, three different evaluation scenarios are considered, namely All-access, Cross-subject and Cross-activity. Lastly, we propose a Hierarchical Deep Learning (HDL) model, which improves the accuracy and stability of HAR, compared to conventional models. Our proposed HDL with fusion of inertial and physiological sensing inputs achieves 97.16%, 92.23%, 90.18% average accuracy in All-access, Cross-subject, Cross-activity scenarios, which confirms the effectiveness of our approach.
DOI: https://doi.org/10.1109/icassp43922.2022.9747471
Published: 2022-05-23
Citations: 3
Improved Beamforming Encoding for Joint Radar and Communication
Tuomas Aittomäki, V. Koivunen
Abstract: Integrated Sensing and Communication Systems (ISAC) that are capable of functioning both as radars and communication systems have a tremendous potential to provide significant performance gains and cost savings and facilitate the sharing of the same energy, spectral, and hardware resources. Managing interference in frequency and spatial domains is a crucial task in ISAC. We consider a radar-centric scenario where a radar system is able to do beamforming and transmit communications data on the side. We propose an improved method allowing good control of the transmit beampattern power resulting in lower communication error level. Furthermore, we also propose straightforward method for phase coding of information in radar signals.
DOI: https://doi.org/10.1109/icassp43922.2022.9747241
Published: 2022-05-23
Citations: 0
Low-Latency Human-Computer Auditory Interface Based on Real-Time Vision Analysis
Florian Scalvini, Camille Bordeau, Maxime Ambard, C. Migniot, Julien Dubois
Abstract: This paper proposes a visuo-auditory substitution method to assist visually impaired people in scene understanding. Our approach focuses on person localisation in the user’s vicinity in order to ease urban walking. Since a real-time and low-latency is required in this context for user’s security, we propose an embedded system. The processing is based on a lightweight convolutional neural network to perform an efficient 2D person localisation. This measurement is enhanced with the corresponding person depth information, and is then transcribed into a stereophonic signal via a head-related transfer function. A GPU-based implementation is presented that enables a real-time processing to be reached at 23 frames/s on a 640x480 video stream. We show with an experiment that this method allows for a real-time accurate audio-based localization.
DOI: https://doi.org/10.1109/icassp43922.2022.9747094
Published: 2022-05-23
Citations: 2
Dual Attention Pooling Network for Recording Device Classification Using Neutral and Whispered Speech
Abinay Reddy Naini, B. Singhal, P. Ghosh
Abstract: In this work, we proposed a method for recording device classification using the recorded speech signal. With the rapid increase in different mobile and professional recording devices, determining the source device has many applications in forensics and in further improving various speech-based applications. This paper proposes dual and single attention pooling-based convolutional neural networks (CNN) for recording device classification using neutral and whispered speech. Experiments using five recording devices with simultaneous direct recordings from 88 speakers speaking both in neutral and whisper and recordings from 21 mobile devices with simultaneous playback recordings reveal that the proposed dual attention pooling based CNN method performs better than the best baseline scheme. We show that we achieve a better performance in recording device classification with whispered speech recordings than corresponding neutral speech. We also demonstrate the importance of voiced/unvoiced speech and different frequency bands in classifying the recording devices.
DOI: https://doi.org/10.1109/icassp43922.2022.9747700
Published: 2022-05-23
Citations: 1
Neural Network-Based Compression Framework for DOA Estimation Exploiting Distributed Array
S. Pavel, Yimin D. Zhang
Abstract: Distributed array consisting of multiple subarrays is attractive for high-resolution direction-of-arrival (DOA) estimation when a large-scale array is infeasible. To achieve effective distributed DOA estimation, it is required to transmit information observed at the subarrays to the fusion center, where DOA estimation is performed. For noncoherent data fusion, the covariance matrices are used for subarray fusion. To address the complexity involved with the large array size, we propose a compression framework consisting of multiple parallel encoders and a classifier. The parallel encoders at the distributed subarrays are trained to compress the respective covariance matrices. The compressed results are sent to the fusion center where the signal DOAs are estimated using a classifier based on the compressed covariance matrices.
DOI: https://doi.org/10.1109/icassp43922.2022.9746724
Published: 2022-05-23
Citations: 3
Conjugate Augmented Spatial-Temporal Near-Field Sources Localization with Cross Array
Zhi-Min Jiang, Hua Chen, Wei Liu, Ye Tian, G. Wang
Abstract: A new near-field source localization method is proposed for two-dimensional (2-D) direction-of-arrival (DOA) and range estimation based on a symmetrical cross array. It first employs the conjugate symmetry property of the signal auto-correlation at different time delays to construct a conjugate augmented spatial-temporal cross correlation matrix, then the extended steering vector is decoupled to avoid the usual multiple-dimensional (M-D) search based on the properties of the Khatri-Rao product, and finally three one-dimensional (1-D) MUSIC type searches are employed to obtain the results. The proposed method can realize automatic pairing of multiple parameters associated with each source and it also works in the underdetermined case.
DOI: https://doi.org/10.1109/icassp43922.2022.9746864
Published: 2022-05-23
Citations: 0