ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Agent-Environment Network for Temporal Action Proposal Generation
Viet-Khoa Vo-Ho, Ngan T. H. Le, Kashu Yamazaki, A. Sugimoto, Minh-Triet Tran
{"title":"Agent-Environment Network for Temporal Action Proposal Generation","authors":"Viet-Khoa Vo-Ho, Ngan T. H. Le, Kashu Yamazaki, A. Sugimoto, Minh-Triet Tran","doi":"10.1109/ICASSP39728.2021.9415101","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9415101","url":null,"abstract":"Temporal action proposal generation is an essential and challenging task that aims at localizing temporal intervals containing human actions in untrimmed videos. Most of existing approaches are unable to follow the human cognitive process of understanding the video context due to lack of attention mechanism to express the concept of an action or an agent who performs the action or the interaction between the agent and the environment. Based on the action definition that a human, known as an agent, interacts with the environment and performs an action that affects the environment, we propose a contextual Agent-Environment Network. Our proposed contextual AEN involves (i) agent pathway, operating at a local level to tell about which humans/agents are acting and (ii) environment pathway operating at a global level to tell about how the agents interact with the environment. Comprehensive evaluations on 20-action THUMOS-14 and 200-action ActivityNet-1.3 datasets with different backbone networks, i.e C3D and SlowFast, show that our method robustly exhibits outperformance against state-of-the-art methods regardless of the employed backbone network.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114101872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
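The abstract describes two pathways, a local agent pathway and a global environment pathway, whose outputs are combined for proposal scoring. A minimal PyTorch-style sketch of such a two-pathway fusion is given below; the module names, feature dimensions, and the attention-based fusion are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a two-pathway (agent / environment) fusion block.
# Dimensions, module names, and the attention fusion are illustrative assumptions.
import torch
import torch.nn as nn

class AgentEnvironmentFusion(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Local pathway: per-agent (human) features queried via attention.
        self.agent_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # Global pathway: temporal convolution over clip-level environment features.
        self.env_conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.proposal_head = nn.Linear(2 * dim, 1)  # actionness score per snippet

    def forward(self, env_feats, agent_feats):
        # env_feats:   (B, T, D)  snippet-level global features
        # agent_feats: (B, N, D)  per-agent local features (e.g., pooled person boxes)
        env = self.env_conv(env_feats.transpose(1, 2)).transpose(1, 2)     # (B, T, D)
        # Each snippet queries the agents to gather "who is acting" information.
        agent_ctx, _ = self.agent_attn(query=env, key=agent_feats, value=agent_feats)
        fused = torch.cat([env, agent_ctx], dim=-1)                        # (B, T, 2D)
        return torch.sigmoid(self.proposal_head(fused)).squeeze(-1)        # (B, T)

scores = AgentEnvironmentFusion()(torch.randn(2, 100, 256), torch.randn(2, 5, 256))
```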
An Improved Mean Teacher Based Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection
Xu Zheng, Yan Song, I. Mcloughlin, Lin Liu, Lirong Dai
{"title":"An Improved Mean Teacher Based Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection","authors":"Xu Zheng, Yan Song, I. Mcloughlin, Lin Liu, Lirong Dai","doi":"10.1109/ICASSP39728.2021.9414931","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414931","url":null,"abstract":"This paper presents an improved mean teacher (MT) based method for large-scale weakly labeled semi-supervised sound event detection (SED), by focusing on learning a better student model. Two main improvements are proposed based on the authors’ previous perturbation based MT method. Firstly, an event-aware module is de-signed to allow multiple branches with different kernel sizes to be fused via an attention mechanism. By inserting this module after the convolutional layer, each neuron can adaptively adjust its receptive field to suit different sound events. Secondly, instead of using the teacher model to provide a consistency cost term, we propose using a stochastic inference of unlabeled examples to generate high quality pseudo-targets by averaging multiple predictions from the perturbed student model. MixUp of both labeled and unlabeled data is further exploited to improve the effectiveness of student model. Finally, the teacher model can be obtained via exponential moving average (EMA) of the student model, which generates final predictions for SED during inference. Experiments on the DCASE2018 task4 dataset demonstrate the ability of the proposed method. Specifically, an F1-score of 42.1% is achieved, significantly outperforming the 32.4% achieved by the winning system, or the 39.3% by the previous perturbation based method.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"55 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114114106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
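Two of the ingredients named in the abstract, the EMA teacher and pseudo-targets obtained by averaging several perturbed student predictions, are sketched below. The input-noise perturbation and the placeholder model are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of an EMA teacher update and stochastic-averaged pseudo-targets.
# The perturbation (input noise) and the placeholder model are assumptions.
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    # teacher <- alpha * teacher + (1 - alpha) * student, parameter by parameter
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

@torch.no_grad()
def pseudo_targets(student, unlabeled_x, n_samples=4, noise_std=0.1):
    # Average multiple stochastic ("perturbed") forward passes of the student
    # to obtain higher-quality soft targets for the unlabeled data.
    preds = [torch.sigmoid(student(unlabeled_x + noise_std * torch.randn_like(unlabeled_x)))
             for _ in range(n_samples)]
    return torch.stack(preds).mean(dim=0)

student = torch.nn.Sequential(torch.nn.Linear(64, 10))    # placeholder SED model
teacher = copy.deepcopy(student)
x_unlabeled = torch.randn(8, 64)
targets = pseudo_targets(student, x_unlabeled)             # soft multi-label targets
ema_update(teacher, student)                               # teacher used at inference
```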
Improved Atomic Norm Based Channel Estimation for Time-Varying Narrowband Leaked Channels
Jianxiu Li, U. Mitra
{"title":"Improved Atomic Norm Based Channel Estimation for Time-Varying Narrowband Leaked Channels","authors":"Jianxiu Li, U. Mitra","doi":"10.1109/ICASSP39728.2021.9413804","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413804","url":null,"abstract":"In this paper, improved channel gain delay estimation strategies are investigated when practical pulse shapes with finite block length and transmission bandwidth are employed. Pilot-aided channel estimation with an improved atomic norm based approach is proposed to promote the low rank structure of the channel. All the channel parameters, i.e., delays, Doppler shifts and channel gains are recovered. Design choices which ensure unique estimates of channel parameters for root-raised-cosine pulse shapes are examined. Furthermore, a perturbation analysis is conducted. Finally, numerical results verify the theoretical analysis and show performance improvements over the previously proposed method.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114331335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
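For background on the atomic norm that the abstract relies on, the standard definition and a generic regularized estimator are shown below. This is textbook atomic-norm material, not the paper's exact formulation; the atom set, measurement operator, and regularization weight are generic placeholders.

```latex
% General definition of the atomic norm (background only, not the paper's formulation).
% \mathcal{A} is the atom set; for delay--Doppler channels the atoms a(\tau,\nu) are
% typically complex exponentials parameterized by a delay and a Doppler shift.
\[
  \|x\|_{\mathcal{A}}
  = \inf\{\, t > 0 : x \in t\,\operatorname{conv}(\mathcal{A}) \,\}
  = \inf\Big\{\, \sum_{k} c_k : x = \sum_{k} c_k\, a_k,\; c_k \ge 0,\; a_k \in \mathcal{A} \Big\}.
\]
% Generic regularized atomic-norm estimator from noisy pilot measurements y = \Phi x + w:
\[
  \hat{x} = \arg\min_{x}\; \tfrac{1}{2}\,\|y - \Phi x\|_2^2 + \lambda\, \|x\|_{\mathcal{A}}.
\]
```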
Fast Inverse Mapping of Face GANs
N. Bayat, Vahid Reza Khazaie, Y. Mohsenzadeh
{"title":"Fast Inverse Mapping of Face GANs","authors":"N. Bayat, Vahid Reza Khazaie, Y. Mohsenzadeh","doi":"10.1109/ICASSP39728.2021.9413532","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413532","url":null,"abstract":"Generative adversarial networks (GANs) synthesize realistic images from random latent vectors. While many studies have explored various training configurations and architectures for GANs, the problem of inverting the generator of GANs has been inadequately investigated. We train a ResNet architecture to map given faces to latent vectors that can be used to generate faces nearly identical to the target. We use a perceptual loss to embed face details in the recovered latent vector while maintaining visual quality using a pixel loss. The vast majority of studies on latent vector recovery are very slow and perform well only on generated images. We argue that our method can be used to determine a fast mapping between real human faces and latent-space vectors that contain most of the important face style details. At last, we demonstrate the performance of our approach on both real and generated faces.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114374961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
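The training objective described in the abstract, a pixel loss plus a perceptual (feature-space) loss on the reconstruction produced by a frozen generator, can be sketched as follows. The encoder, generator, and feature extractor below are toy stand-ins with matching shapes, not the authors' networks, and the loss weight is an assumption.

```python
# Hypothetical sketch of encoder training for GAN inversion with pixel + perceptual losses.
import torch
import torch.nn.functional as F

def inversion_loss(encoder, generator, feat_extractor, x, perceptual_weight=0.8):
    z = encoder(x)                           # predicted latent vector
    x_rec = generator(z)                     # reconstruction from the (frozen) generator
    pixel = F.l1_loss(x_rec, x)              # maintains overall visual quality
    perceptual = F.l1_loss(feat_extractor(x_rec), feat_extractor(x))  # preserves face details
    return pixel + perceptual_weight * perceptual

# Toy stand-ins with matching shapes, just to show how the pieces connect.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 512))
generator = torch.nn.Sequential(torch.nn.Linear(512, 3 * 64 * 64),
                                torch.nn.Unflatten(1, (3, 64, 64)))
feat_extractor = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))

loss = inversion_loss(encoder, generator, feat_extractor, torch.randn(4, 3, 64, 64))
loss.backward()   # in practice only the encoder is updated; the generator stays frozen
```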
What And Where To Focus In Person Search
Tong Zhou, Kun Tian
{"title":"What And Where To Focus In Person Search","authors":"Tong Zhou, Kun Tian","doi":"10.1109/ICASSP39728.2021.9414439","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414439","url":null,"abstract":"Person search aims to locate and identify the query person from a gallery of original scene images. Almost all previous methods only consider single high-level semantic information, ignoring that the essence of identification task is to learn rich and expressive features. Additionally, large pose variations and occlusions of the target person significantly increase the difficulty of search task. For these two findings, we first propose multilevel semantic aggregation algorithm for more discriminative feature descriptors. Then, a pose-assisted attention module is designed to highlight fine-grained area of the target and simultaneously capture valuable clues for identification. Extensive experiments confirm that our framework can coordinate multilevel semantics of persons and effectively alleviate the adverse effects of occlusion and various pose. We also achieve state-of-the-art performance on two challenging datasets CUHK-SYSU and PRW.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114533795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
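A minimal sketch of the "multilevel semantic aggregation" idea, merging features from several backbone stages into one identity descriptor, is shown below. The stage dimensions and the concatenation-based merge are assumptions for illustration only.

```python
# Hypothetical sketch: project per-stage backbone features to a common size and merge them.
import torch
import torch.nn as nn

class MultiLevelAggregation(nn.Module):
    def __init__(self, stage_dims=(256, 512, 1024, 2048), out_dim=256):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, out_dim) for d in stage_dims])
        self.fuse = nn.Linear(out_dim * len(stage_dims), out_dim)

    def forward(self, stage_feats):
        # stage_feats: list of globally pooled per-stage features, each of shape (B, D_i)
        projected = [torch.relu(p(f)) for p, f in zip(self.proj, stage_feats)]
        return self.fuse(torch.cat(projected, dim=-1))   # (B, out_dim) identity descriptor

feats = [torch.randn(2, d) for d in (256, 512, 1024, 2048)]
descriptor = MultiLevelAggregation()(feats)
```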
A New Framework Based on Transfer Learning for Cross-Database Pneumonia Detection
Xinxin Shan, Y. Wen
{"title":"A New Framework Based on Transfer Learning for Cross-Database Pneumonia Detection","authors":"Xinxin Shan, Y. Wen","doi":"10.1109/ICASSP39728.2021.9414997","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414997","url":null,"abstract":"Cross-database classification means that the model is able to apply to the serious disequilibrium of data distributions, and it is trained by one database while tested by another database. Thus, cross-database pneumonia detection is a challenging task. In this paper, we proposed a new framework based on transfer learning for cross-database pneumonia detection. First, based on transfer learning, we fine-tune a backbone that pre-trained on non-medical data by using a small amount of pneumonia images, which improves the detection performance on homogeneous dataset. Then in order to make the fine-tuned model applicable to cross-database classification, the adaptation layer combined with a self-learning strategy is proposed to retrain the model. The adaptation layer is to make the heterogeneous data distributions approximate and the self-learning strategy helps to tweak the model by generating pseudo-labels. Experiments on three pneumonia databases show that our proposed model completes the cross-database detection of pneumonia and shows good performance.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121486059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
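The self-learning step described in the abstract, in which the fine-tuned model generates pseudo-labels for the target database and is retrained on the confident ones, can be sketched as follows. The placeholder model, confidence threshold, and optimizer settings are assumptions, not the paper's configuration.

```python
# Hypothetical sketch of one pseudo-label self-learning step on target-database images.
import torch
import torch.nn.functional as F

def self_learning_step(model, target_images, optimizer, threshold=0.9):
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(target_images), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf > threshold               # keep only confident pseudo-labels
    if keep.sum() == 0:
        return None                           # nothing confident enough this round
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(target_images[keep]), pseudo[keep])
    loss.backward()
    optimizer.step()
    return loss.item()

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 2))
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
self_learning_step(model, torch.randn(16, 3, 32, 32), opt)
```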
Multi-Scale and Multi-Region Facial Discriminative Representation for Automatic Depression Level Prediction
Mingyue Niu, J. Tao, B. Liu
{"title":"Multi-Scale and Multi-Region Facial Discriminative Representation for Automatic Depression Level Prediction","authors":"Mingyue Niu, J. Tao, B. Liu","doi":"10.1109/ICASSP39728.2021.9413504","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413504","url":null,"abstract":"Physiological studies have shown that differences in facial activities between depressed patients and normal individuals are manifested in different local facial regions and the durations of these activities are not the same. But most previous works extract features from the entire facial region at a fixed time scale to predict the individual depression level. Thus, they are inadequate in capturing dynamic facial changes. For these reasons, we propose a multi-scale and multi-region fa-cial dynamic representation method to improve the prediction performance. In particular, we firstly use multiple time scales to divide the original long-term video into segments containing different facial regions. Secondly, the segment-level feature is extracted by 3D convolution neural network to characterize the facial activities with different durations in different facial regions. Thirdly, this paper adopts eigen evolution pooling and gradient boosting decision tree to aggregate these segment-level features and select discriminative elements to generate the video-level feature. Finally, the depression level is predicted using support vector regression. Experiments are conducted on AVEC2013 and AVEC2014. The results demonstrate that our method achieves better performance than the previous works.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121496866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
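The last two stages of the pipeline, gradient boosting based element selection followed by support vector regression, are sketched below under simplifying assumptions: the video-level features X are random stand-ins (the 3D-CNN and eigen evolution pooling stages are assumed to have produced them already), and the top-k selection rule is illustrative.

```python
# Hypothetical sketch: GBDT ranks feature elements, the most discriminative are kept,
# and SVR predicts the depression score from the selected elements.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))          # video-level features (one row per video)
y = rng.uniform(0, 45, size=200)         # depression scores (e.g., a BDI-II-like range)

gbdt = GradientBoostingRegressor(n_estimators=100).fit(X, y)
top = np.argsort(gbdt.feature_importances_)[-64:]    # keep the 64 most informative dims

svr = SVR(kernel="rbf", C=10.0).fit(X[:, top], y)
pred = svr.predict(X[:5, top])
```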
Teacher-Student Learning for Low-Latency Online Speech Enhancement Using Wave-U-Net
Sotaro Nakaoka, Li Li, S. Inoue, S. Makino
{"title":"Teacher-Student Learning for Low-Latency Online Speech Enhancement Using Wave-U-Net","authors":"Sotaro Nakaoka, Li Li, S. Inoue, S. Makino","doi":"10.1109/ICASSP39728.2021.9414280","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414280","url":null,"abstract":"In this paper, we propose a low-latency online extension of wave-U-net for single-channel speech enhancement, which utilizes teacher-student learning to reduce the system latency while keeping the enhancement performance high. Wave-U-net is a recently proposed end-to-end source separation method, which achieved remarkable performance in singing voice separation and speech enhancement tasks. Since the enhancement is performed in the time domain, wave-U-net can efficiently model phase information and address the domain transformation limitation, where the time-frequency domain is normally adopted. In this paper, we apply wave-U-net to face-to-face applications such as hearing aids and in-car communication systems, where a strictly low-latency of less than 10 ms is required. To this end, we investigate online versions of wave-U-net and propose the use of teacher-student learning to prevent the performance degradation caused by the reduction in input segment length such that the system delay in a CPU is less than 10 ms. The experimental results revealed that the proposed model could perform in real-time with low-latency and high performance, achieving a signal-to-distortion ratio improvement of about 8.73 dB.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121570125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
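The teacher-student idea described in the abstract, where an offline model with long input context supervises a low-latency model that only sees a short segment, is sketched below. The stand-in convolutional models, segment lengths, and L1 distillation loss are assumptions, not the paper's Wave-U-Net configuration.

```python
# Hypothetical sketch of teacher-student training for a low-latency enhancement model.
import torch
import torch.nn.functional as F

teacher = torch.nn.Conv1d(1, 1, kernel_size=1023, padding=511)   # stand-in for offline Wave-U-Net
student = torch.nn.Conv1d(1, 1, kernel_size=63, padding=31)      # stand-in for low-latency model
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

noisy = torch.randn(8, 1, 4000)            # ~0.25 s of 16 kHz noisy speech per example
with torch.no_grad():
    target = teacher(noisy)                # teacher prediction used as the training target

short = noisy[..., -1024:]                 # the student only sees the most recent samples
loss = F.l1_loss(student(short), target[..., -1024:])
opt.zero_grad(); loss.backward(); opt.step()
```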
CoughWatch: Real-World Cough Detection Using Smartwatches
D. Liaqat, S. Liaqat, Jun Lin Chen, Tina Sedaghat, Moshe Gabel, Frank Rudzicz, E. D. Lara
{"title":"Coughwatch: Real-World Cough Detection using Smartwatches","authors":"D. Liaqat, S. Liaqat, Jun Lin Chen, Tina Sedaghat, Moshe Gabel, Frank Rudzicz, E. D. Lara","doi":"10.1109/ICASSP39728.2021.9414881","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414881","url":null,"abstract":"Continuous monitoring of cough may provide insights into the health of individuals as well as the effectiveness of treatments. Smart-watches, in particular, are highly promising for such monitoring: they are inexpensive, unobtrusive, programmable, and have a variety of sensors. However, current mobile cough detection systems are not designed for smartwatches, and perform poorly when applied to real-world smartwatch data since they are often evaluated on data collected in the lab.In this work we propose CoughWatch, a lightweight cough detector for smartwatches that uses audio and movement data for in-the-wild cough detection. On our in-the-wild data, CoughWatch achieves a precision of 82% and recall of 55%, compared to 6% precision and 19% recall achieved by the current state-of-the-art approach. Furthermore, by incorporating gyroscope and accelerometer data, CoughWatch improves precision by up to 15.5 percentage points compared to an audio-only model.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114711564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
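The sensor-fusion idea in the abstract, combining audio features with gyroscope and accelerometer features for the same time window, is sketched below with a simple early-fusion classifier. The feature sizes, the classifier, and the random data are illustrative assumptions, not CoughWatch's actual pipeline.

```python
# Hypothetical sketch of early fusion of audio and IMU features for per-window cough detection.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
audio_feats = rng.normal(size=(500, 40))      # e.g., log-mel statistics per window
imu_feats = rng.normal(size=(500, 12))        # accelerometer + gyroscope statistics per window
labels = rng.integers(0, 2, size=500)         # 1 = cough window, 0 = other

X = np.concatenate([audio_feats, imu_feats], axis=1)    # concatenate the two modalities
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict_proba(X[:3])[:, 1])         # cough probability for three windows
```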
History Utterance Embedding Transformer LM for Speech Recognition
Keqi Deng, Gaofeng Cheng, Haoran Miao, Pengyuan Zhang, Yonghong Yan
{"title":"History Utterance Embedding Transformer LM for Speech Recognition","authors":"Keqi Deng, Gaofeng Cheng, Haoran Miao, Pengyuan Zhang, Yonghong Yan","doi":"10.1109/ICASSP39728.2021.9414575","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414575","url":null,"abstract":"History utterances contain rich contextual information; however, better extracting information from the history utterances and using it to improve the language model (LM) is still challenging. In this paper, we propose the history utterance embedding Transformer LM (HTLM), which includes an embedding generation network for extracting contextual information contained in the history utterances and a main Transformer LM for current prediction. In addition, the two-stage attention (TSA) is proposed to encode richer contextual information into the embedding of history utterances (h-emb) while supporting GPU parallel training. Furthermore, we combine the extracted h-emb and embedding of current utterance (c-emb) through the dot-product attention and a fusion method for HTLM's current prediction. Experiments are conducted on the HKUST dataset and achieve a 23.4% character error rate (CER) on the test set. Compared with the baseline, the proposed method yields 12.86 absolute perplexity reduction and 0.8% absolute CER reduction.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114763095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
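The fusion step described in the abstract, where the current-utterance embedding attends over the history-utterance embeddings via dot-product attention and the result is combined with it, is sketched below. The dimensions and the additive fusion are assumptions for illustration.

```python
# Hypothetical sketch of dot-product attention fusion of c-emb with h-emb.
import math
import torch

def fuse_history(c_emb, h_emb):
    # c_emb: (B, D) current-utterance embedding; h_emb: (B, H, D) history-utterance embeddings
    scores = torch.einsum("bd,bhd->bh", c_emb, h_emb) / math.sqrt(c_emb.size(-1))
    weights = torch.softmax(scores, dim=-1)                   # attention over history utterances
    history_ctx = torch.einsum("bh,bhd->bd", weights, h_emb)  # weighted history context
    return c_emb + history_ctx                                # fused representation for prediction

fused = fuse_history(torch.randn(2, 256), torch.randn(2, 4, 256))
```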