Latest Publications: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Double-DCCCAE: Estimation of Body Gestures From Speech Waveform
Jinhong Lu, Tianhang Liu, Shuzhuang Xu, H. Shimodaira
{"title":"Double-DCCCAE: Estimation of Body Gestures From Speech Waveform","authors":"Jinhong Lu, Tianhang Liu, Shuzhuang Xu, H. Shimodaira","doi":"10.1109/ICASSP39728.2021.9414660","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414660","url":null,"abstract":"This paper presents an approach for body-motion estimation from audio-speech waveform, where context information in both input and output streams is taken in to account without using recurrent models. Previous works commonly use multiple frames of input to estimate one frame of motion data, where the temporal information of the generated motion is little considered. To resolve the problems, we extend our previous work and propose a system, double deep canonical-correlation-constrained autoencoder (D-DCCCAE), which encodes each of speech and motion segments into fixed-length embedded features that are well correlated with the segments of the other modality. The learnt motion embedded feature is estimated from the learnt speech-embedded feature through a simple neural network and further decoded back to the sequential motion. The proposed pair of embedded features showed higher correlation than spectral features with motion data, and our model was more preferred than the baseline model (BA) in terms of human-likeness and comparable in terms of similar appropriateness.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122187130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
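For illustration, a minimal sketch of the correlation-constrained autoencoder idea follows: two autoencoders reconstruct speech-segment and motion-segment features while a penalty encourages their embeddings to be correlated. This is not the authors' D-DCCCAE implementation; the layer sizes, the simple per-dimension Pearson penalty (a stand-in for the deep CCA constraint), and the random data are assumptions.

```python
# Sketch: paired autoencoders with a cross-modal correlation penalty (toy setup).
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, in_dim, emb_dim):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))
        self.dec = nn.Sequential(nn.Linear(emb_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

def correlation_penalty(za, zb, eps=1e-8):
    # Negative mean per-dimension Pearson correlation between the two embedding batches.
    za = (za - za.mean(0)) / (za.std(0) + eps)
    zb = (zb - zb.mean(0)) / (zb.std(0) + eps)
    return -(za * zb).mean()

speech_ae, motion_ae = AE(in_dim=80, emb_dim=32), AE(in_dim=45, emb_dim=32)
opt = torch.optim.Adam(list(speech_ae.parameters()) + list(motion_ae.parameters()), lr=1e-3)
mse = nn.MSELoss()

for step in range(100):                      # toy training loop on random segments
    speech = torch.randn(64, 80)             # stand-in for speech-segment features
    motion = torch.randn(64, 45)             # stand-in for aligned motion segments
    zs, speech_hat = speech_ae(speech)
    zm, motion_hat = motion_ae(motion)
    loss = mse(speech_hat, speech) + mse(motion_hat, motion) + correlation_penalty(zs, zm)
    opt.zero_grad(); loss.backward(); opt.step()
```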
Non-Convex Sparse Deviation Modeling Via Generative Models
Yaxi Yang, Hailin Wang, Haiquan Qiu, Jianjun Wang, Yao Wang
{"title":"Non-Convex Sparse Deviation Modeling Via Generative Models","authors":"Yaxi Yang, Hailin Wang, Haiquan Qiu, Jianjun Wang, Yao Wang","doi":"10.1109/ICASSP39728.2021.9414170","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414170","url":null,"abstract":"In this paper, the generative model is used to introduce the structural properties of the signal to replace the common sparse hypothesis, and a non-convex compressed sensing sparse deviation model based on the generative model (ℓq-Gen) is proposed. By establishing ℓq variant of the restricted isometry property (q-RIP) and Set-Restricted Eigenvalue Condition (q-S-REC), the error upper bound of the optimal decoder is derived when the recovered signal is within the sparse deviation range of the generator. Furthermore, it is proved that the Gaussian matrix satisfying a certain number of measurements is sufficient to ensure a good recovery for the generating function with high probability. Finally, a series of experiments are carried out to verify the effectiveness and superiority of the ℓq-Gen model.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117016966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
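A toy sketch of the recovery problem the abstract describes: the signal is modeled as a generator output plus a sparse deviation, and both the latent code and the deviation are estimated from Gaussian measurements with a smoothed non-convex ℓq penalty. The linear stand-in generator, the hyper-parameters, and the plain Adam optimizer are assumptions, not the paper's decoder or analysis.

```python
# Sketch: recovery with a generative prior plus a sparse deviation (toy, noiseless case).
import torch

torch.manual_seed(0)
n, k, m, q, lam = 200, 20, 60, 0.5, 0.1
G = torch.nn.Linear(k, n, bias=False)          # stand-in generator G: R^k -> R^n
for p in G.parameters():
    p.requires_grad_(False)
A = torch.randn(m, n) / m ** 0.5               # Gaussian measurement matrix

with torch.no_grad():
    x_true = G(torch.randn(k))                 # signal near the generator range...
    x_true[:5] += 1.0                          # ...plus a small sparse deviation
    y = A @ x_true                             # noiseless measurements

z = torch.zeros(k, requires_grad=True)         # latent code to estimate
nu = torch.zeros(n, requires_grad=True)        # sparse deviation to estimate
opt = torch.optim.Adam([z, nu], lr=0.05)

for it in range(500):
    x_hat = G(z) + nu
    data_fit = ((A @ x_hat - y) ** 2).sum()
    lq = ((nu.abs() + 1e-6) ** q).sum()        # smoothed non-convex ell_q penalty
    loss = data_fit + lam * lq
    opt.zero_grad(); loss.backward(); opt.step()

print("reconstruction error:", (G(z) + nu - x_true).norm().item())
```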
An Adaptive Pyramid Single-View Depth Lookup Table Coding Method
Yangang Cai, Ronggang Wang, Song Gu, Jian Zhang, Wen Gao
{"title":"An Adaptive Pyramid Single-View Depth Lookup Table Coding Method","authors":"Yangang Cai, Ronggang Wang, Song Gu, Jian Zhang, Wen Gao","doi":"10.1109/ICASSP39728.2021.9414584","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414584","url":null,"abstract":"As depth maps show unique characteristics like piecewise smooth regions bounded by sharp edges at depth discontinuities, new coding tools are required to approximate these signal characteristics. Moreover, the number of bits to signal the residual values for each segment can be further reduced by integrating a Depth Lookup Table (DLT), which maps depth values to valid depth values of the original depth map. The DLT is constructed based on an initial analysis of the input depth map and is then coded in the sequence header. In this paper, an adaptive pyramid single-view depth lookup table coding method is proposed, with the purpose of designing a clean syntax structure in the sequence header with reasonably good performance. Experiments show that the proposed method can reduce about 84.97% coding bits on average.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117270011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
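The basic DLT mapping can be illustrated in a few lines: collect the depth values that actually occur in the map and signal residuals as differences of lookup-table indices rather than raw depth values. The sketch omits the paper's adaptive pyramid structure and syntax design; the toy depth map is an assumption.

```python
# Sketch: build a depth lookup table and code residuals in index space.
import numpy as np

depth = np.array([[  0,  17,  64,  64],
                  [ 17,  64, 128, 200],
                  [ 64, 128, 200, 255],
                  [128, 200, 255, 255]], dtype=np.uint8)   # toy depth map

dlt = np.unique(depth)                        # valid depth values of the input map
value_to_index = {int(v): i for i, v in enumerate(dlt)}

def index_residual(pred_value, orig_value):
    # Residual expressed in DLT indices; its range is bounded by len(dlt), which is
    # typically far smaller than the 8-bit depth range, so it needs fewer bits.
    return value_to_index[int(orig_value)] - value_to_index[int(pred_value)]

print("DLT:", dlt)
print("raw residual:", 200 - 64, "-> index residual:", index_residual(64, 200))
```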
Instrument Classification of Solo Sheet Music Images
Kevin Ji, Daniel Yang, T. Tsai
{"title":"Instrument Classification of Solo Sheet Music Images","authors":"Kevin Ji, Daniel Yang, T. Tsai","doi":"10.1109/ICASSP39728.2021.9413732","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413732","url":null,"abstract":"This paper studies instrument classification of solo sheet music. Whereas previous work has focused on instrument recognition in audio data, we instead approach the instrument classification problem using raw sheet music images. Our approach first converts the sheet music image into a sequence of musical words based on the bootleg score representation, and then treats the problem as a text classification task. We show that it is possible to significantly improve classifier performance by training a language model on unlabeled data, initializing a classifier with the pretrained language model weights, and then finetuning the classifier on labeled data. In this work, we train AWD-LSTM, GPT-2, and RoBERTa models on solo sheet music images from IMSLP for eight different instruments. We find that GPT-2 and RoBERTa slightly outperform AWD-LSTM, and that pretraining increases classification accuracy for RoBERTa from 34.5% to 42.9%. Furthermore, we propose two data augmentation methods that increase classification accuracy for RoBERTa by an additional 15%.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128638017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
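As a rough illustration of the "sheet music as text" formulation, the sketch below hashes each binary bootleg-score column into a word-like token and trains a classifier on the resulting documents. The bag-of-words model is only a stand-in for the AWD-LSTM/GPT-2/RoBERTa classifiers in the paper, and the 62-dimensional columns, two-instrument setup, and random data are assumptions.

```python
# Sketch: bootleg-score columns -> "musical word" tokens -> text classification (toy).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

def columns_to_words(bootleg_columns):
    # Encode each binary column as a hex string so it behaves like a vocabulary word.
    return " ".join(hex(int("".join(map(str, col)), 2)) for col in bootleg_columns)

docs, labels = [], []
for label in ["piano", "violin"]:             # toy two-instrument setup
    for _ in range(20):
        cols = rng.integers(0, 2, size=(100, 62))   # random stand-in bootleg score
        docs.append(columns_to_words(cols))
        labels.append(label)

clf = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(docs, labels)
print("training accuracy on toy data:", clf.score(docs, labels))
```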
Real-Time Radio Modulation Classification With An LSTM Auto-Encoder
Ziqi Ke, H. Vikalo
{"title":"Real-Time Radio Modulation Classification With An LSTM Auto-Encoder","authors":"Ziqi Ke, H. Vikalo","doi":"10.1109/ICASSP39728.2021.9414351","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414351","url":null,"abstract":"Identifying modulation type of a received radio signal is a challenging problem encountered in many applications including radio interference mitigation and spectrum allocation. This problem is rendered challenging by the existence of a large number of modulation schemes and numerous sources of interference. Existing methods for monitoring spectrum readily collect large amounts of radio signals. However, existing state-of-the-art approaches to modulation classification struggle to reach desired levels of accuracy with computational efficiency practically feasible for implementation on low-cost computational platforms. To this end, we propose a learning framework based on an LSTM denoising autoencoder designed to extract robust and stable features from the noisy received signals, and detect the underlying modulation scheme. The method uses a compact architecture that may be implemented on low-cost computational devices while achieving or exceeding state-of-the-art classification accuracy. Experimental results on realistic synthetic and over-the-air radio data show that the proposed framework reliably and efficiently classifies radio signals, and often significantly outperform state-of-the-art approaches.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"84 Pt 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129006633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
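A minimal sketch of an LSTM denoising autoencoder with a classification head, in the spirit of the abstract, is given below. The layer sizes, the 11-class setup, the simple noise handling, and the joint reconstruction-plus-classification loss are assumptions rather than the authors' configuration.

```python
# Sketch: LSTM auto-encoder over I/Q frames with a modulation-classification head (toy).
import torch
import torch.nn as nn

class LSTMAEClassifier(nn.Module):
    def __init__(self, in_dim=2, hid=64, n_classes=11):
        super().__init__()
        self.encoder = nn.LSTM(in_dim, hid, batch_first=True)
        self.decoder = nn.LSTM(hid, hid, batch_first=True)
        self.reconstruct = nn.Linear(hid, in_dim)
        self.classify = nn.Linear(hid, n_classes)

    def forward(self, x):
        _, (h, _) = self.encoder(x)                     # h: (1, B, hid) signal summary
        z = h[-1]                                       # feature vector per frame
        dec_in = z.unsqueeze(1).repeat(1, x.size(1), 1) # feed the code at every step
        dec_out, _ = self.decoder(dec_in)
        return self.reconstruct(dec_out), self.classify(z)

model = LSTMAEClassifier()
iq = torch.randn(8, 128, 2)                             # batch of noisy I/Q frames
clean = iq                                              # stand-in for clean targets
labels = torch.randint(0, 11, (8,))
recon, logits = model(iq)
loss = nn.functional.mse_loss(recon, clean) + nn.functional.cross_entropy(logits, labels)
loss.backward()
```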
Applied Methods for Sparse Sampling of Head-Related Transfer Functions
Lior Arbel, Z. Ben-Hur, D. Alon, B. Rafaely
{"title":"Applied Methods for Sparse Sampling of Head-Related Transfer Functions","authors":"Lior Arbel, Z. Ben-Hur, D. Alon, B. Rafaely","doi":"10.1109/ICASSP39728.2021.9413976","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413976","url":null,"abstract":"Production of high fidelity spatial audio applications requires individual head-related transfer functions (HRTFs). As the acquisition of HRTF is an elaborate process, interest lies in interpolating full length HRTF from sparse samples. Ear-alignment is a recently developed pre-processing technique, shown to reduce an HRTF’s spherical harmonics order, thus permitting sparse sampling over fewer directions. This paper describes the application of two methods for ear-aligned HRTF interpolation by sparse sampling: Orthogonal Matching Pursuit and Principal Component Analysis. These methods consist of generating unique vector sets for HRTF representation. The methods were tested over an HRTF dataset, indicating that interpolation errors using small sampling schemes may be further reduced by up to 5 dB in comparison with spherical harmonics interpolation.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"94 2 Pt 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129454743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
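The PCA-style variant can be sketched as follows: learn a low-dimensional basis from a set of (ear-aligned) HRTFs, fit a new subject's basis coefficients from a few measured directions by least squares, and reconstruct the full set. The synthetic data, the number of components, and the sampling scheme are assumptions; the paper's OMP variant and the ear-alignment step itself are not reproduced here.

```python
# Sketch: PCA basis over HRTF magnitudes + least-squares fit from sparse directions (toy).
import numpy as np

rng = np.random.default_rng(0)
n_dirs, n_freqs, n_subjects, n_comp, n_sparse = 440, 128, 30, 8, 40

# Toy training set of HRTF magnitude responses, flattened over (direction, frequency).
train = rng.normal(size=(n_subjects, n_dirs * n_freqs))
mean = train.mean(axis=0)
U, S, Vt = np.linalg.svd(train - mean, full_matrices=False)
basis = Vt[:n_comp]                                  # principal components

# New subject measured only at a sparse subset of directions.
target = rng.normal(size=n_dirs * n_freqs)
sparse_dirs = rng.choice(n_dirs, size=n_sparse, replace=False)
mask = np.zeros(n_dirs, dtype=bool)
mask[sparse_dirs] = True
mask = np.repeat(mask, n_freqs)                      # mask over flattened samples

coeffs, *_ = np.linalg.lstsq(basis[:, mask].T, (target - mean)[mask], rcond=None)
full_estimate = mean + coeffs @ basis                # interpolated full HRTF set
print(full_estimate.shape)
```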
Multi-Scale Residual Network for Covid-19 Diagnosis Using Ct-Scans
Pratyush Garg, R. Ranjan, Kamini Upadhyay, M. Agrawal, D. Deepak
{"title":"Multi-Scale Residual Network for Covid-19 Diagnosis Using Ct-Scans","authors":"Pratyush Garg, R. Ranjan, Kamini Upadhyay, M. Agrawal, D. Deepak","doi":"10.1109/ICASSP39728.2021.9414426","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414426","url":null,"abstract":"To mitigate the outbreak of highly contagious COVID-19, we need a sensitive, robust automated diagnostic tool. This paper proposes a three-level approach to separate the cases of COVID-19, pneumonia from normal patients using chest CT scans. At the first level, we fine tune a multi-scale ResNet50 model for feature extraction from all the slices of CT scan for each patient. By using multi-scale residual network, we can learn different sizes of infection, thereby making the detection possible at early stages too. These extracted features are used to train a patient-level classifier, at the second level. Four different classifiers are trained at this stage. Finally, predictions of patient level classifiers are combined by training an ensemble classifier. We test the proposed method on three sets of data released by ICASSP, COVID-19 Signal Processing Grand Challenge (SPGC). The proposed method has been successful in classifying the three classes with a validation accuracy of 94.9% and testing accuracy of 88.89%.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128991469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 20
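The overall pipeline structure can be sketched as: slice-level features from a ResNet50 backbone, pooled into a patient-level descriptor, fed to patient-level classifiers that are combined in an ensemble. The sketch below uses random weights, random data, small images, and a two-model soft-voting ensemble as stand-ins; the multi-scale modification and fine-tuning are omitted, and the torchvision `weights` keyword assumes a reasonably recent torchvision release.

```python
# Sketch: slice features -> patient descriptor -> ensemble of patient-level classifiers (toy).
import numpy as np
import torch
import torchvision
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

backbone = torchvision.models.resnet50(weights=None)   # random weights here; the paper
backbone.fc = torch.nn.Identity()                      # fine-tunes a pretrained backbone
backbone.eval()

def patient_descriptor(slices):
    # slices: (n_slices, 3, H, W) tensor holding one patient's CT slices
    with torch.no_grad():
        feats = backbone(slices)                        # (n_slices, 2048) slice features
    return feats.mean(dim=0).numpy()                    # average pooling over slices

# Toy patient-level training set (random tensors stand in for CT volumes).
X = np.stack([patient_descriptor(torch.randn(2, 3, 64, 64)) for _ in range(30)])
y = np.array([0, 1, 2] * 10)                            # normal / pneumonia / COVID-19

ensemble = VotingClassifier(
    [("lr", LogisticRegression(max_iter=1000)),
     ("rf", RandomForestClassifier(n_estimators=50))],
    voting="soft",
)
ensemble.fit(X, y)
print(ensemble.predict(X[:3]))
```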
Improving Dialogue Response Generation Via Knowledge Graph Filter
Yanmeng Wang, Ye Wang, Xingyu Lou, Wenge Rong, Zhenghong Hao, Shaojun Wang
{"title":"Improving Dialogue Response Generation Via Knowledge Graph Filter","authors":"Yanmeng Wang, Ye Wang, Xingyu Lou, Wenge Rong, Zhenghong Hao, Shaojun Wang","doi":"10.1109/ICASSP39728.2021.9414324","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414324","url":null,"abstract":"Current generative dialogue systems tend to produce generic dialog responses, which lack useful information and semantic coherence. An promising method to alleviate this problem is to integrate knowledge triples from knowledge base. However, current approaches mainly augment Seq2Seq framework with knowledge-aware mechanism to retrieve a large number of knowledge triples without considering specific dialogue context, which probably results in knowledge redundancy and incomplete knowledge comprehension. In this paper, we propose to leverage the contextual word representation of dialog post to filter out irrelevant knowledge with an attention-based triple filter network. We introduce a novel knowledge-enriched framework to integrate the filtered knowledge into the dialogue representation. Entity copy is further proposed to facilitate the integration of the knowledge during generation. Experiments on dialogue generation tasks have shown the proposed framework’s promising potential.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123827616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
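A minimal sketch of an attention-based triple filter in the spirit of the abstract: the post representation attends over candidate triple embeddings, and the attention-weighted knowledge summary is fused back into the dialogue representation. The dimensions, the dot-product scoring, and the fusion layer are assumptions, not the paper's architecture, and the entity-copy mechanism is not shown.

```python
# Sketch: dialogue post attends over knowledge-triple embeddings and fuses the result (toy).
import torch
import torch.nn as nn

class TripleFilter(nn.Module):
    def __init__(self, post_dim=256, triple_dim=128):
        super().__init__()
        self.query = nn.Linear(post_dim, triple_dim)
        self.fuse = nn.Linear(post_dim + triple_dim, post_dim)

    def forward(self, post_repr, triples):
        # post_repr: (B, post_dim); triples: (B, n_triples, triple_dim)
        q = self.query(post_repr).unsqueeze(1)                  # (B, 1, triple_dim)
        scores = torch.softmax((q * triples).sum(-1), dim=-1)   # relevance of each triple
        knowledge = (scores.unsqueeze(-1) * triples).sum(1)     # filtered knowledge summary
        return self.fuse(torch.cat([post_repr, knowledge], dim=-1)), scores

filt = TripleFilter()
post = torch.randn(2, 256)              # contextual representation of the dialogue post
triples = torch.randn(2, 10, 128)       # embeddings of retrieved knowledge triples
fused, scores = filt(post, triples)
print(fused.shape, scores.shape)        # (2, 256) (2, 10)
```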
Perceptual Quality Assessment for Recognizing True and Pseudo 4k Content
Wenhan Zhu, Guangtao Zhai, Xiongkuo Min, Xiaokang Yang, Xiao-Ping Zhang
{"title":"Perceptual Quality Assessment for Recognizing True and Pseudo 4k Content","authors":"Wenhan Zhu, Guangtao Zhai, Xiongkuo Min, Xiaokang Yang, Xiao-Ping Zhang","doi":"10.1109/ICASSP39728.2021.9414932","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414932","url":null,"abstract":"To meet the imperative demand for monitoring the quality of Ultra High-Definition (UHD) content in multimedia industries, we propose an efficient no-reference (NR) image quality assessment (IQA) metric to distinguish original and pseudo 4K contents and measure the quality of their quality in this paper. First, we establish a database including more than 3000 4K images composed of natural 4K images together with upscaled versions interpolated from 1080p and 720p images by fourteen algorithms. To improve computing efficiency, our model segments the input image and selects three representative patches by local variances. Then, we extract the histogram features and cut-off frequency features in the frequency domain as well as the natural scenes statistic (NSS) based features from the representative patches. Finally, we employ support vector regressor (SVR) to aggregate these extracted features as an overall quality metric to predict the quality score of the target image. Extensive experimental comparisons using seven common evaluation indicators demonstrate that the proposed model outperforms the competitive NR IQA methods and has a great ability to distinguish true and pseudo 4K images.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124224000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
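One of the cues the abstract relies on, loss of high-frequency energy in upscaled content, can be illustrated with a simple spectral-energy feature plus an SVR, as sketched below. This is a simplified stand-in for the paper's histogram, cut-off frequency, and NSS feature set; the synthetic patches, the 0.5-Nyquist threshold, and the toy labels are assumptions.

```python
# Sketch: high-frequency energy ratio of patches as a true-vs-pseudo-4K cue, fed to an SVR (toy).
import numpy as np
from sklearn.svm import SVR

def highfreq_energy_ratio(patch, cutoff=0.5):
    # Fraction of spectral energy above `cutoff` of the Nyquist frequency.
    spec = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    h, w = patch.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    return spec[radius > cutoff].sum() / spec.sum()

rng = np.random.default_rng(0)
true4k = rng.normal(size=(20, 64, 64))                         # full-bandwidth patches
pseudo4k = np.repeat(np.repeat(rng.normal(size=(20, 32, 32)), 2, 1), 2, 2)  # 2x upscaled

X = np.array([[highfreq_energy_ratio(p)] for p in np.concatenate([true4k, pseudo4k])])
y = np.array([1.0] * 20 + [0.0] * 20)                          # toy quality labels

model = SVR().fit(X, y)
print(model.predict(X[:2]), model.predict(X[-2:]))
```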
Checking PRNU Usability on Modern Devices
C. Albisani, Massimo Iuliani, Alessandro Piva
{"title":"Checking PRNU Usability on Modern Devices","authors":"C. Albisani, Massimo Iuliani, Alessandro Piva","doi":"10.1109/ICASSP39728.2021.9413611","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413611","url":null,"abstract":"The image source identification task is mainly addressed by exploiting the unique traces of the sensor pattern noise, that ensure a negligible false alarm rate when comparing patterns extracted from different devices, even of the same brand or model. However, most recent smartphones are equipped with proprietary in-camera processing that can possibly expose unexpected correlated patterns within images belonging to different sensors.In this paper, we first highlight that wrong source attribution can happen on smartphones belonging to the same brand when images are acquired both in default and in bokeh mode. While the bokeh mode is proved to introduce a correlated pattern due to the specific in-camera post-processing, we also show that natural images also expose such issue, even when a reference from flat images is available. Furthermore, different camera models expose different correlation patterns since they are reasonably related to developers’ choices. Then, we propose a general strategy that allows the forensic practitioner to determine whether a questioned device may suffer from these correlated patterns, thus avoiding the risk of false image attribution.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121192517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
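A crude sketch of the underlying correlation check follows: extract noise residuals (a stand-in for proper PRNU estimates) and compare their normalized correlation for images from the same versus a different simulated sensor. The Gaussian denoiser, the synthetic scenes, and the noise levels are assumptions, not a forensic-grade pipeline; the paper's point is precisely that modern in-camera processing can inflate such cross-device correlations.

```python
# Sketch: noise-residual correlation for same-sensor vs different-sensor images (toy).
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(img, sigma=1.5):
    # Residual = image minus a denoised version; the sensor pattern lives in this residual.
    return img - gaussian_filter(img, sigma)

def normalized_correlation(a, b):
    a = (a - a.mean()).ravel()
    b = (b - b.mean()).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
shape = (256, 256)
prnu_a = rng.normal(scale=0.02, size=shape)     # simulated pattern of camera A
prnu_b = rng.normal(scale=0.02, size=shape)     # simulated pattern of camera B

def shoot(prnu):
    scene = gaussian_filter(rng.normal(0.5, 0.2, shape), 8)          # smooth scene content
    return scene * (1 + prnu) + rng.normal(scale=0.005, size=shape)  # multiplicative PRNU

ref = noise_residual(shoot(prnu_a))
print("same camera: ", normalized_correlation(ref, noise_residual(shoot(prnu_a))))
print("other camera:", normalized_correlation(ref, noise_residual(shoot(prnu_b))))
```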