ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Convolutional Dropout and Wordpiece Augmentation for End-to-End Speech Recognition
Hainan Xu, Yinghui Huang, Yun Zhu, Kartik Audhkhasi, B. Ramabhadran
DOI: 10.1109/ICASSP39728.2021.9415004
Abstract: Regularization and data augmentation are crucial to training end-to-end automatic speech recognition systems. Dropout is a popular regularization technique, which operates on each neuron independently by multiplying it with a Bernoulli random variable. We propose a generalization of dropout, called "convolutional dropout", where each neuron’s activation is replaced with a randomly-weighted linear combination of neuron values in its neighborhood. We believe that this formulation combines the regularizing effect of dropout with the smoothing effects of the convolution operation. In addition to convolutional dropout, this paper also proposes using random word-piece segmentations as a data augmentation scheme during training, inspired by results in neural machine translation. We adopt both these methods during the training of transformer-transducer speech recognition models, and show consistent WER improvements on Librispeech as well as across different languages.
Citations: 3

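The abstract defines convolutional dropout only at a high level: each activation is replaced by a randomly-weighted combination of the values in its neighborhood. As a rough illustration under assumed details (uniform random weights normalized to sum to one, and a `radius` neighborhood parameter of our own invention, neither specified by the paper), a minimal NumPy sketch:

```python
import numpy as np

def convolutional_dropout(x, radius=1, rng=None):
    """Replace each activation with a randomly weighted combination of its
    neighbors -- an illustrative sketch of 'convolutional dropout'.

    x      : 1-D array of neuron activations for one layer
    radius : neighborhood half-width (hypothetical parameter, not from the paper)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    out = np.empty(n, dtype=float)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        # Random non-negative weights over the neighborhood; a one-hot
        # weight vector would recover the original activation.
        w = rng.random(hi - lo)
        w /= w.sum()  # normalize so the overall scale is preserved
        out[i] = float(np.dot(w, x[lo:hi]))
    return out

acts = np.array([1.0, 2.0, 3.0, 4.0])
noisy = convolutional_dropout(acts)
print(noisy.shape)  # (4,)
```

Because each output is a convex combination of nearby activations, the perturbed values stay within the range of the original neighborhood, which is the "smoothing" contrast with ordinary dropout's hard zeroing.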
Sparsity in Max-Plus Algebra and Applications in Multivariate Convex Regression
Nikos Tsilivis, Anastasios Tsiamis, P. Maragos
DOI: 10.1109/ICASSP39728.2021.9414139
Abstract: In this paper, we study concepts of sparsity in the max-plus algebra and apply them to the problem of multivariate convex regression. We show how to efficiently find sparse (containing many −∞ elements) approximate solutions to max-plus equations by leveraging notions from submodular optimization. Subsequently, we propose a novel method for piecewise-linear surface fitting of convex multivariate functions, with optimality guarantees for the model parameters and an approximately minimum number of affine regions.
Citations: 2

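For background on the algebra involved: the max-plus product is (A ⊗ x)_i = max_j (A_ij + x_j), and the greatest subsolution ("principal solution") of A ⊗ x = b has the closed form x_j = min_i (b_i − A_ij). Sparse solutions in this setting set some coordinates to −∞, removing their columns from the max. A small NumPy sketch of these two operations (the paper's submodular selection of which coordinates to keep is not reproduced here):

```python
import numpy as np

def maxplus_prod(A, x):
    # Max-plus matrix-vector product: (A ⊗ x)_i = max_j (A_ij + x_j)
    return np.max(A + x[None, :], axis=1)

def principal_solution(A, b):
    # Greatest x satisfying A ⊗ x <= b: x_j = min_i (b_i - A_ij).
    # Setting any x_j to -inf keeps the subsolution property, which is
    # what makes sparse (-inf-rich) approximate solutions well-defined.
    return np.min(b[:, None] - A, axis=0)

A = np.array([[0.0, 2.0],
              [1.0, 0.0]])
b = np.array([3.0, 2.0])
x = principal_solution(A, b)
print(maxplus_prod(A, x))  # componentwise <= b; exact here
```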
Improving Identification of System-Directed Speech Utterances by Deep Learning of ASR-Based Word Embeddings and Confidence Metrics
Vilayphone Vilaysouk, Amr H. Nour-Eldin, Dermot Connolly
DOI: 10.1109/ICASSP39728.2021.9414330
Abstract: In this paper, we extend our previous work on the detection of system-directed speech utterances. This type of binary classification can be used by virtual assistants to create a more natural and fluid interaction between the system and the user. We explore two methods that both improve the Equal-Error-Rate (EER) performance of the previous model. The first exploits the supplementary information independently captured by ASR models through integrating ASR decoder-based features as additional inputs to the final classification stage of the model. This relatively improves EER performance by 13%. The second proposed method further integrates word embeddings into the architecture and, when combined with the first method, achieves a significant EER performance improvement of 48%, relative to that of the baseline.
Citations: 0

Image Coding with Neural Network-Based Colorization
Diogo Lopes, J. Ascenso, Catarina Brites, Fernando Pereira
DOI: 10.1109/ICASSP39728.2021.9413816
Abstract: Automatic colorization is a process with the objective of inferring the color of grayscale images. This process is frequently used for artistic purposes and to restore the color in old or damaged images. Motivated by the excellent results obtained with deep learning-based solutions in the area of automatic colorization, this paper proposes an image coding solution integrating a deep learning-based colorization process to estimate the chrominance components based on the decoded luminance, which is regularly encoded with a conventional image coding standard. In this case, the chrominance components are not coded and transmitted as usual, notably after some subsampling; only some color hints, i.e., chrominance values for specific pixel locations, may be sent to the decoder to help it create more accurate colorizations. To boost the colorization and final compression performance, intelligent ways to select the color hints are proposed. Experimental results show performance improvements with the increased level of intelligence in the color hints extraction process and a good subjective quality of the final decoded (and colorized) images.
Citations: 0

ATVIO: Attention Guided Visual-Inertial Odometry
Li Liu, Ge Li, Thomas H. Li
DOI: 10.1109/ICASSP39728.2021.9413912
Abstract: Visual-inertial odometry (VIO) aims to predict trajectory by ego-motion estimation. In recent years, end-to-end VIO has made great progress. However, how to handle visual and inertial measurements and make full use of the complementarity of cameras and inertial sensors remains a challenge. In this paper, we propose a novel attention guided deep framework for visual-inertial odometry (ATVIO) to improve the performance of VIO. Specifically, we concentrate on the effective utilization of the Inertial Measurement Unit (IMU) information. We carefully design a one-dimensional inertial feature encoder for IMU data processing, which can extract inertial features quickly and effectively. Meanwhile, we must prevent the inconsistency problem when fusing inertial and visual features. Hence, we explore a novel cross-domain channel attention block to combine the extracted features in a more adaptive manner. Extensive experiments demonstrate that our method achieves competitive performance against state-of-the-art VIO methods.
Citations: 8

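The cross-domain channel attention block is not specified in the abstract. As a generic illustration of channel attention in the squeeze-and-excitation style (a pooled descriptor gated through a small bottleneck MLP; all weight shapes here are hypothetical), one might write:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feats, W1, W2):
    """Squeeze-and-excitation-style channel attention: pool each channel,
    pass the pooled vector through a bottleneck MLP, and re-weight the
    channels. A generic sketch only -- the paper's cross-domain block,
    which fuses visual and inertial features, is more elaborate.

    feats : (channels, length) feature map
    W1    : (hidden, channels) squeeze weights   (hypothetical shapes)
    W2    : (channels, hidden) excite weights
    """
    squeezed = feats.mean(axis=1)  # global average pool per channel
    gates = sigmoid(W2 @ np.maximum(W1 @ squeezed, 0.0))  # gates in (0, 1)
    return feats * gates[:, None]  # channel-wise re-weighting

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))
W1 = rng.standard_normal((2, 8))
W2 = rng.standard_normal((8, 2))
out = channel_attention(feats, W1, W2)
print(out.shape)  # (8, 16)
```

Since every gate lies in (0, 1), the block can only attenuate channels, letting the network learn which modality's features to emphasize at fusion time.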
Segmental DTW: A Parallelizable Alternative to Dynamic Time Warping
T. Tsai
DOI: 10.1109/ICASSP39728.2021.9413827
Abstract: In this work we explore parallelizable alternatives to DTW for globally aligning two feature sequences. One of the main practical limitations of DTW is its quadratic computation and memory cost. Previous works have sought to reduce the computational cost in various ways, such as imposing bands in the cost matrix or using a multiresolution approach. In this work, we utilize the fact that computation is an abundant resource and focus instead on exploring alternatives that approximate the inherently sequential DTW algorithm with one that is parallelizable. We describe two variations of an algorithm called Segmental DTW, in which the global cost matrix is broken into smaller sub-matrices, subsequence DTW is performed on each sub-matrix, and the results are used to solve a segment-level dynamic programming problem that specifies a globally optimal alignment path. We evaluate the proposed alignment algorithms on an audio-audio alignment task using the Chopin Mazurka dataset, and we show that they closely match the performance of regular DTW. We further demonstrate that almost all of the computations in Segmental DTW are parallelizable, and that one of the variants is unilaterally better than the other for both empirical and theoretical reasons.
Citations: 2

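For reference, the sequential O(nm) dynamic program that Segmental DTW decomposes into independently solvable sub-matrices is the classic DTW recurrence:

```python
import numpy as np

def dtw_cost(x, y):
    """Classic O(len(x) * len(y)) DTW alignment cost between two 1-D
    sequences -- the sequential baseline that Segmental DTW breaks into
    sub-matrices which can be processed in parallel."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Each cell depends on three predecessor cells, which is what
            # makes the plain recurrence inherently sequential.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))  # 0.0 -- warping absorbs the repeat
```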
On the Detection of Pitch-Shifted Voice: Machines and Human Listeners
D. Looney, N. Gaubitch
DOI: 10.1109/ICASSP39728.2021.9414890
Abstract: We present a performance comparison between human listeners and a simple algorithm for the task of speech anomaly detection. The algorithm utilises an intentionally small set of features derived from the source-filter model, with the aim of validating that key components of source-filter theory characterise how humans perceive anomalies. We furthermore recognise that humans are adept at detecting anomalies without prior exposure to a given anomaly class. To that end, we also consider the algorithm performance when operating via the principle of unsupervised learning, where a null model is derived from normal speech recordings. We evaluate both the algorithm and human listeners on pitch-shift detection, where the pitch of a speech sample is intentionally modified using software, a phenomenon of relevance to the fields of fraud detection and forensics. Our results show that humans can only detect pitch-shift reliably at more extreme levels, and that the performance of the algorithm matches closely with that of humans.
Citations: 0

Multi-Task Estimation of Age and Cognitive Decline from Speech
Yilin Pan, Venkata Srikanth Nallanthighal, D. Blackburn, H. Christensen, Aki Härmä
DOI: 10.1109/ICASSP39728.2021.9414642
Abstract: Speech is a common physiological signal that can be affected by both ageing and cognitive decline. Often the effect can be confounding, as would be the case for people at, e.g., very early stages of cognitive decline due to dementia. Despite this, the automatic predictions of age and cognitive decline based on cues found in the speech signal are generally treated as two separate tasks. In this paper, multi-task learning is applied for the joint estimation of age and the Mini-Mental State Examination (MMSE) score commonly used to assess cognitive decline. To explore the relationship between age and MMSE, two neural network architectures are evaluated: a SincNet-based end-to-end architecture, and a system comprising a feature extractor followed by a shallow neural network. Both are trained with single-task or multi-task targets. For comparison, an SVM-based regressor is trained in a single-task setup. i-vector, x-vector and ComParE features are explored. Results are obtained on systems trained on the DementiaBank dataset and tested on an in-house dataset as well as the ADReSS dataset. The results show that both age and MMSE estimation are improved by applying multi-task learning, with state-of-the-art results achieved on the ADReSS acoustic-only task.
Citations: 5

GDTW: A Novel Differentiable DTW Loss for Time Series Tasks
Xiang Liu, Naiqi Li, Shutao Xia
DOI: 10.1109/ICASSP39728.2021.9413895
Abstract: Dynamic time warping (DTW) is one of the most successful methods that addresses the challenge of measuring the discrepancy between two series, which is robust to shift and distortion along the time axis of the sequence. Based on DTW, we propose a novel loss function for time series data called Gumbel-Softmin based fast DTW (GDTW). To the best of our knowledge, this is the first differentiable DTW loss for series data that scales linearly with the sequence length. The proposed Gumbel-Softmin replaces the simple minimization operator in DTW so as to better integrate the acceleration technology. We also design a deep learning model combining GDTW as a feature extractor. Thorough experiments over a broad range of time series analysis tasks were performed, showing the efficiency and effectiveness of our method.
Citations: 1

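The Gumbel-Softmin operator itself is not detailed in the abstract. To illustrate the general idea of replacing DTW's hard min with a smooth, differentiable surrogate, here is a log-sum-exp softmin variant (as used in soft-DTW-style losses; this is a stand-in, not the paper's Gumbel-based construction, and it retains the quadratic cost the paper accelerates):

```python
import numpy as np

def softmin(vals, gamma=1.0):
    # Smooth surrogate for min(): softmin(a) = -gamma * log(sum(exp(-a/gamma))).
    # As gamma -> 0 this approaches the hard minimum.
    a = np.asarray(vals, dtype=float) / -gamma
    m = a.max()
    return -gamma * (m + np.log(np.exp(a - m).sum()))

def soft_dtw_cost(x, y, gamma=1.0):
    """DTW recurrence with min replaced by softmin, making the alignment
    cost differentiable in the inputs. Illustrative only: GDTW's
    Gumbel-Softmin is a different (sampling-based) relaxation designed
    to scale linearly with sequence length."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + softmin(
                [D[i - 1, j], D[i, j - 1], D[i - 1, j - 1]], gamma)
    return D[n, m]
```

With a small `gamma` the soft cost tracks the hard DTW cost closely, while larger values trade fidelity for smoother gradients.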
CMIM: Cross-Modal Information Maximization For Medical Imaging
Tristan Sylvain, Francis Dutil, T. Berthier, Lisa Di-Jorio, M. Luck, R. Devon Hjelm, Y. Bengio
DOI: 10.1109/ICASSP39728.2021.9414132
Abstract: In hospitals, data are siloed in specific information systems that make the same information available under different modalities, such as the different medical imaging exams the patient undergoes (CT scans, MRI, PET, Ultrasound, etc.) and their associated radiology reports. This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time. In this paper, we propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time, using recent advances in mutual information maximization. By maximizing cross-modal information at train time, we are able to outperform several state-of-the-art baselines in two different settings, medical image classification and segmentation. In particular, our method is shown to have a strong impact on the inference-time performance of weaker modalities.
Citations: 3

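A common way to "maximize cross-modal information" between paired modalities is an InfoNCE-style contrastive bound, where matched pairs in a batch are positives and all other pairings are negatives; whether CMIM uses exactly this estimator is not stated in the abstract. A minimal NumPy sketch of the loss:

```python
import numpy as np

def infonce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style lower bound on cross-modal mutual information.
    Row i of z_a and row i of z_b are embeddings of the same sample
    under two modalities (the positive pair); other rows act as
    negatives. Illustrative sketch, not the paper's exact estimator.

    z_a, z_b : (batch, dim) embeddings of the two modalities
    """
    a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature               # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Minimizing this drives matched pairs to be more similar than
    # mismatched ones, i.e., it maximizes the contrastive MI bound.
    return -np.mean(np.diag(log_probs))
```

Training the per-modality encoders to minimize this loss makes their representations predictive of each other, which is what allows a weaker modality to benefit at inference time even when the stronger one is dropped.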