ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) — Latest Publications

CyFi-TTS: Cyclic Normalizing Flow with Fine-Grained Representation for End-to-End Text-to-Speech
In-Sun Hwang, Young-Sub Han, Byoung-Ki Jeon
DOI: 10.1109/ICASSP49357.2023.10095323
Published: 2023-06-04
Abstract: Advanced end-to-end text-to-speech (TTS) systems directly generate high-quality speech and perform well on data seen during training. However, synthesizing speech from unseen transcripts remains challenging: the generated speech is often mispronounced because the one-to-many problem creates an information gap between text and speech. To address this, we propose a cyclic normalizing flow with fine-grained representation for end-to-end TTS (CyFi-TTS), which generates natural-sounding speech by bridging this information gap. We leverage a temporal multi-resolution upsampler to progressively produce a fine-grained representation, and adopt a cyclic normalizing flow to produce an acoustic representation through cyclic representation learning. Experimental results show that CyFi-TTS generates speech with clearer pronunciation than recent TTS systems, achieving a mean opinion score of 4.02 and a character error rate of 1.99%.
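Normalizing flows are built from invertible layers such as affine coupling. As a minimal illustration of that building block only (not the authors' CyFi-TTS architecture; the toy "networks" here are a single shared linear map), a numpy sketch of a coupling layer with an exact inverse:

```python
import numpy as np

rng = np.random.default_rng(0)

def coupling_forward(x, w, b):
    """Affine coupling: first half conditions, second half is transformed."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    log_s = np.tanh(x1 @ w + b)   # toy scale "net", bounded for stability
    t = x1 @ w + b                # toy translation "net" (shared weights)
    y2 = x2 * np.exp(log_s) + t
    return np.concatenate([x1, y2], axis=-1)

def coupling_inverse(y, w, b):
    """Invert the coupling exactly: x1 passes through, so log_s and t recompute."""
    d = y.shape[-1] // 2
    y1, y2 = y[..., :d], y[..., d:]
    log_s = np.tanh(y1 @ w + b)
    t = y1 @ w + b
    x2 = (y2 - t) * np.exp(-log_s)
    return np.concatenate([y1, x2], axis=-1)

x = rng.standard_normal((4, 8))
w = 0.1 * rng.standard_normal((4, 4))
b = np.zeros(4)
y = coupling_forward(x, w, b)
x_rec = coupling_inverse(y, w, b)
print(np.max(np.abs(x - x_rec)))  # exact inverse up to float round-off
```

Stacking such layers (with permutations between them) yields a flow whose likelihood and inverse are both tractable, which is what makes flow-based acoustic models attractive.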
Citations: 0
Daily Mental Health Monitoring from Speech: A Real-World Japanese Dataset and Multitask Learning Analysis
Meishu Song, Andreas Triantafyllopoulos, Zijiang Yang, Hiroki Takeuchi, Toru Nakamura, A. Kishi, Tetsuro Ishizawa, K. Yoshiuchi, Xin Jing, Vincent Karas, Zhonghao Zhao, Kun Qian, B. Hu, B. Schuller, Yoshiharu Yamamoto
DOI: 10.1109/ICASSP49357.2023.10096884
Published: 2023-06-04
Abstract: Translating mental health recognition from clinical research into real-world application requires extensive data, yet existing emotion datasets offer little for daily mental health monitoring, especially for recognizing self-reported anxiety and depression. We introduce the Japanese Daily Speech Dataset (JDSD), a large in-the-wild daily speech emotion dataset of 20,827 speech samples from 342 speakers, totaling 54 hours. The data are annotated on the Depression and Anxiety Mood Scale (DAMS): nine self-reported emotions evaluating mood state, namely "vigorous", "gloomy", "concerned", "happy", "unpleasant", "anxious", "cheerful", "depressed", and "worried". The dataset is diverse in emotional state, activity, and time, making it useful for training models that track daily emotional states for healthcare purposes. We partition the corpus and provide a multi-task benchmark across the nine emotions, demonstrating that mental health states can be predicted reliably from self-reports, with an average Concordance Correlation Coefficient of .547. We hope that JDSD will become a valuable resource for the development of daily emotional healthcare tracking.
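The benchmark metric, the Concordance Correlation Coefficient (CCC), follows directly from its definition: 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). A small sketch with made-up prediction/label values (the .547 above is the paper's reported result, not reproduced here):

```python
import numpy as np

def ccc(x, y):
    """Concordance Correlation Coefficient: agreement of y with x,
    penalizing both correlation loss and mean/scale bias."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()            # population variances
    cov = ((x - mx) * (y - my)).mean()   # population covariance
    return 2 * cov / (vx + vy + (mx - my) ** 2)

preds  = [3.1, 2.4, 4.0, 1.5]   # hypothetical model outputs
labels = [3.0, 2.5, 3.8, 1.7]   # hypothetical self-reports
print(ccc(preds, labels))
```

Unlike Pearson correlation, CCC drops below 1 for predictions that are perfectly correlated but systematically shifted or scaled, which suits regression-style mood-scale targets.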
Citations: 2
Cross-Domain Object Classification Via Successive Subspace Alignment
Kecheng Chen, Hao Li, Hong Yan
DOI: 10.1109/ICASSP49357.2023.10096792
Published: 2023-06-04
Abstract: Successive subspace learning (SSL)-based methods have recently proven effective for visual object classification, offering modest data requirements and mathematically transparent interpretability. However, existing SSL-based methods rely heavily on data-centric subspace representations, which can degrade performance when a domain shift exists between the training (source-domain) and testing (target-domain) data. To address this limitation, we propose an effective successive subspace learning method built on existing SSL-based methods. Specifically, we introduce a novel linear transformation layer that aligns the eigenvectors of the SSL module between the source and target domains, reducing the inter-domain discrepancy and improving cross-domain performance. The effectiveness of the proposed method is demonstrated on the Office-Caltech-10 and Office-31 benchmark datasets, using features extracted from pre-trained deep neural networks as input.
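The paper's learned alignment layer is specific to the SSL pipeline, but the underlying idea can be illustrated with the classic closed-form subspace alignment, where the source eigenvector basis is mapped onto the target one via M = Psᵀ Pt (a sketch on synthetic features, not the authors' method):

```python
import numpy as np

rng = np.random.default_rng(1)

def pca_basis(X, k):
    """Top-k principal directions (eigenvector basis) of centered X."""
    Xc = X - X.mean(0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T                      # shape: (features, k)

# Toy source/target features with a synthetic domain shift (rotation + offset).
Xs = rng.standard_normal((100, 10))
Xt = Xs @ np.linalg.qr(rng.standard_normal((10, 10)))[0] + 0.5

k = 5
Ps, Pt = pca_basis(Xs, k), pca_basis(Xt, k)
M = Ps.T @ Pt                            # closed-form alignment transform
Zs = (Xs - Xs.mean(0)) @ Ps @ M          # source features in the aligned space
Zt = (Xt - Xt.mean(0)) @ Pt              # target features in their own subspace
print(Zs.shape, Zt.shape)
```

A classifier trained on Zs then transfers to Zt more gracefully than one trained on raw source features, because both now live in the target's eigenvector coordinates.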
Citations: 0
Priv-Aug-Shap-ECGResNet: Privacy Preserving Shapley-Value Attributed Augmented Resnet for Practical Single-Lead Electrocardiogram Classification
A. Ukil, Leandro Marín, A. Jara
DOI: 10.1109/ICASSP49357.2023.10096437
Published: 2023-06-04
Abstract: We aim to build an effective automated single-lead electrocardiogram (ECG) classification system to enable remote and timely screening of critical cardiovascular conditions such as heart attack. However, the expense of cardiologist-annotated ECG data limits the number of training instances. While conventional deep learning models require large training sets for accurate classification, we propose Priv-Aug-Shap-ECGResNet, which demonstrates that a deep learning model (e.g., a residual network, or ResNet) trained with unimportant features ablated from the training set consistently outperforms relevant state-of-the-art algorithms. Additively perturbed training augmentation with Shapley attribution identifies the right feature subset using the axioms of transferable utility, namely "efficiency" and "null player", on which the Shapley-value game is defined. Priv-Aug-Shap-ECGResNet also incorporates a novel privacy-preservation feature: a differential privacy technique applies measured obfuscation so that the adversary's knowledge gain is no better than that of a ZeroR classifier.
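Differential privacy is commonly realized by adding noise calibrated to a statistic's sensitivity and the privacy budget ε. A generic Laplace-mechanism sketch (illustrative values only, not the paper's calibration or its obfuscation scheme):

```python
import numpy as np

rng = np.random.default_rng(2)

def laplace_mechanism(value, sensitivity, epsilon):
    """epsilon-DP release of a scalar statistic via additive Laplace noise,
    with scale = sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return value + rng.laplace(0.0, scale)

# e.g. obfuscating an aggregate ECG-derived statistic before sharing
true_mean = 0.82                    # hypothetical statistic
noisy = laplace_mechanism(true_mean, sensitivity=0.01, epsilon=1.0)
print(noisy)
```

Smaller ε means more noise and stronger privacy; the released value is unbiased, so repeated analyses average out the noise while any single release protects individual records.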
Citations: 2
Selecting Language Models Features VIA Software-Hardware Co-Design
Vlad Pandelea, E. Ragusa, P. Gastaldo, E. Cambria
DOI: 10.1109/ICASSP49357.2023.10097191
Published: 2023-06-04
Abstract: The availability of new datasets and deep learning techniques has led to a surge of effort toward creating models that can exploit large amounts of data. However, little attention has been given to developing models that are not only accurate but also suitable for user-specific use or geared toward resource-constrained devices. Fine-tuning deep models on edge devices is impractical, and user customization often rests on the sub-optimal feature-extractor/classifier paradigm. Here, we propose a method to fully utilize the intermediate outputs of popular large pre-trained natural language processing models when used as frozen feature extractors, further closing the gap between fine-tuning and more computationally efficient solutions. We reach this goal by exploiting the concept of software-hardware co-design and propose a methodical procedure, inspired by Neural Architecture Search, to select the most desirable model under application constraints.
Citations: 0
Restoration of Time-Varying Graph Signals using Deep Algorithm Unrolling
Hayate Kojima, Hikari Noguchi, Koki Yamada, Yuichi Tanaka
DOI: 10.1109/ICASSP49357.2023.10094838
Published: 2023-06-04
Abstract: In this paper, we propose a restoration method for time-varying graph signals, i.e., signals on a graph whose values change over time, using deep algorithm unrolling. Deep algorithm unrolling learns the parameters of an iterative optimization algorithm with deep learning techniques; it is expected to improve convergence speed and accuracy while keeping the iterative steps interpretable. In the proposed method, the minimization problem is formulated so that the time-varying graph signal is smooth in both the time and spatial domains. The internal parameters, i.e., time-domain FIR filters and regularization parameters, are learned from training data. Experimental results on synthetic data and real sea-surface temperature data show that the proposed method improves signal reconstruction accuracy compared to several existing time-varying graph signal reconstruction methods.
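Algorithm unrolling treats a fixed number of optimization iterations as network layers, each with its own learnable parameters. A toy numpy sketch for a graph-smoothness denoiser on a path graph, using spatial smoothness only for brevity and fixed stand-in values where trained per-layer parameters would go (not the paper's time-varying formulation):

```python
import numpy as np

# Path-graph Laplacian (a toy "sensor chain"); L encodes spatial smoothness.
n = 20
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1              # degree-1 endpoints

rng = np.random.default_rng(3)
clean = np.sin(np.linspace(0, np.pi, n))
y = clean + 0.3 * rng.standard_normal(n)     # noisy observation

# Unrolled gradient descent on  0.5*||x - y||^2 + 0.5*lam * x^T L x :
# each "layer" k would have its own learned (step, lam) pair in training.
params = [(0.25, 1.0)] * 10                  # stand-ins for learned values
x = y.copy()
for step, lam in params:
    x = x - step * ((x - y) + lam * (L @ x))

print(np.linalg.norm(y - clean), np.linalg.norm(x - clean))
```

Because the unrolled depth is fixed, inference cost is known in advance, and backpropagating through the ten layers is what lets training tune the step sizes and regularization weights per iteration.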
Citations: 1
Spatially Informed Independent Vector Analysis for Source Extraction Based on the Convolutive Transfer Function Model
Xianrui Wang, Andreas Brendel, Gongping Huang, Yichen Yang, Walter Kellermann, Jingdong Chen
DOI: 10.1109/ICASSP49357.2023.10097106
Published: 2023-06-04
Abstract: Spatial information can help improve source separation performance. Numerous spatially informed source extraction methods based on independent vector analysis (IVA) have been developed, achieving reasonably good performance in non- or weakly reverberant environments. However, their performance degrades quickly as reverberation increases, because these methods are derived from the multiplicative transfer function model with a rank-1 assumption, which does not hold under strong reverberation. To circumvent this issue, this paper proposes using the convolutive transfer function (CTF) model to improve source extraction performance and develops a spatially informed IVA algorithm. Simulations demonstrate the efficacy of the developed method even in highly reverberant environments.
Citations: 2
An Implicit Gradient Method for Constrained Bilevel Problems Using Barrier Approximation
Ioannis C. Tsaknakis, Prashant Khanduri, Min-Sun Hong
DOI: 10.1109/ICASSP49357.2023.10096878
Published: 2023-06-04
Abstract: In this work, we propose algorithms for solving a class of bilevel optimization (BLO) problems, with applications in areas such as signal processing, networking, and machine learning. Specifically, we develop a novel barrier-based gradient approximation algorithm that transforms the constrained BLO problem into one whose lower-level (LL) task has only linear equality constraints. For the reformulated problem, we compute the implicit gradient and develop a gradient-based scheme involving only a single gradient descent step and the (approximate) solution of the linearly constrained, strongly convex LL task at each iteration. Under certain assumptions, we establish non-asymptotic convergence guarantees of the proposed method to stationary points. Finally, we present experiments that show the potential of the proposed algorithm.
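The barrier idea itself is standard: an inequality constraint is replaced by a smooth log-barrier penalty whose minimizer approaches the constrained optimum as the barrier weight μ shrinks. A one-dimensional illustration (not the paper's bilevel setting): minimize f(x) = x subject to x ≥ 1 via x − μ·log(x − 1), whose analytic optimum is x = 1 + μ.

```python
def barrier_min(mu, iters=2000, lr=1e-3):
    """Minimize f(x)=x s.t. x>=1 via the log-barrier x - mu*log(x-1),
    using plain gradient descent from a strictly feasible start."""
    x = 2.0
    for _ in range(iters):
        grad = 1.0 - mu / (x - 1.0)      # d/dx [x - mu*log(x-1)]
        x -= lr * grad
        x = max(x, 1.0 + 1e-9)           # stay inside the barrier's domain
    return x

for mu in (0.5, 0.1, 0.01):
    print(mu, barrier_min(mu))           # analytic optimum: 1 + mu
```

Shrinking μ across outer iterations (a "central path") recovers the constrained solution while every subproblem stays smooth, which is what makes the implicit gradient of the reformulated LL task computable.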
Citations: 0
M-CTRL: A Continual Representation Learning Framework with Slowly Improving Past Pre-Trained Model
Jin-Seong Choi, Jaehwan Lee, Chae-Won Lee, Joon‐Hyuk Chang
DOI: 10.1109/ICASSP49357.2023.10096793
Published: 2023-06-04
Abstract: Representation models pre-trained on unlabeled data show competitive performance in speech recognition, even when fine-tuned on small amounts of labeled data. The continual representation learning (CTRL) framework combines pre-training and continual learning methods to obtain powerful representations. CTRL relies on two neural networks, an online model and an offline model, where the fixed offline model transfers information to the online model through a continual-learning loss. In this paper, we present momentum continual representation learning (M-CTRL), a framework that slowly updates the offline model with an exponential moving average of the online model, aiming to capture information from an offline model improved on both past and new domains. To evaluate our framework, we continually pre-train wav2vec 2.0 with M-CTRL in the following order: Librispeech, Wall Street Journal, and TED-LIUM V3. Our experiments demonstrate that M-CTRL improves performance in the new domain and reduces information loss in the past domain compared to CTRL.
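The momentum update at the heart of such teacher/student schemes is a one-liner per parameter: the offline (teacher) weights track an exponential moving average of the online (student) weights. A minimal sketch with numpy arrays standing in for network parameters (an illustration of the EMA mechanic, not the M-CTRL training loop):

```python
import numpy as np

def ema_update(offline, online, momentum=0.999):
    """Offline (teacher) params drift slowly toward the online (student) ones:
    offline <- momentum * offline + (1 - momentum) * online."""
    return {k: momentum * offline[k] + (1.0 - momentum) * online[k]
            for k in offline}

offline = {"w": np.zeros(3)}   # stand-in for the offline model's weights
online  = {"w": np.ones(3)}    # stand-in for the (here frozen) online weights
for _ in range(1000):
    offline = ema_update(offline, online, momentum=0.99)
print(offline["w"])            # slowly approaching the online weights
```

A momentum close to 1 makes the offline model a smoothed, lagging copy of the online one, which is what lets it retain past-domain information while still absorbing improvements.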
Citations: 0
D-CONFORMER: Deformable Sparse Transformer Augmented Convolution for Voxel-Based 3D Object Detection
Xiao Zhao, Liuzhen Su, Xukun Zhang, Dingkang Yang, Mingyang Sun, Shunli Wang, Peng Zhai, Lihua Zhang
DOI: 10.1109/ICASSP49357.2023.10097060
Published: 2023-06-04
Abstract: Although CNN-based and Transformer-based detectors have made impressive improvements in 3D object detection, these two network paradigms suffer from insufficient receptive fields and weakened local detail, respectively, which significantly limits the feature extraction performance of the backbone. In this paper, we propose to fuse convolution and transformer while accounting for the fact that non-empty voxels at different positions in 3D space contribute differently to object detection, something that applying standard convolutions and transformers directly to voxels does not capture. Specifically, we design a novel deformable sparse transformer, termed D-Conformer, that performs long-range information interaction on the fine-grained local detail semantics aggregated by focal sparse convolution. D-Conformer learns valuable voxels position-wise in sparse space and can be applied as a backbone to most voxel-based detectors. Extensive experiments demonstrate that our method achieves satisfactory detection results and outperforms state-of-the-art 3D detection methods by a large margin.
Citations: 0