ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

CLIP-Font: Sementic Self-Supervised Few-Shot Font Generation with Clip
Jialu Xiong, Yefei Wang, Jinshan Zeng
DOI: https://doi.org/10.1109/icassp48485.2024.10447490
Published: 2024-04-14
Citations: 0
Elevating Visual Prompting in Transfer Learning Via Pruned Model Ensembles: No Retrain, No Pain
Brian Zhang, Yuguang Yao, Sijia Liu
DOI: https://doi.org/10.1109/icassp48485.2024.10447808
Published: 2024-04-14
Citations: 0
Binauralmusic: A Diverse Dataset for Improving Cross-Modal Binaural Audio Generation
Yunqi Li, Shulin Liu, Haonan Cheng, Long Ye
DOI: https://doi.org/10.1109/icassp48485.2024.10448509
Published: 2024-04-14
Citations: 0
Enriching Music Descriptions with A Finetuned-LLM and Metadata for Text-to-Music Retrieval
Seungheon Doh, Minhee Lee, Dasaem Jeong, Juhan Nam
DOI: https://doi.org/10.1109/icassp48485.2024.10446380
Published: 2024-04-14
Citations: 0
Dynamic Speech Emotion Recognition Using A Conditional Neural Process
Luz Martinez-Lucas, Carlos Busso
Abstract: The problem of predicting emotional attributes from speech has often focused on predicting a single value from a sentence or short speaking turn. These methods often ignore that natural emotions are both dynamic and dependent on context. To model the dynamic nature of emotions, we can treat the prediction of emotion from speech as a time-series problem. We refer to the problem of predicting these emotional traces as dynamic speech emotion recognition. Previous studies in this area have used models that treat all emotional traces as coming from the same underlying distribution. Since emotions are dependent on contextual information, these methods might obscure the context of an emotional interaction. This paper uses a neural process model with a segment-level speech emotion recognition (SER) model for this problem. This type of model leverages information from the time-series and predictions from the SER model to learn a prior that defines a distribution over emotional traces. Our proposed model performs 21% better than a bidirectional long short-term memory (BiLSTM) baseline when predicting emotional traces for valence.
DOI: https://doi.org/10.1109/icassp48485.2024.10447805
Published: 2024-04-14
Citations: 0
An MVDR-Embedded U-Net Beamformer for Effective and Robust Multichannel Speech Enhancement
Ching-Hua Lee, Kashyap Patel, Chouchang Yang, Yilin Shen, Hongxia Jin
DOI: https://doi.org/10.1109/icassp48485.2024.10448366
Published: 2024-04-14
Citations: 0
AUTOSGM: A Unified Lowpass Regularization Framework for Accelerated Learning
Oluwasegun Ayokunle Somefun, Stefan Lee, V. J. Mathews
DOI: https://doi.org/10.1109/icassp48485.2024.10448203
Published: 2024-04-14
Citations: 0
A Modified Cramér-Rao Bound for Discrete-Time Markovian Dynamic Systems
Sara El Bouch, J. Galy, É. Chaumette, J. Vilà‐Valls
DOI: https://doi.org/10.1109/icassp48485.2024.10446252
Published: 2024-04-14
Citations: 0
Trades++: Enhancing Multi-Object Tracking of Real Low Confidence Targets Using a Pyramid-Like Self-Attention Model
Chenxin Wen, Yanlei Gao, Jie Li
DOI: https://doi.org/10.1109/icassp48485.2024.10446257
Published: 2024-04-14
Citations: 0
Maskstr: Guide Scene Text Recognition Models with Masking
Baole Wei, Minghang He, Liangcai Gao, Duoyou Zhou, Xiang Bai, Zhi Tang
DOI: https://doi.org/10.1109/icassp48485.2024.10446874
Published: 2024-04-14
Citations: 0