基于DT-CWT理论的食物喜爱度评定新方法探索

Yanan Guo, Jing Han, Zixing Zhang, Björn Schuller, Yide Ma
{"title":"基于DT-CWT理论的食物喜爱度评定新方法探索","authors":"Yanan Guo, Jing Han, Zixing Zhang, Björn Schuller, Yide Ma","doi":"10.1145/3242969.3243684","DOIUrl":null,"url":null,"abstract":"In this paper, we mainly investigate subjects' food likability based on audio-related features as a contribution to EAT ? the ICMI 2018 Eating Analysis and Tracking challenge. Specifically, we conduct 4-level Double Tree Complex Wavelet Transform decomposition of an audio signal, and obtain five sub-audio signals with frequencies ranging from low to high. For each sub-audio signal, not only 'traditional' functional-based features but also deep learning-based features via pretrained CNNs based on SliCQ-nonstationary Gabor transform and a cochleagram map, are calculated. Besides, the original audio signals based Bag-of-Audio-Words features extracted by the openXBOW toolkit are used to enhance the model as well. Finally, the early fusion of all these three kinds of features can lead to promising results, yielding the highest UAR of 79.2 % by means of a leave-one-speaker-out cross-validation, which holds a 12.7 % absolute gain compared with the baseline of 66.5 % UAR.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"231 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Exploring A New Method for Food Likability Rating Based on DT-CWT Theory\",\"authors\":\"Yanan Guo, Jing Han, Zixing Zhang, Björn Schuller, Yide Ma\",\"doi\":\"10.1145/3242969.3243684\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we mainly investigate subjects' food likability based on audio-related features as a contribution to EAT ? the ICMI 2018 Eating Analysis and Tracking challenge. Specifically, we conduct 4-level Double Tree Complex Wavelet Transform decomposition of an audio signal, and obtain five sub-audio signals with frequencies ranging from low to high. For each sub-audio signal, not only 'traditional' functional-based features but also deep learning-based features via pretrained CNNs based on SliCQ-nonstationary Gabor transform and a cochleagram map, are calculated. Besides, the original audio signals based Bag-of-Audio-Words features extracted by the openXBOW toolkit are used to enhance the model as well. Finally, the early fusion of all these three kinds of features can lead to promising results, yielding the highest UAR of 79.2 % by means of a leave-one-speaker-out cross-validation, which holds a 12.7 % absolute gain compared with the baseline of 66.5 % UAR.\",\"PeriodicalId\":308751,\"journal\":{\"name\":\"Proceedings of the 20th ACM International Conference on Multimodal Interaction\",\"volume\":\"231 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 20th ACM International Conference on Multimodal Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3242969.3243684\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3242969.3243684","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

在本文中,我们主要研究受试者基于音频相关特征的食物喜爱度,作为对EAT ?ICMI 2018饮食分析和追踪挑战。具体来说,我们对音频信号进行4级双树复小波变换分解,得到频率从低到高的5个亚音频信号。对于每个亚音频信号,不仅计算“传统”基于函数的特征,还计算基于slicq -非平稳Gabor变换和耳蜗图映射的预训练cnn的深度学习特征。此外,利用openXBOW工具箱提取的基于Bag-of-Audio-Words特征的原始音频信号对模型进行增强。最后,所有这三种特征的早期融合可以导致有希望的结果,通过留下一个扬声器的交叉验证,产生最高的UAR为79.2%,与基线的66.5% UAR相比,它具有12.7%的绝对增益。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Exploring A New Method for Food Likability Rating Based on DT-CWT Theory
In this paper, we mainly investigate subjects' food likability based on audio-related features as a contribution to EAT ? the ICMI 2018 Eating Analysis and Tracking challenge. Specifically, we conduct 4-level Double Tree Complex Wavelet Transform decomposition of an audio signal, and obtain five sub-audio signals with frequencies ranging from low to high. For each sub-audio signal, not only 'traditional' functional-based features but also deep learning-based features via pretrained CNNs based on SliCQ-nonstationary Gabor transform and a cochleagram map, are calculated. Besides, the original audio signals based Bag-of-Audio-Words features extracted by the openXBOW toolkit are used to enhance the model as well. Finally, the early fusion of all these three kinds of features can lead to promising results, yielding the highest UAR of 79.2 % by means of a leave-one-speaker-out cross-validation, which holds a 12.7 % absolute gain compared with the baseline of 66.5 % UAR.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信