Exploring A New Method for Food Likability Rating Based on DT-CWT Theory

Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3243684

Yanan Guo, Jing Han, Zixing Zhang, Björn Schuller, Yide Ma

Citations: 3

Abstract

In this paper, we investigate subjects' food likability from audio-related features as a contribution to EAT, the ICMI 2018 Eating Analysis and Tracking challenge. Specifically, we apply a 4-level Dual-Tree Complex Wavelet Transform (DT-CWT) decomposition to each audio signal and obtain five sub-audio signals whose frequency bands range from low to high. For each sub-audio signal, we compute not only 'traditional' functional-based features but also deep learning-based features, extracted via pretrained CNNs from sliCQ nonstationary Gabor transform and cochleagram representations. In addition, Bag-of-Audio-Words features extracted from the original audio signals with the openXBOW toolkit are used to enhance the model. Finally, early fusion of these three kinds of features leads to promising results: the highest unweighted average recall (UAR) is 79.2 % under leave-one-speaker-out cross-validation, a 12.7 % absolute gain over the baseline UAR of 66.5 %.
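The evaluation vocabulary in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the function names are hypothetical, the 16 kHz sampling rate in the usage note is an assumption, and feature extraction and the classifier are omitted. It shows why a 4-level dyadic decomposition gives five sub-bands, what early fusion does to feature vectors, how leave-one-speaker-out folds are formed, and how UAR is computed.

```python
from collections import defaultdict


def dyadic_bands(sample_rate, levels):
    """Nominal bands of a `levels`-level dyadic wavelet decomposition.

    Returns levels + 1 (low_hz, high_hz) pairs ordered from low to high:
    one low-pass residual plus one detail band per level. Real DT-CWT
    filters overlap, so these band edges are only approximate.
    """
    nyquist = sample_rate / 2
    bands = [(0.0, nyquist / 2 ** levels)]
    for level in range(levels, 0, -1):
        bands.append((nyquist / 2 ** level, nyquist / 2 ** (level - 1)))
    return bands


def early_fusion(*feature_sets):
    """Early fusion: concatenate each sample's feature vectors before classification."""
    return [sum((list(vec) for vec in vecs), []) for vecs in zip(*feature_sets)]


def leave_one_speaker_out(speakers):
    """Yield (held_out_speaker, train_idx, test_idx), one fold per speaker."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def unweighted_average_recall(y_true, y_pred):
    """UAR: the mean of per-class recalls, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For example, `dyadic_bands(16000, 4)` returns five bands, from roughly 0 to 500 Hz up to 4000 to 8000 Hz, under the assumed 16 kHz sampling rate. Because UAR averages recall per class, it is not inflated when one class dominates the data, which is presumably why the challenge reports it rather than plain accuracy.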
