{"title":"SAAMEAT","authors":"F. Haider, Senja Pollak, E. Zarogianni, S. Luz","doi":"10.1145/3242969.3243685","DOIUrl":null,"url":null,"abstract":"Automatic recognition of eating conditions of humans could be a useful technology in health monitoring. The audio-visual information can be used in automating this process, and feature engineering approaches can reduce the dimensionality of audio-visual information. The reduced dimensionality of data (particularly feature subset selection) can assist in designing a system for eating conditions recognition with lower power, cost, memory and computation resources than a system which is designed using full dimensions of data. This paper presents Active Feature Transformation (AFT) and Active Feature Selection (AFS) methods, and applies them to all three tasks of the ICMI 2018 EAT Challenge for recognition of user eating conditions using audio and visual features. The AFT method is used for the transformation of the Mel-frequency Cepstral Coefficient and ComParE features for the classification task, while the AFS method helps in selecting a feature subset. Transformation by Principal Component Analysis (PCA) is also used for comparison. We find feature subsets of audio features using the AFS method (422 for Food Type, 104 for Likability and 68 for Difficulty out of 988 features) which provide better results than the full feature set. Our results show that AFS outperforms PCA and AFT in terms of accuracy for the recognition of user eating conditions using audio features. The AFT of visual features (facial landmarks) provides less accurate results than the AFS and AFT sets of audio features. However, the weighted score fusion of all the feature set improves the results.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3242969.3243685","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 7
Abstract
Automatic recognition of human eating conditions could be a useful technology for health monitoring. Audio-visual information can be used to automate this process, and feature engineering approaches can reduce its dimensionality. Reducing the dimensionality of the data (particularly through feature subset selection) helps in designing an eating-condition recognition system that requires less power, cost, memory and computation than a system built on the full-dimensional data. This paper presents Active Feature Transformation (AFT) and Active Feature Selection (AFS) methods and applies them to all three tasks of the ICMI 2018 EAT Challenge for recognising user eating conditions from audio and visual features. The AFT method transforms the Mel-frequency Cepstral Coefficient (MFCC) and ComParE features for the classification task, while the AFS method selects a feature subset. Transformation by Principal Component Analysis (PCA) is also used for comparison. Using the AFS method, we find subsets of audio features (422 for Food Type, 104 for Likability and 68 for Difficulty, out of 988 features) that give better results than the full feature set. Our results show that AFS outperforms PCA and AFT in accuracy for recognising user eating conditions from audio features. The AFT of visual features (facial landmarks) gives less accurate results than the AFS and AFT sets of audio features; however, weighted score fusion of all the feature sets improves the results.
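The abstract contrasts three generic building blocks: transforming features (e.g. by PCA), selecting a feature subset, and fusing per-modality classifier scores with weights. The sketch below illustrates these ideas in Python with scikit-learn; it is not the authors' AFT/AFS implementation, and the feature dimensions, labels, classifiers and fusion weights are illustrative assumptions only.

```python
# Minimal sketch of the building blocks named in the abstract: PCA transformation,
# feature subset selection, and weighted score fusion. NOT the authors' AFT/AFS
# methods; all dimensions, labels and weights below are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_audio = rng.normal(size=(200, 988))   # stand-in for 988 ComParE-style audio features
X_video = rng.normal(size=(200, 136))   # stand-in for flattened facial-landmark features
y = rng.integers(0, 6, size=200)        # hypothetical Food Type labels

# Feature transformation: project audio features onto principal components.
pca = PCA(n_components=100).fit(X_audio)
X_audio_pca = pca.transform(X_audio)

# Feature subset selection: keep the k highest-scoring original features
# (the paper reports subsets such as 422/104/68 out of 988 audio features).
selector = SelectKBest(f_classif, k=422).fit(X_audio, y)
X_audio_sel = selector.transform(X_audio)

# Train one classifier per feature set / modality.
clf_audio = SVC(probability=True).fit(X_audio_sel, y)
clf_video = SVC(probability=True).fit(X_video, y)

# Weighted score fusion: combine per-class probabilities with fixed weights
# (weights would be tuned on development data; training data is reused here
# only to keep the sketch short).
w_audio, w_video = 0.7, 0.3
scores = (w_audio * clf_audio.predict_proba(selector.transform(X_audio))
          + w_video * clf_video.predict_proba(X_video))
y_pred = clf_audio.classes_[scores.argmax(axis=1)]
```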