Automated estimation of food type and amount consumed from body-worn audio and motion sensors

Mark Mirtchouk, Christopher A. Merck, Samantha Kleinberg
{"title":"Automated estimation of food type and amount consumed from body-worn audio and motion sensors","authors":"Mark Mirtchouk, Christopher A. Merck, Samantha Kleinberg","doi":"10.1145/2971648.2971677","DOIUrl":null,"url":null,"abstract":"Determining when an individual is eating can be useful for tracking behavior and identifying patterns, but to create nutrition logs automatically or provide real-time feedback to people with chronic disease, we need to identify both what they are consuming and in what quantity. However, food type and amount have mainly been estimated using image data (requiring user involvement) or acoustic sensors (tested with a restricted set of foods rather than representative meals). As a result, there is not yet a highly accurate automated nutrition monitoring method that can be used with a variety of foods. We propose that multi-modal sensing (in-ear audio plus head and wrist motion) can be used to more accurately classify food type, as audio and motion features provide complementary information. Further, we propose that knowing food type is critical for estimating amount consumed in combination with sensor data. To test this we use data from people wearing audio and motion sensors, with ground truth annotated from video and continuous scale data. With data from 40 unique foods we achieve a classification accuracy of 82.7% with a combination of sensors (versus 67.8% for audio alone and 76.2% for head and wrist motion). Weight estimation error was reduced from a baseline of 127.3% to 35.4% absolute relative error. Ultimately, our estimates of food type and amount can be linked to food databases to provide automated calorie estimates from continuously-collected data.","PeriodicalId":303792,"journal":{"name":"Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"94","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2971648.2971677","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 94

Abstract

Determining when an individual is eating can be useful for tracking behavior and identifying patterns, but to create nutrition logs automatically or provide real-time feedback to people with chronic disease, we need to identify both what they are consuming and in what quantity. However, food type and amount have mainly been estimated using image data (requiring user involvement) or acoustic sensors (tested with a restricted set of foods rather than representative meals). As a result, there is not yet a highly accurate automated nutrition monitoring method that can be used with a variety of foods. We propose that multi-modal sensing (in-ear audio plus head and wrist motion) can be used to more accurately classify food type, as audio and motion features provide complementary information. Further, we propose that knowing food type is critical for estimating amount consumed in combination with sensor data. To test this we use data from people wearing audio and motion sensors, with ground truth annotated from video and continuous scale data. With data from 40 unique foods we achieve a classification accuracy of 82.7% with a combination of sensors (versus 67.8% for audio alone and 76.2% for head and wrist motion). Weight estimation error was reduced from a baseline of 127.3% to 35.4% absolute relative error. Ultimately, our estimates of food type and amount can be linked to food databases to provide automated calorie estimates from continuously-collected data.
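To make the two-stage idea in the abstract concrete, the sketch below shows (1) feature-level fusion of audio and motion features for food-type classification and (2) weight regression that conditions on the (predicted) food type, evaluated with the absolute relative error quoted above. This is a minimal illustration, not the authors' implementation: the feature dimensions, random-forest models, one-hot food-type encoding, and synthetic data are all assumptions made for the example; scikit-learn is used only for convenience.

```python
# Sketch of multi-modal classification + type-conditioned amount estimation.
# All data and model choices here are hypothetical stand-ins, not the paper's pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-intake feature vectors: in-ear audio features plus head/wrist motion features.
n_intakes, n_audio, n_motion, n_foods = 500, 20, 12, 40
X_audio = rng.normal(size=(n_intakes, n_audio))
X_motion = rng.normal(size=(n_intakes, n_motion))
y_type = rng.integers(0, n_foods, size=n_intakes)   # food-type label per intake
y_grams = rng.uniform(1.0, 30.0, size=n_intakes)    # ground-truth weight per intake (grams)

# (1) Feature-level fusion: concatenate audio and motion features before classifying.
X_fused = np.hstack([X_audio, X_motion])
Xf_tr, Xf_te, yt_tr, yt_te, yg_tr, yg_te = train_test_split(
    X_fused, y_type, y_grams, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(Xf_tr, yt_tr)
type_pred = clf.predict(Xf_te)
print("food-type accuracy: {:.1%}".format((type_pred == yt_te).mean()))

# (2) Amount estimation that uses food type: append a one-hot type encoding to the
# sensor features, training on true labels and testing on predicted labels.
onehot = np.eye(n_foods)
Xr_tr = np.hstack([Xf_tr, onehot[yt_tr]])
Xr_te = np.hstack([Xf_te, onehot[type_pred]])

reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xr_tr, yg_tr)
grams_pred = reg.predict(Xr_te)

# Absolute relative error, the metric quoted in the abstract (127.3% baseline vs. 35.4%).
abs_rel_err = np.abs(grams_pred - yg_te) / yg_te
print("mean absolute relative error: {:.1%}".format(abs_rel_err.mean()))
```

On synthetic random data the accuracy and error figures are meaningless; the sketch only illustrates how fused features and a food-type encoding would flow through such a pipeline.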