{"title":"Describing Lifelogs with Convolutional Neural Networks: A Comparative Study","authors":"A. Molino, Qianli Xu, Joo-Hwee Lim","doi":"10.1145/2983576.2983579","DOIUrl":null,"url":null,"abstract":"Life-logging technologies, e.g. wearable cameras taking pictures at a fixed interval, can be used as a means of memory preservation (in digital form), caregiver monitoring and even cognitive therapy to train our brains. Yet, such large amount of data needs to be processed and edited to be of use. Automatic summarization of the life-logs into short story boards is a possible solution. But how good are these summaries? Are the selected key-frames informative and representative enough as to be good memory cues? The proposed approach (i) filters uninformative images by analyzing their ratio of edges and (ii) describes the images using the available Convolutional Neural Networks (CNN) models for objects and places with egocentric-driven data augmentation. We perform a comparative study to evaluate different summarization methods in terms of coverage, informativeness and representativeness in two different datasets, both with annotated ground truth and an on-line user study. Results show that filtering uninformative images improves the user satisfaction: users would request to change less frames from the original summary than without filtering. Moreover, the proposed egocentric image descriptor generates more diverse content than the standard cropping strategy used by most CNN-based approaches.","PeriodicalId":352947,"journal":{"name":"Proceedings of the first Workshop on Lifelogging Tools and Applications","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the first Workshop on Lifelogging Tools and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2983576.2983579","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Life-logging technologies, e.g., wearable cameras that take pictures at fixed intervals, can serve as a means of memory preservation (in digital form), caregiver monitoring, and even cognitive therapy to train our brains. Yet such a large amount of data needs to be processed and edited to be of use. Automatic summarization of the lifelogs into short storyboards is a possible solution. But how good are these summaries? Are the selected key-frames informative and representative enough to be good memory cues? The proposed approach (i) filters uninformative images by analyzing their ratio of edges and (ii) describes the images using available Convolutional Neural Network (CNN) models for objects and places with egocentric-driven data augmentation. We perform a comparative study that evaluates different summarization methods in terms of coverage, informativeness, and representativeness on two datasets, using both annotated ground truth and an online user study. Results show that filtering uninformative images improves user satisfaction: users request fewer frame changes to the original summary than without filtering. Moreover, the proposed egocentric image descriptor generates more diverse content than the standard cropping strategy used by most CNN-based approaches.
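The abstract does not spell out how the edge-based filter is implemented; the sketch below illustrates one plausible reading of a "ratio of edges" filter for wearable-camera frames. The use of Canny edge detection and the specific threshold values are assumptions for illustration, not details taken from the paper.

```python
import cv2
import numpy as np

def edge_ratio(image_bgr: np.ndarray) -> float:
    """Fraction of pixels marked as edges.

    Canny with thresholds (100, 200) is an assumed choice of edge
    detector; the paper only states that a ratio of edges is used.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    return np.count_nonzero(edges) / edges.size

def is_informative(image_bgr: np.ndarray, min_ratio: float = 0.02) -> bool:
    """Flag a frame as informative if its edge ratio exceeds a
    (hypothetical) threshold. Dark, blurred, or occluded frames
    typically contain very few edges and are filtered out.
    """
    return edge_ratio(image_bgr) >= min_ratio

if __name__ == "__main__":
    # Hypothetical usage on a single lifelog frame.
    frame = cv2.imread("frame_0001.jpg")  # path is an example only
    if frame is not None:
        print(f"edge ratio: {edge_ratio(frame):.4f}")
        print(f"informative: {is_informative(frame)}")
```

The intuition behind such a filter is that uninformative lifelog frames (lens covered by clothing, motion blur, dark rooms) are nearly texture-free, so a low fraction of edge pixels is a cheap proxy for discarding them before the more expensive CNN-based description step.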