{"title":"Describing Lifelogs with Convolutional Neural Networks: A Comparative Study","authors":"A. Molino, Qianli Xu, Joo-Hwee Lim","doi":"10.1145/2983576.2983579","DOIUrl":null,"url":null,"abstract":"Life-logging technologies, e.g. wearable cameras taking pictures at a fixed interval, can be used as a means of memory preservation (in digital form), caregiver monitoring and even cognitive therapy to train our brains. Yet, such large amount of data needs to be processed and edited to be of use. Automatic summarization of the life-logs into short story boards is a possible solution. But how good are these summaries? Are the selected key-frames informative and representative enough as to be good memory cues? The proposed approach (i) filters uninformative images by analyzing their ratio of edges and (ii) describes the images using the available Convolutional Neural Networks (CNN) models for objects and places with egocentric-driven data augmentation. We perform a comparative study to evaluate different summarization methods in terms of coverage, informativeness and representativeness in two different datasets, both with annotated ground truth and an on-line user study. Results show that filtering uninformative images improves the user satisfaction: users would request to change less frames from the original summary than without filtering. Moreover, the proposed egocentric image descriptor generates more diverse content than the standard cropping strategy used by most CNN-based approaches.","PeriodicalId":352947,"journal":{"name":"Proceedings of the first Workshop on Lifelogging Tools and Applications","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the first Workshop on Lifelogging Tools and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2983576.2983579","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Life-logging technologies, e.g., wearable cameras that take pictures at fixed intervals, can serve as a means of memory preservation (in digital form), caregiver monitoring, and even cognitive therapy to train our brains. Yet such a large amount of data needs to be processed and edited to be of use. Automatic summarization of the lifelogs into short storyboards is a possible solution. But how good are these summaries? Are the selected key-frames informative and representative enough to be good memory cues? The proposed approach (i) filters uninformative images by analyzing their ratio of edges and (ii) describes the images using available Convolutional Neural Network (CNN) models for objects and places with egocentric-driven data augmentation. We perform a comparative study that evaluates different summarization methods in terms of coverage, informativeness, and representativeness on two datasets, using both annotated ground truth and an online user study. Results show that filtering uninformative images improves user satisfaction: users request fewer frame changes to the original summary than without filtering. Moreover, the proposed egocentric image descriptor generates more diverse content than the standard cropping strategy used by most CNN-based approaches.
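The abstract does not spell out how the edge-based filter is implemented; the sketch below illustrates one plausible reading of a "ratio of edges" filter for wearable-camera frames. The use of Canny edge detection and the specific threshold values are assumptions for illustration, not details taken from the paper.

```python
import cv2
import numpy as np

def edge_ratio(image_bgr: np.ndarray) -> float:
    """Fraction of pixels marked as edges.

    Canny with thresholds (100, 200) is an assumed choice of edge
    detector; the paper only states that a ratio of edges is used.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    return np.count_nonzero(edges) / edges.size

def is_informative(image_bgr: np.ndarray, min_ratio: float = 0.02) -> bool:
    """Flag a frame as informative if its edge ratio exceeds a
    (hypothetical) threshold. Dark, blurred, or occluded frames
    typically contain very few edges and are filtered out.
    """
    return edge_ratio(image_bgr) >= min_ratio

if __name__ == "__main__":
    # Hypothetical usage on a single lifelog frame.
    frame = cv2.imread("frame_0001.jpg")  # path is an example only
    if frame is not None:
        print(f"edge ratio: {edge_ratio(frame):.4f}")
        print(f"informative: {is_informative(frame)}")
```

The intuition behind such a filter is that uninformative lifelog frames (lens covered by clothing, motion blur, dark rooms) are nearly texture-free, so a low fraction of edge pixels is a cheap proxy for discarding them before the more expensive CNN-based description step.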