Personal Diary Generation from Wearable Cameras with Concept Augmented Image Captioning and Wide Trail Strategy

Viet-Khoa Vo-Ho, Quoc-An Luong, Duy-Tam Nguyen, Mai-Khiem Tran, Minh-Triet Tran
Proceedings of the 9th International Symposium on Information and Communication Technology, published 2018-12-06. DOI: 10.1145/3287921.3287955
Citations: 10

Abstract

Writing a diary is not only a hobby; it also provides a personal lifelog for better analysis and understanding of a user's daily activities and events. However, in a busy society, people may not have enough time to record all of their social interactions in a diary. This motivates our proposal of a ubiquitous system that automatically generates a daily text diary from photos taken periodically by wearable cameras, using our novel image captioning method. We propose incorporating common visual concepts extracted from a photo to enrich the details of the image description. We also propose a wide trail beam search strategy to improve the naturalness of the generated captions. Our captioning method improves results on the MSCOCO dataset on four metrics: BLEU, METEOR, ROUGE-L, and CIDEr. Compared with the method proposed by Xu et al. and Karpathy's NeuralTalk, our model performs better on all four metrics. We also develop smart glasses and a prototype smart workplace in which people can have their personal diaries generated from photos taken by the smart glasses. Furthermore, we apply a Transformer machine translation model to translate the captions into Vietnamese. The results are promising and suggest the system can serve Vietnamese users.
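The abstract does not specify how the wide trail strategy changes the search, so the sketch below shows only the standard beam search decoding that such a strategy builds on; it is not the authors' method. `next_token_logprobs` is a hypothetical stand-in for the captioning decoder, and the toy probability table is invented purely for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical decoder interface: maps a partial caption (list of tokens)
# to the log-probabilities of candidate next tokens.
NextTokenFn = Callable[[List[str]], Dict[str, float]]
Hypothesis = Tuple[List[str], float]  # (tokens, cumulative log-probability)


def beam_search(next_token_logprobs: NextTokenFn,
                beam_width: int = 3,
                max_len: int = 16,
                end_token: str = "<end>") -> List[Hypothesis]:
    """Standard beam search: keep the beam_width best partial captions per step."""
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every candidate next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, logp in next_token_logprobs(tokens).items():
                candidates.append((tokens + [tok], score + logp))
        # Keep the beam_width highest-scoring expansions (stable sort on ties).
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            # Hypotheses that emit end_token leave the beam, so it may shrink.
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep hypotheses still unfinished at max_len
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Toy "decoder" with invented probabilities, for illustration only.
TOY_TABLE = {
    (): {"a": math.log(0.6), "the": math.log(0.4)},
    ("a",): {"man": math.log(0.5), "<end>": math.log(0.5)},
    ("the",): {"man": math.log(0.9), "<end>": math.log(0.1)},
}


def toy_model(tokens: List[str]) -> Dict[str, float]:
    # Any prefix not in the table ends the caption with probability 1.
    return TOY_TABLE.get(tuple(tokens), {"<end>": 0.0})


if __name__ == "__main__":
    for width in (1, 2):
        best_tokens, best_score = beam_search(toy_model, beam_width=width, max_len=4)[0]
        print(width, " ".join(best_tokens), round(math.exp(best_score), 2))
```

With `beam_width=1` (greedy decoding) the sketch commits to the locally best first word and returns "a man <end>" (probability 0.30); with `beam_width=2` it keeps "the" alive and finds "the man <end>" (probability 0.36). A strategy aimed at more natural captions would modify how this candidate set is expanded or scored.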