What is happening in a still picture?

Piji Li, Jun Ma
Published in: The First Asian Conference on Pattern Recognition, 2011-11-01
DOI: 10.1109/ACPR.2011.6166555 (https://doi.org/10.1109/ACPR.2011.6166555)
Citations: 16

Abstract

We consider the problem of automatically generating concise sentences that describe still pictures. We treat objects in images (nouns in sentences) as hidden information of actions (verbs), so the sentence generation problem can be transformed into action detection and scene classification problems. We employ Latent Multiple Kernel Learning (L-MKL) to learn the action detectors from “Exemplarlets”, and use MKL to learn the scene classifiers. The image features employed include the distribution of edges, dense visual words, and feature descriptors at different levels of a spatial pyramid. For a new image, we detect the action using a sliding-window detector learnt via L-MKL, predict the scene in which the action happens, and build ⟨action, scene⟩ tuples. Finally, these tuples are translated into concise sentences according to a previously defined grammar template. We show both classification and sentence generation results on our newly collected dataset of six actions, and demonstrate improved performance over existing methods.
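
The final step described above, turning an ⟨action, scene⟩ tuple into a sentence through a fixed grammar template, can be illustrated with a minimal sketch. The abstract does not list the six actions, the scene classes, or the template wording, so every label and phrase below (`riding_bike`, `street`, the "A person is ..." pattern, and the `Detection` and `to_sentence` names) is a hypothetical stand-in, and the detector and classifier outputs are assumed to be plain string labels.

```python
from dataclasses import dataclass

# Hypothetical phrase tables: the abstract does not name the actions,
# scenes, or template wording, so these entries are illustrative only.
ACTION_PHRASES = {
    "riding_bike": "riding a bike",
    "playing_guitar": "playing a guitar",
}
SCENE_PHRASES = {
    "street": "on the street",
    "indoor": "indoors",
}


@dataclass(frozen=True)
class Detection:
    """An <action, scene> tuple, assumed to come from the two classifiers."""
    action: str  # label predicted by the action detector
    scene: str   # label predicted by the scene classifier


def to_sentence(det: Detection, subject: str = "A person") -> str:
    """Fill a fixed '<subject> is <action> <scene>.' template."""
    action = ACTION_PHRASES[det.action]
    scene = SCENE_PHRASES[det.scene]
    return f"{subject} is {action} {scene}."


print(to_sentence(Detection("riding_bike", "street")))
# prints: A person is riding a bike on the street.
```

Because the object noun is folded into the action phrase, the template needs only the two predicted labels, which mirrors the abstract's idea of treating objects as hidden information of actions.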