Video semantic indexing using object detection-derived features

2016 24th European Signal Processing Conference (EUSIPCO), Pub Date: 2016-11-28, DOI: 10.1109/EUSIPCO.2016.7760456

Kotaro Kikuchi, K. Ueki, Tetsuji Ogawa, Tetsunori Kobayashi

Citations: 3

Abstract

This paper proposes a feature extraction method based on object detection for accurate and robust semantic indexing of videos. Local features (e.g., SIFT and HOG) and convolutional neural network (CNN)-derived features, which have been used in semantic indexing, are generally extracted from the entire image and do not explicitly represent the meaningful objects that help determine semantic categories. As a result, the background region, which does not contain the meaningful objects, receives undue weight and degrades indexing performance. The present study attempts to suppress these undesirable effects of redundant background information by incorporating object detection into semantic indexing. In the proposed method, the combination of meaningful objects detected in a video frame is represented as a feature vector used to verify semantic categories. Experimental comparisons demonstrate that the proposed method is beneficial for the TRECVID semantic indexing task.
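The idea of representing a combination of detected objects as a feature vector can be illustrated with a minimal sketch. The abstract does not specify the encoding, so everything here is an assumption made for illustration: the four-class vocabulary `OBJECT_CLASSES`, the function names `detections_to_feature` and `shot_feature`, and the choice of max-pooling detector confidences per class and across frames. It is one plausible way to build such a vector, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical detector vocabulary; the paper's actual object classes are not given here.
OBJECT_CLASSES: List[str] = ["person", "car", "dog", "bicycle"]

Detection = Tuple[str, float]  # (class label, detector confidence in [0, 1])


def detections_to_feature(detections: List[Detection],
                          classes: List[str] = OBJECT_CLASSES) -> List[float]:
    """Encode one frame's detections as a fixed-length vector.

    Assumption: each dimension holds the highest confidence among the
    detections of that class, or 0.0 if the class was not detected.
    """
    index: Dict[str, int] = {name: i for i, name in enumerate(classes)}
    feature = [0.0] * len(classes)
    for label, score in detections:
        if label in index:  # ignore labels outside the vocabulary
            i = index[label]
            feature[i] = max(feature[i], score)
    return feature


def shot_feature(frames: List[List[Detection]],
                 classes: List[str] = OBJECT_CLASSES) -> List[float]:
    """Max-pool frame-level vectors over the frames of one video shot."""
    pooled = [0.0] * len(classes)
    for detections in frames:
        frame_vec = detections_to_feature(detections, classes)
        pooled = [max(p, f) for p, f in zip(pooled, frame_vec)]
    return pooled


frames = [
    [("person", 0.9), ("car", 0.4), ("person", 0.7)],
    [("dog", 0.8), ("tree", 0.6)],  # "tree" is not in the vocabulary
]
print(shot_feature(frames))  # [0.9, 0.4, 0.8, 0.0]
```

A vector like this depends only on which objects were detected and how confidently, so background pixels contribute nothing to it. In a full system it would presumably be passed to a per-category classifier, which is outside the scope of this sketch.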
