{"title":"视频索引与检索的深度学习方法","authors":"X. Men, F. Zhou, Xiaoyong Li","doi":"10.2312/PG.20181287","DOIUrl":null,"url":null,"abstract":"In this paper, we proposed a deep neural network based method for content based video retrieval. Our approach leveraged the deep neural network to generate the semantic information and introduced the graph-based storage structure to establish the video indices. We devised the Inception-Single Shot Multibox Detector (ISSD) and RI3D model to extract spatial semantic information (objects) and extract temporal semantic information (actions). Our ISSD model achieved a mAP of 26.7% on MS COCO dataset, increasing 3.2% over the original SSD model, while the RI3D model achieved a top-1 accuracy of 97.7% on dataset UCF-101. And we also introduced the graph structure to build the video index with the temporal and spatial semantic information. Our experiment results showed that the deep learned semantic information is highly effective for video indexing and retrieval.","PeriodicalId":88304,"journal":{"name":"Proceedings. Pacific Conference on Computer Graphics and Applications","volume":"44 1","pages":"85-88"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Deep Learned Method for Video Indexing and Retrieval\",\"authors\":\"X. Men, F. Zhou, Xiaoyong Li\",\"doi\":\"10.2312/PG.20181287\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we proposed a deep neural network based method for content based video retrieval. Our approach leveraged the deep neural network to generate the semantic information and introduced the graph-based storage structure to establish the video indices. We devised the Inception-Single Shot Multibox Detector (ISSD) and RI3D model to extract spatial semantic information (objects) and extract temporal semantic information (actions). Our ISSD model achieved a mAP of 26.7% on MS COCO dataset, increasing 3.2% over the original SSD model, while the RI3D model achieved a top-1 accuracy of 97.7% on dataset UCF-101. And we also introduced the graph structure to build the video index with the temporal and spatial semantic information. Our experiment results showed that the deep learned semantic information is highly effective for video indexing and retrieval.\",\"PeriodicalId\":88304,\"journal\":{\"name\":\"Proceedings. Pacific Conference on Computer Graphics and Applications\",\"volume\":\"44 1\",\"pages\":\"85-88\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. Pacific Conference on Computer Graphics and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2312/PG.20181287\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Pacific Conference on Computer Graphics and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2312/PG.20181287","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Deep Learned Method for Video Indexing and Retrieval
In this paper, we proposed a deep neural network based method for content based video retrieval. Our approach leveraged the deep neural network to generate the semantic information and introduced the graph-based storage structure to establish the video indices. We devised the Inception-Single Shot Multibox Detector (ISSD) and RI3D model to extract spatial semantic information (objects) and extract temporal semantic information (actions). Our ISSD model achieved a mAP of 26.7% on MS COCO dataset, increasing 3.2% over the original SSD model, while the RI3D model achieved a top-1 accuracy of 97.7% on dataset UCF-101. And we also introduced the graph structure to build the video index with the temporal and spatial semantic information. Our experiment results showed that the deep learned semantic information is highly effective for video indexing and retrieval.