{"title":"Efficient Indexing of Regional Maximum Activations of Convolutions using Full-Text Search Engines","authors":"Giuseppe Amato, F. Carrara, F. Falchi, C. Gennaro","doi":"10.1145/3078971.3079035","DOIUrl":"https://doi.org/10.1145/3078971.3079035","url":null,"abstract":"In this paper, we adapt a surrogate text representation technique to develop efficient instance-level image retrieval using Regional Maximum Activations of Convolutions (R-MAC). R-MAC features have recently showed outstanding performance in visual instance retrieval. However, contrary to the activations of hidden layers adopting ReLU (Rectified Linear Unit), these features are dense. This constitutes an obstacle to the direct use of inverted indexes, which rely on sparsity of data. We propose the use of deep permutations, a recent approach for efficient evaluation of permutations, to generate surrogate text representation of R-MAC features, enabling indexing of visual features as text into a standard search-engine. The experiments, conducted on Lucene, show the effectiveness and efficiency of the proposed approach.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133865867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Indexing, Search, Detection, and Description with Focus on TRECVID","authors":"G. Awad, Duy-Dinh Le, C. Ngo, Vinh-Tiep Nguyen, G. Quénot, Cees G. M. Snoek, S. Satoh","doi":"10.1145/3078971.3079044","DOIUrl":"https://doi.org/10.1145/3078971.3079044","url":null,"abstract":"There has been a tremendous growth in video data the last decade. People are using mobile phones and tablets to take, share or watch videos more than ever before. Video cameras are around us almost everywhere in the public domain (e.g. stores, streets, public facilities, ...etc). Efficient and effective retrieval methods are critically needed in different applications. The goal of TRECVID is to encourage research in content-based video retrieval by providing large test collections, uniform scoring procedures, and a forum for organizations interested in comparing their results. In this tutorial, we present and discuss some of the most important and fundamental content-based video retrieval problems such as recognizing predefined visual concepts, searching in videos for complex ad-hoc user queries, searching by image/video examples in a video dataset to retrieve specific objects, persons, or locations, detecting events, and finally bridging the gap between vision and language by looking into how can systems automatically describe videos in a natural language. A review of the state of the art, current challenges, and future directions along with pointers to useful resources will be presented by different regular TRECVID participating teams. Each team will present one of the following tasks: Semantic INdexing (SIN) Zero-example (0Ex) Video Search (AVS) Instance Search (INS) Multimedia Event Detection (MED) Video to Text (VTT)","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"93-94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132630420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Oral Session 2: Multimedia Indexing (Spotlight presentations)","authors":"A. Ulges","doi":"10.1145/3254621","DOIUrl":"https://doi.org/10.1145/3254621","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132829326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Video Retrieval with the 2017 IMOTION System","authors":"Luca Rossetto, Ivan Giangreco, Claudiu Tanase, H. Schuldt","doi":"10.1145/3078971.3079012","DOIUrl":"https://doi.org/10.1145/3078971.3079012","url":null,"abstract":"The IMOTION system is a multimodal content-based video search and browsing application offering a rich set of query modes on the basis of a broad range of different features. It is able to scale with the size of the collection due to its underlying flexible polystore called ADAMpro and its very effective retrieval engine Cineast, optimized for multi-feature fusion. IMOTION is simultaneously geared towards precision-focused searches, i.e., known-item search with image or text queries, and recall-focused, exploratory searches. In this demo, we will present the 2017 IMOTION system deployed on the IACC.3 collection consisting of 600 hours of Internet Archive video, which was also used in the TRECVID 2016 Ad-Hoc Video Search and in the 2017 Video Browser Showdown (VBS) challenge in which IMOTION ranked first. Conference attendees will have the chance to interact with the 2017 IMOTION system and quickly solve various retrieval tasks.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122007151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tiny Transform Net for Mobile Image Stylization","authors":"Shilun Lin, Pengfei Xiong, Hailong Liu","doi":"10.1145/3078971.3079034","DOIUrl":"https://doi.org/10.1145/3078971.3079034","url":null,"abstract":"Artistic stylization is an image transformation problem that renders an image in the style of another one. Existing methods either regard image style transfer as an optimization of perceptual loss function based on a pre-trained network, or train a feed forward network that achieves style transfer through one forward propagation. However, time-consuming optimization processes or relatively large feed forward networks are unacceptable for mobile application. In this work we propose a tiny transform net to accomplish image stylization on mobile devices. The advantages of our proposed architecture come from that: (i) The size of the carefully designed network is less than 40KB, which is more than 166 times smaller than the current popular network; (ii) Progressive training is put forward to keep the training stable, which is implemental to achieve semantics aware stylization; (iii) Deep convolutional network inference algorithm is reconstructed on mobile platform to reduce the overhead of storage and time. In addition, well-trained tiny transform nets and demo application will be made available.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125790326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concept Language Models and Event-based Concept Number Selection for Zero-example Event Detection","authors":"Damianos Galanopoulos, Fotini Markatopoulou, V. Mezaris, I. Patras","doi":"10.1145/3078971.3079043","DOIUrl":"https://doi.org/10.1145/3078971.3079043","url":null,"abstract":"Zero-example event detection is a problem where, given an event query as input but no example videos for training a detector, the system retrieves the most closely related videos. In this paper we present a fully-automatic zero-example event detection method that is based on translating the event description to a predefined set of concepts for which previously trained visual concept detectors are available. We adopt the use of Concept Language Models (CLMs), which is a method of augmenting semantic concept definition, and we propose a new concept-selection method for deciding on the appropriate number of the concepts needed to describe an event query. The proposed system achieves state-of-the-art performance in automatic zero-example event detection.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125606857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Special Oral Session: Beyond Semantics: Multimodal Understanding of Subjective Properties","authors":"Miriam Redi","doi":"10.1145/3254617","DOIUrl":"https://doi.org/10.1145/3254617","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130743995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Searching for A Thing","authors":"A. Smeulders, R. Tao","doi":"10.1145/3078971.3079006","DOIUrl":"https://doi.org/10.1145/3078971.3079006","url":null,"abstract":"For humans, one picture usually suffices to identify an object of search. I am looking for this little girl, have you seen her? or Do you have such another one? are two ways to specify a target even to someone who has never seen the object of search before. Searching from one example in digital multimedia retrieval is a hard problem. From the one example one needs to derive an accurate estimate of all accidental variations in the target picture as well as the structural variation of the target in all other potential pictures. From the one example one needs to derive an accurate estimate of all accidental variations the target instance might have.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127497205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Keynote 2","authors":"N. Sebe","doi":"10.1145/3254613","DOIUrl":"https://doi.org/10.1145/3254613","url":null,"abstract":"","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126461068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Manga FaceNet: Face Detection in Manga based on Deep Neural Network","authors":"W. Chu, Wei-Wei Li","doi":"10.1145/3078971.3079031","DOIUrl":"https://doi.org/10.1145/3078971.3079031","url":null,"abstract":"Among various elements of manga, character's face plays one of the most important role in access and retrieval. We propose a DNN-based method to do manga face detection, which is a challenging but relatively unexplored topic. Given a manga page, we first find candidate regions based on the selective search scheme. A deep neural network is then proposed to detect manga faces of various appearance. We evaluate the proposed method based on a large-scale benchmark, and show performance comparison and convincing evaluation results that have rarely done before.","PeriodicalId":403556,"journal":{"name":"Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115075931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}