Georg Schroth, S. Hilsenbeck, Robert Huitl, F. Schweiger, E. Steinbach
{"title":"Exploiting Text-Related Features for Content-based Image Retrieval","authors":"Georg Schroth, S. Hilsenbeck, Robert Huitl, F. Schweiger, E. Steinbach","doi":"10.1109/ISM.2011.21","DOIUrl":null,"url":null,"abstract":"Distinctive visual cues are of central importance for image retrieval applications, in particular, in the context of visual location recognition. While in indoor environments typically only few distinctive features can be found, outdoors dynamic objects and clutter significantly impair the retrieval performance. We present an approach which exploits text, a major source of information for humans during orientation and navigation, without the need for error-prone optical character recognition. To this end, characters are detected and described using robust feature descriptors like SURF. By quantizing them into several hundred visual words we consider the distinctive appearance of the characters rather than reducing the set of possible features to an alphabet. Writings in images are transformed to strings of visual words termed visual phrases, which provide significantly improved distinctiveness when compared to individual features. An approximate string matching is performed using N-grams, which can be efficiently combined with an inverted file structure to cope with large datasets. An experimental evaluation on three different datasets shows significant improvement of the retrieval performance while reducing the size of the database by two orders of magnitude compared to state-of-the-art. Its low computational complexity makes the approach particularly suited for mobile image retrieval applications.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"125 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE International Symposium on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISM.2011.21","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 28
Abstract
Distinctive visual cues are of central importance for image retrieval applications, in particular in the context of visual location recognition. While in indoor environments typically only a few distinctive features can be found, outdoors, dynamic objects and clutter significantly impair retrieval performance. We present an approach which exploits text, a major source of information for humans during orientation and navigation, without the need for error-prone optical character recognition. To this end, characters are detected and described using robust feature descriptors like SURF. By quantizing them into several hundred visual words we consider the distinctive appearance of the characters rather than reducing the set of possible features to an alphabet. Writings in images are transformed into strings of visual words termed visual phrases, which provide significantly improved distinctiveness compared to individual features. Approximate string matching is performed using N-grams, which can be efficiently combined with an inverted file structure to cope with large datasets. An experimental evaluation on three different datasets shows a significant improvement in retrieval performance while reducing the size of the database by two orders of magnitude compared to the state of the art. Its low computational complexity makes the approach particularly suited for mobile image retrieval applications.
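To make the retrieval pipeline described in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of how visual phrases, i.e., sequences of quantized character descriptors, could be matched approximately via N-grams over an inverted file. All names (extract_ngrams, InvertedFile) and the visual-word IDs are illustrative assumptions.

```python
from collections import defaultdict


def extract_ngrams(phrase, n=3):
    """Slide a window of length n over a visual phrase (a list of visual-word IDs)."""
    return [tuple(phrase[i:i + n]) for i in range(len(phrase) - n + 1)]


class InvertedFile:
    """Maps each N-gram of visual words to the database images containing it."""

    def __init__(self, n=3):
        self.n = n
        self.index = defaultdict(set)  # N-gram -> set of image IDs

    def add(self, image_id, phrases):
        """Index all visual phrases (e.g., detected writings) of one database image."""
        for phrase in phrases:
            for gram in extract_ngrams(phrase, self.n):
                self.index[gram].add(image_id)

    def query(self, phrases):
        """Rank database images by the number of N-grams shared with the query phrases,
        which approximates string matching without requiring exact phrase equality."""
        scores = defaultdict(int)
        for phrase in phrases:
            for gram in extract_ngrams(phrase, self.n):
                for image_id in self.index[gram]:
                    scores[image_id] += 1
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Visual phrases: sequences of quantized SURF descriptors (hypothetical word IDs).
    db = InvertedFile(n=3)
    db.add("img_001", [[12, 7, 301, 44, 9]])      # e.g., a shop sign
    db.add("img_002", [[88, 12, 7, 301, 5, 44]])  # similar writing, partly different
    db.add("img_003", [[200, 201, 202, 203]])     # unrelated text
    print(db.query([[12, 7, 301, 44]]))           # img_001 ranks above img_002
```

Because the index only stores short N-grams rather than full character recognitions, partial occlusions or misdetected characters still leave enough matching grams to retrieve the correct image, which is the intuition behind the approximate matching described in the abstract.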