Image classification and retrieval on the World Wide Web

IF 1.2 Q3 INFORMATION SCIENCE & LIBRARY SCIENCE

Digital Library Perspectives · Pub Date: 1999-08-01 · DOI: 10.1145/313238.313316

Noureddine Abbadeni, D. Ziou, Shengrui Wang

Volume 22 1, pages 208-209. Publication type: Journal Article.

Citations: 3

Abstract

Image retrieval is emerging as an important research area with applications in fields such as image and multimedia databases and digital libraries. The World Wide Web is an enormous, distributed, unstructured hypermedia information system that holds tens of millions of images. Tools that make it possible to find specific images in this enormous image database would be of unquestionable utility and would help the WWW reach its full potential. However, searching for images on the WWW is an extremely difficult task and poses new challenges. Two characteristics must be taken into account when dealing with images on the WWW: 1. the sheer number of images and the extraordinary diversity of image types found on the WWW; 2. in image processing and computer vision, there is no general algorithm able to process all types of images. Two fundamental issues must be addressed when developing retrieval tools: the efficiency of search and the effectiveness of search. Efficiency means that one can find information in a reasonable time. Given the power of current workstations and techniques such as parallel and multi-threaded programming, efficiency is not a bottleneck. However, effectiveness, that is, how pertinent the retrieved images are to the request, is a major problem and should be examined more closely. Most retrieval tools existing on the WWW are not effective: many retrieved documents are not pertinent to the request (noise) and many documents pertinent to the request are not retrieved (silence).

Taking all these facts into account, we believe that a preliminary and crucial step before developing an image retrieval tool for the WWW is first to classify the images into classes such as photographs, graphics, cartoons, faces, textured images, color images, etc., and then to perform the search within each class. Doing so yields at least two advantages: 1. effectiveness is improved (noise and silence are reduced), since the search is done in a specific class and not in the whole database; 2. appropriate algorithms can be applied to each class of images, given that a general algorithm for all types of images does not exist. In this paper, we are interested in performing image search on the WWW using both image content and textual keywords. The following features are already available in our system:
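As an illustration only, the noise and silence measures and the classify-then-search idea described in the abstract can be sketched on a toy collection. This is not the authors' system: the `Image` record, the class labels, the keyword matching and the file names are all hypothetical, and noise and silence are computed under the assumption that they mean the fraction of retrieved items that are not pertinent and the fraction of pertinent items that are not retrieved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    name: str
    image_class: str      # hypothetical class label, e.g. "photograph"
    keywords: frozenset   # hypothetical textual keywords attached to the image


def noise_and_silence(retrieved, relevant):
    """Return (noise, silence) for one query.

    Assumed definitions:
    noise   = share of retrieved items that are not pertinent
    silence = share of pertinent items that were not retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    noise = 1 - len(hits) / len(retrieved) if retrieved else 0.0
    silence = 1 - len(hits) / len(relevant) if relevant else 0.0
    return noise, silence


def search(images, keyword, image_class=None):
    """Keyword search, optionally restricted to a single image class."""
    return [
        img.name
        for img in images
        if keyword in img.keywords
        and (image_class is None or img.image_class == image_class)
    ]


# Toy collection; every entry is invented for the example.
IMAGES = [
    Image("tiger_photo.jpg", "photograph", frozenset({"tiger"})),
    Image("tiger_logo.gif", "graphic", frozenset({"tiger"})),
    Image("tiger_cartoon.gif", "cartoon", frozenset({"tiger"})),
    Image("savanna.jpg", "photograph", frozenset({"tiger", "grass"})),
]

# Suppose the user is looking for photographs of tigers.
RELEVANT = {"tiger_photo.jpg", "savanna.jpg"}

# Search over the whole collection versus search inside one class.
whole = noise_and_silence(search(IMAGES, "tiger"), RELEVANT)
by_class = noise_and_silence(search(IMAGES, "tiger", "photograph"), RELEVANT)

print("whole database:", whole)
print("photograph class only:", by_class)
```

In this toy case, restricting the search to the photograph class removes the graphic and the cartoon from the result list, so noise drops while silence stays at zero. In practice the benefit depends on how accurate the classification step is, since a misclassified image would raise silence.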


Source journal

Digital Library Perspectives (Information Science & Library Science)

CiteScore

3.90

Self-citation rate

11.80%

About the journal: Digital Library Perspectives (DLP) is a peer-reviewed journal concerned with digital content collections. It publishes research related to the curation and web-based delivery of digital objects that are collected for the advancement of scholarship, teaching and learning, and that advance the digital information environment as it relates to global knowledge, communication and world memory. The journal aims to keep readers informed about current trends, initiatives, and developments, including those in digital libraries and digital repositories, along with their standards and technologies. The editor invites contributions on the following, as well as other related topics: digitization, data as information, archives and manuscripts, digital preservation and digital archiving, digital cultural memory initiatives, usability studies, and K-12 and higher education uses of digital collections.