Semantic based retrieval model for digital audio and video

S. Nepal, Uma Srinivasan, G. Reynolds

IEEE International Conference on Multimedia and Expo (ICME 2001), 22 August 2001. DOI: 10.1109/ICME.2001.1237924
Recent content-based retrieval systems such as QBIC [7] and VisualSEEk [8] use low-level audio-visual features such as color, pan, zoom, and loudness for retrieval. However, users prefer to retrieve videos using high-level semantic descriptions based on their perception, such as "bright color" and "very loud sound". This creates a gap between the queries users would like to pose and what systems can support. This paper attempts to bridge that gap by mapping users' perception of semantic concepts to low-level feature values. It proposes a model that provides high-level semantics for an audio feature that measures loudness. We first perform a pilot user study to capture users' perception of loudness on a collection of sound-effect audio clips, and map those perceptions to five semantic terms. We then describe how the loudness measure in MPEG-1 Layer II audio files can be mapped to user-perceived loudness. Finally, we devise a fuzzy technique for retrieving audio/video clips from the collection using those semantic terms.
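The abstract's core idea — mapping a numeric loudness measure to semantic terms via fuzzy membership, then retrieving clips by term — can be sketched as follows. This is a minimal illustration, not the paper's actual method: the five term names, the triangular membership shape, and the breakpoints on a normalized loudness axis are all assumptions for the example.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: rises from 0 at a to 1 at b, falls to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Five hypothetical semantic loudness terms over a normalized loudness axis [0, 1].
# The paper maps user-perceived loudness to five terms; these particular labels
# and breakpoints are illustrative assumptions.
TERMS = {
    "very quiet": (-0.25, 0.0, 0.25),
    "quiet":      (0.0, 0.25, 0.5),
    "moderate":   (0.25, 0.5, 0.75),
    "loud":       (0.5, 0.75, 1.0),
    "very loud":  (0.75, 1.0, 1.25),
}

def memberships(loudness):
    """Fuzzy membership degree of a clip's loudness in each semantic term."""
    return {term: triangular(loudness, *pts) for term, pts in TERMS.items()}

def retrieve(clips, term, threshold=0.5):
    """Return clip names whose membership in `term` exceeds the threshold,
    ranked by membership degree (higher = better match)."""
    scored = [(memberships(loudness)[term], name) for name, loudness in clips]
    return [name for score, name in sorted(scored, reverse=True) if score > threshold]

# Clips paired with a (hypothetical) normalized loudness value.
clips = [("door_slam", 0.9), ("whisper", 0.1), ("speech", 0.5)]
print(retrieve(clips, "very loud"))  # door_slam (membership 0.6) is the only match
```

A user query like "very loud sound" is thus answered by ranking clips on their membership degree rather than by an exact threshold on the raw feature, which is what lets the system tolerate the fuzziness of human perception.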