Audio-visual fingerprinting and cross-modal aggregation: Components and applications

2008 IEEE International Symposium on Consumer Electronics. Pub Date: 2008-04-14. DOI: 10.1109/ISCE.2008.4559483

P. Dunker, M. Gruhne

Citations: 2

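Abstract

In recent years, the amount of digital media has grown rapidly thanks to efficient media encoding algorithms. As a result, large numbers of audio and video files are stored on users' hard disks and on popular video community platforms. Because suitable metadata standards are lacking or are not followed, descriptions of this data are often missing or misleading. Audio and visual identification algorithms have therefore been developed that identify videos or pieces of music and supply a suitable metadata description or copyright information from a content database. Integrating both sources of information, the visual and the audio part of a video, for simultaneous identification is called cross-modal processing. This paper outlines the principal structure of an audio and a visual identification system and discusses different state-of-the-art algorithms. It then presents a cross-modal system, with particular attention to cross-modal aggregation. Finally, it describes current use cases for audio, visual, and cross-modal search and retrieval.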
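The abstract does not say how the audio and visual results are combined, so the following is only a minimal sketch of one common approach, weighted score fusion, and not the authors' method. The function name `aggregate_cross_modal`, the `audio_weight` parameter, the score range, and the clip IDs are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def aggregate_cross_modal(audio_matches, visual_matches, audio_weight=0.5):
    """Fuse per-modality match scores into a single ranking.

    Assumption: each argument maps a content ID to a similarity score in
    [0, 1], as returned by some (hypothetical) fingerprint lookup.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")

    fused = defaultdict(float)
    for content_id, score in audio_matches.items():
        fused[content_id] += audio_weight * score
    for content_id, score in visual_matches.items():
        fused[content_id] += (1.0 - audio_weight) * score

    # Highest fused score first; ties are broken by content ID.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Illustrative scores only; these are not results from the paper.
audio = {"clip_a": 0.9, "clip_b": 0.4}
visual = {"clip_a": 0.7, "clip_c": 0.8}
ranking = aggregate_cross_modal(audio, visual)
```

A candidate that is matched in both modalities accumulates score from each, so it tends to rank above candidates found by only one. Other fusion rules (for example, taking the maximum or requiring agreement between modalities) would fit the same interface.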
