Audio-visual fingerprinting and cross-modal aggregation: Components and applications

P. Dunker, M. Gruhne
2008 IEEE International Symposium on Consumer Electronics, published 2008-04-14
DOI: 10.1109/ISCE.2008.4559483 (https://doi.org/10.1109/ISCE.2008.4559483)
Citations: 2

Abstract

In recent years, the amount of digital media has grown rapidly thanks to efficient media encoding algorithms. As a result, large numbers of audio and video files are stored on users' hard disks and on popular video community platforms. Because suitable metadata standards are lacking or are not followed, the description of this data is often missing or misleading. Audio and visual identification algorithms have therefore been developed; they identify videos or pieces of music and provide a suitable metadata description or copyright information based on a content database. Integrating both kinds of information, the visual and the audio part of a video, for simultaneous identification is called cross-modal processing. This paper outlines the principal structure of an audio and a visual identification system and discusses different state-of-the-art algorithms. Furthermore, a cross-modal system is presented, with particular attention to cross aggregation. Finally, current use cases for audio, visual and cross-modal search and retrieval are described.
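
The abstract does not say how the cross aggregation is computed. As a purely illustrative sketch, and not the authors' method, the following assumes a simple weighted late fusion: an audio identifier and a visual identifier each return a match score per database item, and the scores are combined before the best candidate is chosen. The function names, the weight, and the threshold are all hypothetical.

```python
from collections import defaultdict


def aggregate_scores(audio_scores, visual_scores, audio_weight=0.5):
    """Combine per-item match scores from both modalities.

    Hypothetical late fusion: a weighted sum in which an item missing
    from one modality simply contributes nothing for that modality.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    combined = defaultdict(float)
    for item_id, score in audio_scores.items():
        combined[item_id] += audio_weight * score
    for item_id, score in visual_scores.items():
        combined[item_id] += (1.0 - audio_weight) * score
    return dict(combined)


def identify(audio_scores, visual_scores, audio_weight=0.5, threshold=0.5):
    """Return the best-matching item id, or None if nothing is confident enough."""
    combined = aggregate_scores(audio_scores, visual_scores, audio_weight)
    if not combined:
        return None
    best_id = max(combined, key=combined.get)
    return best_id if combined[best_id] >= threshold else None


# Made-up scores for illustration only.
audio = {"clip_a": 0.9, "clip_b": 0.4}
visual = {"clip_a": 0.7, "clip_c": 0.8}
print(identify(audio, visual))
```

In this toy setup an item supported by both modalities outranks one that scores well in only a single modality, which is the general motivation for identifying with audio and video together.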