Duration Dependent Input Output Markov Models for Audio-Visual Event Detection
M. Naphade, A. Garg, Thomas S. Huang
IEEE International Conference on Multimedia and Expo (ICME 2001), August 22, 2001. DOI: 10.1109/ICME.2001.1237704
Detecting semantic events with spatiotemporal support in audio-visual data is a challenging multimedia understanding problem. The difficulty lies in the gap between low-level media features and high-level semantic concepts. We present a duration dependent input output Markov model (DDIOMM) to detect events based on multiple modalities. The DDIOMM combines the ability to model nonexponential duration densities with the mapping of input sequences to output sequences. In spirit it resembles IOHMMs [1] as well as inhomogeneous HMMs [2]. We use the DDIOMM to model the audio-visual event explosion. We compare the detection performance of the DDIOMM with the IOMM as well as the HMM. Experiments reveal that modeling duration improves detection performance.
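The abstract's central claim is that explicit duration modeling beats the implicit duration model of a standard HMM. The toy sketch below (not the authors' code) illustrates why: a self-transition probability p in an HMM implies a geometric dwell-time density P(d) = (1-p)·p^(d-1), whose mode is always d = 1, whereas a duration-dependent model can draw dwell times from an arbitrary discrete density, e.g. one peaked at 4 frames. The `peaked` density here is a made-up example, not from the paper.

```python
import random

def geometric_duration(p_stay, rng):
    """Dwell time implied by repeated HMM self-transitions:
    geometric density, mode always at d = 1."""
    d = 1
    while rng.random() < p_stay:
        d += 1
    return d

def explicit_duration(density, rng):
    """Dwell time drawn from an arbitrary discrete density {d: P(d)},
    as in duration-dependent (semi-Markov) models."""
    r, acc = rng.random(), 0.0
    for d, p in sorted(density.items()):
        acc += p
        if r < acc:
            return d
    return max(density)  # guard against float rounding

rng = random.Random(0)
# Hypothetical duration density peaked at 4 frames -- a shape no
# geometric (exponential-type) density can represent.
peaked = {2: 0.1, 3: 0.2, 4: 0.4, 5: 0.2, 6: 0.1}
samples = [explicit_duration(peaked, rng) for _ in range(10000)]
print(min(samples), max(samples))  # durations stay within {2..6}
```

The same idea underlies the paper's comparison: the HMM and IOMM baselines are restricted to geometric dwell times, while the DDIOMM conditions its transitions on how long the current state has been occupied.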