Duration Dependent Input Output Markov Models for Audio-Visual Event Detection
M. Naphade, A. Garg, Thomas S. Huang
IEEE International Conference on Multimedia and Expo (ICME 2001), August 22, 2001. DOI: 10.1109/ICME.2001.1237704
Detecting semantic events with spatiotemporal support in audio-visual data is a challenging multimedia understanding problem. The difficulty lies in the gap between low-level media features and high-level semantic concepts. We present a duration dependent input output Markov model (DDIOMM) to detect events based on multiple modalities. The DDIOMM combines the ability to model nonexponential duration densities with the mapping of input sequences to output sequences. In spirit it resembles IOHMMs [1] as well as inhomogeneous HMMs [2]. We use the DDIOMM to model the audio-visual event explosion. We compare the detection performance of the DDIOMM with the IOMM as well as the HMM. Experiments reveal that modeling duration improves detection performance.
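The abstract's central claim is that explicit duration modeling beats the implicit duration model of a standard HMM. The toy sketch below (not the authors' code) illustrates why: a self-transition probability p in an HMM implies a geometric dwell-time density P(d) = (1-p)·p^(d-1), whose mode is always d = 1, whereas a duration-dependent model can draw dwell times from an arbitrary discrete density, e.g. one peaked at 4 frames. The `peaked` density here is a made-up example, not from the paper.

```python
import random

def geometric_duration(p_stay, rng):
    """Dwell time implied by repeated HMM self-transitions:
    geometric density, mode always at d = 1."""
    d = 1
    while rng.random() < p_stay:
        d += 1
    return d

def explicit_duration(density, rng):
    """Dwell time drawn from an arbitrary discrete density {d: P(d)},
    as in duration-dependent (semi-Markov) models."""
    r, acc = rng.random(), 0.0
    for d, p in sorted(density.items()):
        acc += p
        if r < acc:
            return d
    return max(density)  # guard against float rounding

rng = random.Random(0)
# Hypothetical duration density peaked at 4 frames -- a shape no
# geometric (exponential-type) density can represent.
peaked = {2: 0.1, 3: 0.2, 4: 0.4, 5: 0.2, 6: 0.1}
samples = [explicit_duration(peaked, rng) for _ in range(10000)]
print(min(samples), max(samples))  # durations stay within {2..6}
```

The same idea underlies the paper's comparison: the HMM and IOMM baselines are restricted to geometric dwell times, while the DDIOMM conditions its transitions on how long the current state has been occupied.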