Augmented transition networks as video browsing models for multimedia databases and multimedia information systems

Shu‐Ching Chen, S. Sista, M. Shyu, R. Kashyap
{"title":"Augmented transition networks as video browsing models for multimedia databases and multimedia information systems","authors":"Shu‐Ching Chen, S. Sista, M. Shyu, R. Kashyap","doi":"10.1109/TAI.1999.809783","DOIUrl":null,"url":null,"abstract":"In an interactive multimedia information system, users should have the flexibility to browse and choose various scenarios they want to see. This means that two-way communications should be captured by the conceptual model. Digital video has gained increasing popularity in many multimedia applications. Instead of sequential access to the video contents, the structuring and modeling of video data so that users can quickly and easily browse and retrieve interesting materials has become an important issue in designing multimedia information systems. An abstract semantic model called the augmented transition network (ATN), which can model video data and user interactions, is proposed in this paper. An ATN and its subnetworks can model video data based on different granularities, such as scenes, shots and key frames. Multimedia input strings are used as inputs for ATNs. The details of how to use multimedia input strings to model video data are also discussed. Key frame selection is based on the temporal and spatial relations of semantic objects in each shot. These relations are captured from our proposed unsupervised video segmentation method, which considers the problem of partitioning each frame as a joint estimation of the partition and class parameter variables. Unlike existing semantic models, which only model multimedia presentation, multimedia database searching or browsing, ATNs together with multimedia input strings can model these three in one framework.","PeriodicalId":194023,"journal":{"name":"Proceedings 11th International Conference on Tools with Artificial Intelligence","volume":"79 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"45","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 11th International Conference on Tools with Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TAI.1999.809783","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 45

Abstract

In an interactive multimedia information system, users should have the flexibility to browse and choose the scenarios they want to see, which means the conceptual model must capture two-way communication. Digital video has gained increasing popularity in many multimedia applications. Rather than forcing sequential access to video content, structuring and modeling video data so that users can quickly and easily browse and retrieve material of interest has become an important issue in designing multimedia information systems. This paper proposes an abstract semantic model, the augmented transition network (ATN), that can model both video data and user interactions. An ATN and its subnetworks can model video data at different granularities, such as scenes, shots, and key frames. Multimedia input strings serve as the inputs to ATNs, and the paper discusses in detail how these strings model video data. Key frame selection is based on the temporal and spatial relations of the semantic objects in each shot. These relations are captured by our proposed unsupervised video segmentation method, which treats the partitioning of each frame as a joint estimation of the partition and the class parameter variables. Unlike existing semantic models, which model only multimedia presentation, multimedia database searching, or browsing, ATNs together with multimedia input strings model all three in a single framework.
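The abstract names the mechanism but gives no construction, so a small sketch may help: an ATN is a set of networks whose arcs either consume a terminal symbol from the input string or push into a subnetwork, which is how a top-level scene network can delegate to shot and key-frame subnetworks. The Python below is a minimal sketch under assumed conventions; the network and state names and the two-symbol alphabet ("shot", "kf") are illustrative, it is not the authors' implementation, and it omits the registers and arc tests that make ATNs "augmented".

```python
# Minimal ATN sketch: networks over scenes, shots, and key frames, driven
# by a toy "multimedia input string". All names and symbols are assumptions.
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Arc:
    target: str                   # state reached after traversing the arc
    symbol: Optional[str] = None  # CAT arc: consume this terminal symbol
    push: Optional[str] = None    # PUSH arc: descend into a subnetwork first


@dataclass
class Network:
    start: str
    finals: set
    arcs: dict = field(default_factory=dict)  # state -> list of Arc


class ATN:
    def __init__(self, networks: dict, top: str):
        self.networks = networks
        self.top = top

    def accepts(self, symbols: list) -> bool:
        # Accept iff some traversal of the top network consumes every symbol.
        return any(end == len(symbols)
                   for end in self._walk(self.top, symbols, 0))

    def _walk(self, name: str, symbols: list, pos: int) -> Iterator[int]:
        net = self.networks[name]
        yield from self._from_state(net, net.start, symbols, pos)

    def _from_state(self, net, state, symbols, pos) -> Iterator[int]:
        # Yield every input position at which this network may legally stop.
        if state in net.finals:
            yield pos
        for arc in net.arcs.get(state, []):
            if arc.push is not None:          # PUSH: recurse into subnetwork
                for mid in self._walk(arc.push, symbols, pos):
                    yield from self._from_state(net, arc.target, symbols, mid)
            elif pos < len(symbols) and symbols[pos] == arc.symbol:
                yield from self._from_state(net, arc.target, symbols, pos + 1)


# Toy grammar: a video is one or more scenes; a scene is one or more shots;
# a shot is a "shot" boundary symbol followed by one or more key frames.
nets = {
    "VIDEO": Network("v0", {"v1"}, {"v0": [Arc("v1", push="SCENE")],
                                    "v1": [Arc("v1", push="SCENE")]}),
    "SCENE": Network("s0", {"s1"}, {"s0": [Arc("s1", push="SHOT")],
                                    "s1": [Arc("s1", push="SHOT")]}),
    "SHOT":  Network("t0", {"t2"}, {"t0": [Arc("t1", symbol="shot")],
                                    "t1": [Arc("t2", symbol="kf")],
                                    "t2": [Arc("t2", symbol="kf")]}),
}

atn = ATN(nets, top="VIDEO")
print(atn.accepts(["shot", "kf", "kf", "shot", "kf"]))  # True
print(atn.accepts(["kf", "shot"]))                      # False
```

Generators make backtracking over alternative parses automatic: each network yields every input position at which it could legally stop, and the top level accepts only if some traversal consumes the whole multimedia input string.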