{"title":"Region-Based Image Retrieval with High-Level Semantic Color Names","authors":"Y. Liu, Dengsheng Zhang, Guojun Lu, Wei-Ying Ma","doi":"10.1109/MMMC.2005.62","DOIUrl":"https://doi.org/10.1109/MMMC.2005.62","url":null,"abstract":"The performance of traditional content-based image retrieval systems falls far short of users’ expectations due to the ‘semantic gap’ between low-level visual features and the richness of human semantics. In an attempt to reduce this ‘semantic gap’, this paper introduces a region-based image retrieval system with high-level semantic color names. In this system, database images are segmented into color-texture homogeneous regions. For each region, we define a color name as used in daily life. In the retrieval process, images containing regions of the same color name as the query are selected as candidates. These candidate images are further ranked based on their color and texture features. In this way, the system reduces the ‘semantic gap’ between numerical image features and the rich semantics in the user’s mind. Experimental results show that the proposed system provides promising retrieval results with only a few features.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133259390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Image Matrix Compression for Face Recognition","authors":"Dong Xu, Shuicheng Yan, Lei Zhang, Mingjing Li, Wei-Ying Ma, Zhengkai Liu, HongJiang Zhang","doi":"10.1109/MMMC.2005.57","DOIUrl":"https://doi.org/10.1109/MMMC.2005.57","url":null,"abstract":"The canonical face recognition algorithms Eigenface and Fisherface are both based on one-dimensional vector representation. However, with high feature dimensionality and small training sets, face recognition often suffers from the curse of dimensionality and the small sample size problem. Recent research [4] shows that face recognition based on direct 2D matrix representation, i.e. 2DPCA, obtains better performance than that based on traditional vector representation. However, three questions are left unresolved in the 2DPCA algorithm: 1) what is the meaning of the eigenvalues and eigenvectors of the covariance matrix in 2DPCA; 2) why can 2DPCA outperform Eigenface; and 3) how to directly reduce the dimension after 2DPCA. In this paper, we analyze 2DPCA from a different view and prove that 2DPCA is actually a \"localized\" PCA with each row vector of an image as an object. With this explanation, we show that the intrinsic reason 2DPCA can outperform Eigenface is that fewer feature dimensions and more samples are used in 2DPCA than in Eigenface. To further reduce the dimension after 2DPCA, a two-stage strategy, namely parallel image matrix compression (PIMC), is proposed to compress the image matrix redundancy that exists among row vectors and column vectors. Extensive experimental results demonstrate that PIMC is superior to 2DPCA and Eigenface, and that PIMC+LDA outperforms 2DPCA+LDA and Fisherface.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"7 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114010479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color Time Petri Net for Interactive Adaptive Multimedia Objects","authors":"A. Gomaa, N. Adam, V. Atluri","doi":"10.1109/MMMC.2005.26","DOIUrl":"https://doi.org/10.1109/MMMC.2005.26","url":null,"abstract":"A composite multimedia object (cmo) comprises different media components such as text, video, audio and image, with a variety of constraints that must be adhered to. The constraints are 1) rendering relationships, which comprise the temporal and spatial constraints between different components; 2) behavioral requirements, which include the security and fidelity constraints on each component; and 3) user interactions on a set of related media components. Different users have different capabilities (e.g. age), characteristics (e.g. monitor size) and credentials (e.g. subscription to a service). Our objective is to author an interactive adaptive cmo that renders itself correctly for different users. Therefore, it is important to guarantee the consistency of the cmo specifications in all possible scenarios. In this paper, we include user interaction, together with temporal and spatio-temporal behavior, in the specification of the adaptive cmo. We then check the consistency of the user interaction specifications by transforming them into a color time Petri net model. We perform a reachability analysis on the Petri net to identify inconsistencies, and then resolve the identified inconsistencies to obtain a consistent Petri net. A consistent Petri net represents an error-free interactive cmo that can adapt to different users, by guaranteeing that linked user interactions are reachable for all eligible users.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115622448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sports Video Mining with Mosaic","authors":"Tao Mei, Yu-Fei Ma, He-Qin Zhou, Wei-Ying Ma, HongJiang Zhang","doi":"10.1109/MMMC.2005.68","DOIUrl":"https://doi.org/10.1109/MMMC.2005.68","url":null,"abstract":"Video is an information-intensive medium with much redundancy. Therefore, it is desirable to be able to mine the structure or semantics of video data for efficient browsing, summarization and highlight extraction. In this paper, we propose a generic approach to mining key events as well as structure for sports video analysis. A mosaic is generated for each shot as the representative image of the shot content. Based on the mosaic, sports video is mined both with and without prior knowledge. Without prior knowledge, our system can locate plays by discriminating segments without essential content, such as breaks. If prior knowledge is available, the key events in plays are detected using robust features extracted from the mosaic. Experimental results have demonstrated the effectiveness and robustness of this sports video mining approach.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126552851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Mining and Retrieval Using Hierarchical Support Vector Machines","authors":"R. Brown, Binh Pham","doi":"10.1109/MMMC.2005.48","DOIUrl":"https://doi.org/10.1109/MMMC.2005.48","url":null,"abstract":"For some time now, image retrieval approaches have been developed that use low-level features, such as colour histograms, edge distributions and texture measures. What has been lacking in image retrieval approaches is the development of general methods for more structured object recognition. This paper describes in detail a general hierarchical image classifier approach, and illustrates the ease with which it can be trained to find objects in a scene. To further illustrate the wide capabilities of this approach, results from its application to particle picking in biology and Vietnamese art image retrieval are listed.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"598 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132657271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cyber Composer: Hand Gesture-Driven Intelligent Music Composition and Generation","authors":"H. Ip, K. Law, Belton Kwong","doi":"10.1109/MMMC.2005.32","DOIUrl":"https://doi.org/10.1109/MMMC.2005.32","url":null,"abstract":"Cyber Composer is a novel and interactive cyber instrument that enables both musicians and music laypersons to dynamically control the tonality and the melody of the music that they generate/compose through hand motion and gestures. Cyber Composer generates music according to hand motions and gestures of the users in the absence of real musical instruments. Music theories are embedded in the design so that melody flow and musical expressions like the pitch, rhythm and volume of the melody can be controlled and generated in real-time by wearing a pair of motion-sensing gloves. Also central to the design is the mapping of the hand motions and gestures to musical expressions that is intuitive and requires minimal training. Cyber Composer is expected to find applications in the fields of performance, composing, entertainment, education as well as psychotherapy.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127912481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing Tennis Tactics from Broadcasting Tennis Video Clips","authors":"Jenny R. Wang, N. Parameswaran","doi":"10.1109/MMMC.2005.20","DOIUrl":"https://doi.org/10.1109/MMMC.2005.20","url":null,"abstract":"This paper attempts to classify tennis games into 58 winning patterns for training purposes. It is based on tracking ball movement in broadcast tennis video. Trajectory and landing position are used as the basic features for classification. We use an improved Bayesian network to classify the landing positions of different patterns. Intelligent agents are used to combine trajectories and landing positions, since the two features are in different dimensions. Semantic labels are assigned after classification. The aim of the analysis is to provide a browsing tool for coaches or other personnel to retrieve tennis video clips.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126858636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Interactive Camera Planning System for Automatic Cinematographer","authors":"Tsai-Yen Li, X. Xiao","doi":"10.1109/MMMC.2005.19","DOIUrl":"https://doi.org/10.1109/MMMC.2005.19","url":null,"abstract":"Currently most systems capable of performing intelligent camera control use cinematographic idioms or a constraint satisfaction mechanism to determine a sequence of camera configurations for a given animation script. However, an automated cinematography system cannot be made practical without taking idiosyncrasy and the distinct role of each member in a filmmaking team into account. In this paper, we propose an interactive virtual cinematographer model imitating the key functions of a real filmmaking team consisting of three modules: director, photographer, and editor. The system uses parameterized cinematographic idioms in the three modules to determine the best camera configurations for an animation script. The system allows a user to interact with the virtual cinematographer to specify stylistic preferences, which can be carried over to other animation scripts.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126875362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognition of Enhanced Images","authors":"Khanh Vu, K. Hua, N. Hiransakolwong, Sirikunya Nilpanich","doi":"10.1109/MMMC.2005.61","DOIUrl":"https://doi.org/10.1109/MMMC.2005.61","url":null,"abstract":"Image enhancement such as adjusting brightness and contrast is central to improving human visualization of image content. Images of the desired enhanced quality facilitate analysis, interpretation, classification, information exchange, indexing and retrieval. The adjustment process, guided by diverse enhancement objectives and subjective human judgment, often produces various versions of the same image. Although content is preserved under these operations, most existing techniques treat enhanced images as new images because their features differ widely. This leads to difficulties in recognizing and retrieving images across application domains and user interests. To allow unrestricted enhancement flexibility, accurate identification of images and their enhanced versions is therefore essential. In this paper, we introduce a measure that theoretically guarantees the identification of all enhanced images originating from a single image. In our approach, images are represented by points in a multidimensional intensity-based space. We show that points representing images of the same content are confined to a well-defined area that can be identified by a formula devised for this purpose. We evaluated our technique on large sets of images from various categories, including medical, satellite, texture and color images, as well as scanned documents. The proposed measure yields an actual recognition rate approaching 100% in all image categories, outperforming other well-known techniques by a wide margin. Our analysis can at the same time serve as a basis for determining the minimum criterion a similarity measure should satisfy. We also discuss how to apply the formula as a similarity measure in existing systems to support general image retrieval.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"58 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126972877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Snapshot: A Bird View of Video Sequence","authors":"Yu-Fei Ma, HongJiang Zhang","doi":"10.1109/MMMC.2005.71","DOIUrl":"https://doi.org/10.1109/MMMC.2005.71","url":null,"abstract":"Video is an information-intensive medium with much redundancy. Therefore, it is desirable to be able to quickly browse video content or deliver videos over limited bandwidth, which can be achieved by effective video summarization. In this paper, we present a novel pictorial video summary, called Video Snapshot, which is a bird's-eye view of a video that enables viewers to grasp its main content at a glance. Moreover, a comprehensive scoring scheme for content filtering, called PRID (Pleasurable, Representative, Informative and Distinctive), and an optimized video visualization algorithm are also proposed. The encouraging results indicate that many potential digital video applications may be leveraged by Video Snapshot, such as video browsing, retrieval and delivery in heterogeneous computing and networking environments.","PeriodicalId":121228,"journal":{"name":"11th International Multimedia Modelling Conference","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124251098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}