{"title":"Melody extraction on MIDI music files","authors":"Giyasettin Ozcan, Cihan Isikhan, A. Alpkocak","doi":"10.1109/ISM.2005.77","DOIUrl":"https://doi.org/10.1109/ISM.2005.77","url":null,"abstract":"In this study, we propose a new approach to extract monophonic melody from MIDI files and provide a comparison of existing methods. Our approach is based on the elimination of MIDI channels those do not contain melodic information. First, MIDI channels are clustered depending on pitch histogram. Afterwards, a channel is selected from each cluster as representative and remaining channels and their notes are removed. Finally, skyline algorithm is applied on the modified MIDI set to ensure accuracy of monophonic melody. We evaluated our approach within a test bed of MIDI files, composed of variable music styles. Both our approach and the results from experiments are presented in detail.","PeriodicalId":322363,"journal":{"name":"Seventh IEEE International Symposium on Multimedia (ISM'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123082682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video broadcasting using overlay multicast","authors":"D. Milic, M. Brogle, T. Braun","doi":"10.1109/ISM.2005.119","DOIUrl":"https://doi.org/10.1109/ISM.2005.119","url":null,"abstract":"Despite the availability of high bandwidth Internet access for end-users, video broadcasting over the Internet is not widely spread. Multicast communication decreases the network load by eliminating redundancy of the data transfer. However IP multicast was never widely accepted by commercial Internet service providers (ISP). Existing solutions solving this problem, like MBONE tunneling, are not available for end-users accessing the Internet via xDSL or TV cable. Application layer multicast using peer-to-peer (P2P) overlay networks could solve the problem of sparse IP multicast support in the Internet. A limitation of this approach is the lack of standardized interfaces for existing IP multicast applications. We propose a solution, which bridges application layer multicast and IP multicast and uses a P2P (overlay) network to transport multicast data. Our solution - including a \"proof-of-concept\" prototype - enables video broadcasting over the Internet using existing IP multicast applications without requiring additional service deployment.","PeriodicalId":322363,"journal":{"name":"Seventh IEEE International Symposium on Multimedia (ISM'05)","volume":"1993 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128621137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the usefulness of object shape coding with MPEG-4","authors":"A. Prati, R. Cucchiara","doi":"10.1109/ISM.2005.87","DOIUrl":"https://doi.org/10.1109/ISM.2005.87","url":null,"abstract":"This paper reports the results of an in-depth analysis of the degree of usefulness of object shape coding in video compression. In particular, MPEG-4 is used as reference standard. The influence of different coding parameters on the performance is deeply examined and discussions on the results are provided. Object shape coding is compared with classical (MPEG-2) frame-based coding both at an objective level (by comparing PSNR/quality and bitrate/filesize) and at a subjective level (asking to a set of users to express their opinion on overall quality, cognitive effectiveness, and willingness to pay). In conclusion, this paper aims at answering to the question whether it is convenient to use object shape coding instead of frame-based coding or not.","PeriodicalId":322363,"journal":{"name":"Seventh IEEE International Symposium on Multimedia (ISM'05)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127227422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A pitch-based rapid speech segmentation for speaker indexing","authors":"Min Yang, Yingchun Yang, Zhaohui Wu","doi":"10.1109/ISM.2005.17","DOIUrl":"https://doi.org/10.1109/ISM.2005.17","url":null,"abstract":"Segmentation of continuous audio is an important processing in many applications. In speaker indexing, the reliability of speaker model depends much on segmentation. Commonly used methods are based on the Bayesian information criteria (BIC), which is however not so capable when dealing with short utterances. In this paper, we present a pitch-based speech segmentation method, which can detect frequent speaker changes accurately and rapidly. In our algorithm, pitch is introduced in speaker segmentation. Firstly, utterance segments are detected by pitch. Then distances of pitch are computed, and compared with a self-adaptable threshold. Speaker changes are finally decided among utterance segments. We applied our method and three comparative methods on the HUB4-NE broadcast data. Speaker indexing experiments have been taken following each algorithm. We also suggested two indicators as complements of false alarm and missing rate in the evaluation of segmentation. The experiment results show that our algorithm works faster and better, with most of short time speaker changes detected. Speaker indexing equal error rate of our method is 10.43%, which is much lower than 12.94%, 25.84% and 15.91% of other methods.","PeriodicalId":322363,"journal":{"name":"Seventh IEEE International Symposium on Multimedia (ISM'05)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126643370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Striping delay-sensitive packets over multiple burst-loss channels with random delays","authors":"Gene Cheung, P. Sharma, Sung-Ju Lee","doi":"10.1109/ISM.2005.110","DOIUrl":"https://doi.org/10.1109/ISM.2005.110","url":null,"abstract":"Multi-homed mobile devices have multiple wireless communication interfaces, each connecting to the Internet via a long range but low speed and bursty WAN link such as a cellular link. We propose a packet striping system for such multi-homed devices - a mapping of delay-sensitive packets by an intermediate gateway to multiple channels, such that the overall performance is enhanced. In particular, we model and analyze the striping of delay-sensitive packets over multiple burst-loss channels with random delays. We first derive the expected packet loss ratio when forward error correction (FEC) is applied for error protection over multiple channels. We next model and analyze the case when the channels are bandwidth-limited with shifted-gamma-distributed transmission delays. We develop a dynamic programming-based algorithm that solves the optimal striping problem for the ARQ, the FEC, and the hybrid FEC/ARQ case.","PeriodicalId":322363,"journal":{"name":"Seventh IEEE International Symposium on Multimedia (ISM'05)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114063204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating MPEG-21 BSDL descriptions using context-related attributes","authors":"D. D. Schrijver, W. D. Neve, K. D. Wolf, R. Walle","doi":"10.1109/ISM.2005.63","DOIUrl":"https://doi.org/10.1109/ISM.2005.63","url":null,"abstract":"In order to efficiently deal with the heterogeneity in the current and future multimedia ecosystem, it is necessary that content can be adapted in a format-agnostic manner. A first step toward a solution, able to fulfill the just mentioned requirement, is to rely on a scalable video codec and to describe the high-level structure of the resulting bitstreams in such a way that every terminal can understand it, in particular by using XML. This paper describes how such descriptions can be generated by making use of the media format independent BintoBSD tool of the MPEG-21 BSDL standard. However, regarding the current status of BSDL, it is impossible to create a description in real time and to keep the generation speed constant over the complete sequence. In this paper, we describe a number of extensions and algorithmic modifications that make it possible to generate a description of a bitstream in real time and at a constant speed. Our approach results in a significant reduction of the original execution times (up to 99% for the H.264/AVC coding format) and in a constant memory usage.","PeriodicalId":322363,"journal":{"name":"Seventh IEEE International Symposium on Multimedia (ISM'05)","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115543032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Framework and network based multimedia object management environment","authors":"Chris Germano, Taehyung Wang, A. Onoma","doi":"10.1109/ISM.2005.61","DOIUrl":"https://doi.org/10.1109/ISM.2005.61","url":null,"abstract":"Because multimedia objects are becoming more prevalent, on ever increasing volume, inventing an efficient multimedia object management environment is a matter of increasing urgency. In recognition of this crisis, we developed the multimedia object management environment (MOME) that includes a suite of tools such as the Vortex framework, and network file indexer (NFI). MOME also includes a fully featured graphical user interface for maximum user control and flexibility. With meta data, it automatically generates indexes and paths for different types of multimedia objects and allows users to quickly find what they are looking for. In this paper, we address the background, architecture, and performance of MOME in detail.","PeriodicalId":322363,"journal":{"name":"Seventh IEEE International Symposium on Multimedia (ISM'05)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130054818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatically generating user interfaces for device federations","authors":"E. Braun, M. Mühlhäuser","doi":"10.1109/ISM.2005.38","DOIUrl":"https://doi.org/10.1109/ISM.2005.38","url":null,"abstract":"One of the ideas of ubiquitous computing is that computing resources should be embedded ubiquitously in the environment, making them available to any nearby users. Some researchers have applied this to interaction, and tried to embed an abundance of interactive devices, such as touch screens, in rooms and whole buildings. The opposite concept is that of a single personal mobile device, which users carry at all times and use for all interactions. Because both concepts have different strengths, we explore building interfaces for federations of personal mobile and stationary embedded devices, exploiting the capabilities of both rather than forcing users to choose between either. We have developed an infrastructure that coordinates multiple devices for that purpose: groups of devices work together to render a user interface. As one of the main challenges for such federated user interfaces we have identified their authoring. How should the interface be divided in multiple parts, and can that decision be made by a computer rather than a human designer?","PeriodicalId":322363,"journal":{"name":"Seventh IEEE International Symposium on Multimedia (ISM'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127944753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast adaptive inter mode decision method in H.264 based on spatial correlation","authors":"Bin Feng, G.X. Zhu, Wen-yu Liu","doi":"10.1109/ISM.2005.58","DOIUrl":"https://doi.org/10.1109/ISM.2005.58","url":null,"abstract":"The computation complexity of H.264 is so large that make it difficult to be used in practical applications especially in real time environment. In this paper, a fast adaptive inter modes decision method is proposed to reduce the complexity of H.264 encoder. Firstly the candidate inter modes can be limited in a small mode group (MG) by using the characteristics of the motion compensated residual image. Then the most probable mode (MPM) of the MB is predicted on the basis of the modes of the neighboring macroblocks. The overlapped mode groups and dynamic adjusted thresholds adopted in the proposed method can make the best mode lies within the chosen MG with great possibility which leads to extensively computation reduction with acceptable loss in quality. The experimental results show that the proposed method can save the encoding time up to 64% on average with -0.45dB performance degradation.","PeriodicalId":322363,"journal":{"name":"Seventh IEEE International Symposium on Multimedia (ISM'05)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115951578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web-based video editing system for sharing clips collected from multi-users","authors":"Satoshi Ichimura, Y. Matsushita","doi":"10.1109/ISM.2005.123","DOIUrl":"https://doi.org/10.1109/ISM.2005.123","url":null,"abstract":"Movies edited by amateurs are likely to be a sequence of monotonous scenes, where people record video from a single direction, and use video from a single camcorder when editing video. We developed VideoBlocks, a Web-based video editing system. In our system, video clips are collected from multi-users' camcorders via the Internet and shared among the users. The system allows users to easily use other users' video clips, each of which is spontaneously videotaped using different digital video camcorders. VideoBlocks provides the capability to extract the date and time information of the video-recording from the digital video cameras, and automatically synchronize them. Through evaluations of the system, we determined the effectiveness of the system.","PeriodicalId":322363,"journal":{"name":"Seventh IEEE International Symposium on Multimedia (ISM'05)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126737066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}