2007 IEEE 9th Workshop on Multimedia Signal Processing: Latest Publications

Joint Analysis of the Emotional Fingerprint in the Face and Speech: A single subject study
2007 IEEE 9th Workshop on Multimedia Signal Processing | Pub Date: 2007-12-01 | DOI: 10.1109/MMSP.2007.4412814
C. Busso, Shrikanth S. Narayanan
Abstract: In daily human interaction, speech and gestures are used to express an intended message, enriched with verbal and non-verbal information. Although many communicative goals are simultaneously encoded using the same modalities, such as the face or the voice, listeners are generally good at decoding each aspect of the message. This encoding process involves an underlying interplay between communicative goals and channels that is not yet well understood. To this end, this paper explores the interplay between linguistic and affective goals in speech and facial expression. We hypothesize that when one modality is constrained by the articulatory speech process, other channels with more degrees of freedom are used to convey emotions. The results presented here support this hypothesis: facial expression and prosodic speech tend to show stronger emotional modulation when the vocal tract is physically constrained by articulation to convey other linguistic communicative goals.
Citations: 21
Systematic comparison of BIC-based speaker segmentation systems
2007 IEEE 9th Workshop on Multimedia Signal Processing | Pub Date: 2007-12-01 | DOI: 10.1109/MMSP.2007.4412819
V. Moschou, M. Kotti, Emmanouil Benetos, Constantine Kotropoulos
Abstract: This paper addresses unsupervised speaker change detection. Three speaker segmentation systems are examined. The first system uses the AudioSpectrumCentroid and AudioWaveformEnvelope features, implements a dynamic fusion scheme, and applies the Bayesian Information Criterion (BIC). The second system consists of three modules: the first extracts a second-order statistical measure, the second applies the Euclidean distance and the Hotelling T² statistic sequentially, and the third applies BIC. The third system first uses a metric-based approach to detect potential speaker change points and then applies BIC to validate the detected change points. Experiments are carried out on a dataset created by concatenating utterances of speakers from the TIMIT database. A systematic performance comparison of the three systems is carried out using one-way ANOVA and Tukey's post hoc test.
Citations: 7
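All three systems rely on a ΔBIC test. For a window z split into x and y, it compares one Gaussian model for z with separate Gaussians for x and y: ΔBIC = (N/2) ln σ² - (N1/2) ln σ1² - (N2/2) ln σ2² - λP, with P = ½(d + d(d+1)/2) ln N, and a positive value favours a change. The sketch below is a minimal illustration, not any of the compared systems. It assumes one scalar feature per frame (d = 1), maximum-likelihood variances, and a penalty weight `lam`; the helper names `delta_bic` and `find_change` are hypothetical.

```python
import math
from typing import Optional, Sequence


def _mle_variance(x: Sequence[float]) -> float:
    """Maximum-likelihood (biased) variance, floored to avoid log(0)."""
    mean = sum(x) / len(x)
    return max(sum((v - mean) ** 2 for v in x) / len(x), 1e-12)


def delta_bic(x: Sequence[float], y: Sequence[float], lam: float = 1.0) -> float:
    """Delta-BIC of 'one Gaussian for x+y' versus 'separate Gaussians for x and y'.

    Positive values favour a speaker change at the x/y boundary.
    """
    z = list(x) + list(y)
    n, n1, n2 = len(z), len(x), len(y)
    d = 1  # feature dimension (scalar features in this sketch)
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    gain = 0.5 * (n * math.log(_mle_variance(z))
                  - n1 * math.log(_mle_variance(x))
                  - n2 * math.log(_mle_variance(y)))
    return gain - lam * penalty


def find_change(seq: Sequence[float], margin: int = 3, lam: float = 1.0) -> Optional[int]:
    """Return the split index with the largest positive Delta-BIC, or None."""
    best_t, best_score = None, 0.0
    for t in range(margin, len(seq) - margin + 1):
        score = delta_bic(seq[:t], seq[t:], lam)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Usage: two synthetic "speakers" whose feature means differ by 5.
speaker_a = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15]
speaker_b = [v + 5.0 for v in speaker_a]
print(find_change(speaker_a + speaker_b))  # 8
```

Real systems scan multidimensional features (e.g. MFCCs) with full covariance matrices, which changes the variance terms to log-determinants and the penalty's d.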
Unequal Growth Codes: Intermediate Performance and Unequal Error Protection for Video Streaming
2007 IEEE 9th Workshop on Multimedia Signal Processing | Pub Date: 2007-12-01 | DOI: 10.1109/MMSP.2007.4412829
A. Dimakis, Jiajun Wang, K. Ramchandran
Abstract: We investigate the design of fountain codes with good intermediate performance and built-in unequal error protection for low-delay video multicast. In particular, we design novel short-blocklength fountain codes for media streaming to multiple heterogeneous receivers and analyze their performance. Our theoretical contribution is a generalization of the growth code analysis to unequal error protection, suited to the characteristics of video data. Simulation results show that the proposed method can effectively increase the number of decodable packets over a very wide range of packet drop rates and provide smooth, graceful video quality degradation for users with various channel conditions. The proposed scheme also offers much lower decoder complexity and a simpler system architecture than solutions based on traditional MDS erasure codes.
Citations: 43
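Growth codes, like other fountain codes whose packets are XOR combinations of source symbols, can be decoded by iterative "peeling": any packet with a single unknown symbol reveals it, and that symbol is then XORed out of the other packets. As we understand growth codes, the encoder starts with degree-1 packets and raises the degree as more symbols are recovered, which is what lets a receiver decode part of the data early. The sketch below shows only the generic peeling step on integer symbols; it does not implement the degree schedule, the unequal error protection, or the analysis of the paper, and `xor_packet` and `peel_decode` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set, Tuple

Packet = Tuple[Set[int], int]  # (indices of XORed source symbols, XOR value)


def xor_packet(source: List[int], indices: Iterable[int]) -> Packet:
    """Build one coded packet as the XOR of the selected source symbols."""
    idx = set(indices)
    value = 0
    for i in idx:
        value ^= source[i]
    return idx, value


def peel_decode(packets: List[Packet]) -> Dict[int, int]:
    """Recover as many source symbols as possible by iterative peeling."""
    decoded: Dict[int, int] = {}
    pending = [(set(idx), val) for idx, val in packets]  # copy; inputs stay intact
    progress = True
    while progress:
        progress = False
        still_pending = []
        for idx, val in pending:
            # XOR out every symbol that is already known.
            for j in list(idx):
                if j in decoded:
                    val ^= decoded[j]
                    idx.discard(j)
            if len(idx) == 1:
                # Degree-1 packet: it directly reveals one new symbol.
                decoded[idx.pop()] = val
                progress = True
            elif idx:
                still_pending.append((idx, val))
        pending = still_pending
    return decoded


source = [5, 9, 12, 7]
packets = [xor_packet(source, s) for s in ({0}, {0, 1}, {1, 2}, {0, 2, 3})]
print(peel_decode(packets))  # {0: 5, 1: 9, 2: 12, 3: 7}
```

A packet whose symbols are all unknown and never reduced to degree 1 simply stays pending, so a receiver that has collected only high-degree packets recovers nothing; this is the intermediate-performance problem that low initial degrees avoid.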
Dual-Mode Wideband Speech Compression
2007 IEEE 9th Workshop on Multimedia Signal Processing | Pub Date: 2007-12-01 | DOI: 10.1109/MMSP.2007.4412816
Visar Berisha, A. Spanias
Abstract: Many bandwidth extension techniques attempt to predict the high-band frequencies from features extracted from the low band. Recent work suggests that such methods are limited because the correlation between the low band and the high band is insufficient for an adequate representation. As a result, additional high-band information must be sent to the decoder. In this paper, we propose a dual-mode wideband speech coding algorithm based on the principles of bandwidth extension. The principal contributions are a greedy mode-selection algorithm that maximizes a loudness criterion and a bandwidth extension algorithm based on a constrained MMSE estimator. Results reveal that the proposed system improves the quality of narrowband speech while operating at a lower bit rate.
Citations: 0
Grid-based Template Matching for People Counting
2007 IEEE 9th Workshop on Multimedia Signal Processing | Pub Date: 2007-12-01 | DOI: 10.1109/MMSP.2007.4412881
J. Hsieh, Cheng-Shuang Peng, Kuo-Chin Fan
Abstract: This paper presents a novel template matching method to detect and track pedestrians for real-time people counting. First, a novel background subtraction method is proposed to extract all foreground objects from the background. Then, a shadow elimination method is used to remove unwanted shadows from the background. To distinguish pedestrians from non-pedestrian objects, a novel grid-based template matching scheme is proposed to robustly verify each pedestrian. A pedestrian usually has a different appearance at different positions; the grid-based approach minimizes perspective effects because it uses a separate template to record the appearance at each grid cell. Because detection becomes slower as more templates are used, an integral image is used to filter out impossible candidates in advance. Finally, a tracking method follows the direction of each moving pedestrian so that the number of people passing in each direction can be counted more accurately. Experimental results show that the proposed method counts people robustly and accurately.
Citations: 19
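The integral-image pre-filter mentioned above relies on a standard property: once cumulative sums are tabulated, the sum over any axis-aligned rectangle costs four lookups, regardless of its size. The sketch below assumes a binary foreground mask and uses a hypothetical `min_fill` ratio as the rejection rule; the abstract does not state the paper's actual filtering criterion, and all function names are illustrative.

```python
from typing import List, Tuple

Grid = List[List[int]]


def integral_image(img: Grid) -> Grid:
    """ii[r][c] = sum of img over rows < r and columns < c (zero border)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii: Grid, top: int, left: int, height: int, width: int) -> int:
    """Sum of the original image over a rectangle, in O(1) time."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def candidate_windows(mask: Grid, height: int, width: int,
                      min_fill: float) -> List[Tuple[int, int]]:
    """Top-left corners of windows whose foreground fraction is >= min_fill."""
    ii = integral_image(mask)
    area = height * width
    return [
        (top, left)
        for top in range(len(mask) - height + 1)
        for left in range(len(mask[0]) - width + 1)
        if box_sum(ii, top, left, height, width) >= min_fill * area
    ]


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(candidate_windows(mask, 2, 2, 1.0))  # [(1, 1)]
```

Only windows that survive this cheap test would then be passed to the more expensive per-grid template comparison.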
Robust Image Watermarking Based on Local Zernike Moments
2007 IEEE 9th Workshop on Multimedia Signal Processing | Pub Date: 2007-12-01 | DOI: 10.1109/MMSP.2007.4412901
Nitin Singhal, Young-Yoon Lee, Chang-Su Kim, Sang Uk Lee
Abstract: Invariant image features can be used to carry watermarks so as to improve the robustness of the watermarks against geometric transformations. However, most previous watermarking algorithms using invariant features are still sensitive to cropping attacks and to combinations of rotation, scaling, and translation (RST) attacks. To improve resilience against these attacks, we propose a multi-bit image watermarking algorithm using local Zernike moments (LZMs). The magnitudes of the LZMs are dither-modulated to embed watermark bits. To achieve scale invariance, we restore the original sampling rate using the invariant centroid and geometric moments. Simulation results demonstrate that the proposed watermarking algorithm is robust against various geometric attacks as well as signal processing attacks.
Citations: 15
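Dither modulation can be illustrated with scalar quantization index modulation (QIM): each bit selects one of two interleaved quantizer lattices, the embedder moves the coefficient to the nearest point of the chosen lattice, and the detector picks whichever lattice lies closest to the received value. This sketch applies it to one real number standing in for a single LZM magnitude. The step size `step` and the dither offsets of ±step/4 are assumptions rather than the paper's parameters, and computing the Zernike moments is not shown.

```python
import math


def _dither(bit: int, step: float) -> float:
    """Dither offset per bit: the two lattices are step/2 apart."""
    return -step / 4 if bit == 0 else step / 4


def _quantize(x: float, bit: int, step: float) -> float:
    """Nearest point of the lattice step*k + dither(bit)."""
    d = _dither(bit, step)
    return step * math.floor((x - d) / step + 0.5) + d


def qim_embed(magnitude: float, bit: int, step: float) -> float:
    """Move a coefficient (e.g. one moment magnitude) onto the lattice for `bit`.

    Note: very small magnitudes can be pushed below zero; a real scheme
    would need to handle that case.
    """
    return _quantize(magnitude, bit, step)


def qim_detect(value: float, step: float) -> int:
    """Minimum-distance decoding: return the bit whose lattice is closest."""
    return min((0, 1), key=lambda b: abs(value - _quantize(value, b, step)))


step = 1.0
marked = qim_embed(0.3, 1, step)
print(marked, qim_detect(marked + 0.2, step))  # 0.25 1
```

With this lattice spacing, any perturbation smaller than step/4 leaves the decoded bit unchanged, which is the robustness/distortion trade-off that `step` controls.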
New Directions in Image and Video Quality Assessment (Plenary Talk)
2007 IEEE 9th Workshop on Multimedia Signal Processing | Pub Date: 2007-12-01 | DOI: 10.1109/MMSP.2007.4412802
A. Bovik
Abstract: Perceptual image processing is taking on an increasingly important role in multimedia processing. Designing algorithms in accord with visual perception is a natural idea, but it has met with limited success owing to our imperfect knowledge of the intended receiver and, indeed, of the transmitter. The receiver in this context is, of course, the marvelous human eye-cortex system, while the transmitter is the environment, which casts images of extraordinary variability onto camera and retinal sensors.
Citations: 1
Multimodal Meeting Monitoring: Improvements on Speaker Tracking and Segmentation through a Modified Mixture Particle Filter
2007 IEEE 9th Workshop on Multimedia Signal Processing | Pub Date: 2007-12-01 | DOI: 10.1109/MMSP.2007.4412818
Viktor Rozgic, C. Busso, P. Georgiou, Shrikanth S. Narayanan
Abstract: In this paper, we address improvements to our multimodal system for tracking meeting participants and for speaker segmentation, with a focus on the microphone array modality. We propose an algorithm that uses the directions of arrival (DOAs) estimated for each microphone pair as observations, tracks an unknown number of acoustically active meeting participants, and then performs speaker segmentation. We propose a modified mixture particle filter (mMPF) for tracking acoustic sources in the track-before-detect (TbD) framework. Trajectories of sound sources are reconstructed by optimally assigning the posterior mixture components produced by the mMPF in consecutive frames. Further, we propose a sequential optimal change-point detection algorithm that discovers speech segments in the reconstructed trajectories, i.e., performs speaker segmentation. The algorithm is tested on a multi-participant meeting dataset, both separately and as part of the multimodal system. On the task of speaker detection in the multimodal setup, we report a significant improvement over our previous state-of-the-art implementation.
Citations: 10
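The abstract does not spell out its sequential change-point detector. As a generic illustration only, Page's CUSUM test, a classic sequential detector for a switch between two known distributions, is sketched below for a Gaussian mean shift in a per-frame scalar observation (for example, one value derived from a reconstructed trajectory). The means `mu0` and `mu1`, the noise level `sigma`, and the threshold `h` are assumed known; this is not the mMPF-based algorithm of the paper.

```python
from typing import Optional, Sequence


def cusum_alarm(
    x: Sequence[float], mu0: float, mu1: float, sigma: float, h: float
) -> Optional[int]:
    """Page's CUSUM for a Gaussian mean shift mu0 -> mu1 with known sigma.

    Returns the index of the first sample at which the statistic exceeds h,
    or None if no change is declared.
    """
    s = 0.0
    for t, value in enumerate(x):
        # Log-likelihood ratio of N(mu1, sigma^2) versus N(mu0, sigma^2).
        llr = (mu1 - mu0) / sigma**2 * (value - (mu0 + mu1) / 2)
        s = max(0.0, s + llr)  # reset to zero while evidence favours "no change"
        if s > h:
            return t
    return None


track = [0.0] * 10 + [1.0] * 10
print(cusum_alarm(track, mu0=0.0, mu1=1.0, sigma=1.0, h=2.0))  # 14
```

The change here occurs at index 10 and the alarm fires at index 14; raising `h` trades longer detection delay for fewer false alarms.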
Semantics Interpretation of Superimposed Captions in Sports Videos
2007 IEEE 9th Workshop on Multimedia Signal Processing | Pub Date: 2007-10-01 | DOI: 10.1109/MMSP.2007.4412861
H. Shih, Chung-Lin Huang
Abstract: This paper proposes a semantics understanding system to interpret the superimposed caption box (SCB) in sports videos, where the SCB template is not known a priori. The embedded captions in sports video programs represent digested key information about the video content. Most previous studies assume that the SCB template and the character bitmaps are known; however, representative character bitmaps are required to recognize the captions. The novelty of this paper lies in (1) SCB extraction and identification, (2) symbol extraction, and (3) semantic interpretation of the identified captions and symbols. Experimental results show that the algorithm can interpret the SCB contents of several commercial sports video programs.
Citations: 4
Relevant Feature Selection for Audio-Visual Speech Recognition
2007 IEEE 9th Workshop on Multimedia Signal Processing | Pub Date: 2007-10-01 | DOI: 10.1109/MMSP.2007.4412847
Thomas Drugman, Mihai Gurban, J. Thiran
Abstract: We present a feature selection method based on information-theoretic measures, targeted at multimodal signal processing, and show how the relevance of features from different modalities can be assessed quantitatively. We are able to find the features with the highest amount of information relevant to the recognition task while at the same time having minimal redundancy. Our application is audio-visual speech recognition, and in particular the selection of relevant visual features. Experimental results show that our method outperforms other feature selection algorithms from the literature, improving recognition accuracy even with a significantly reduced number of features.
Citations: 26
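A greedy relevance-minus-redundancy selection, in the spirit of the well-known mRMR criterion, illustrates how information-theoretic measures can rank features by relevance to the labels while penalizing redundancy with features already chosen. This is a stand-in sketch, not the criterion of the paper: it assumes features are already quantized to discrete values, estimates mutual information from empirical counts (in nats), and uses the hypothetical helper names `mutual_information` and `select_features`.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(A;B) in nats from paired discrete samples."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def select_features(
    features: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable], k: int
) -> List[str]:
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example: the label is fully determined by features "a" and "b";
# "a_copy" duplicates "a" and therefore adds no new information.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * x + y for x, y in zip(a, b)]
feats = {"a": a, "a_copy": list(a), "b": b}
print(select_features(feats, labels, k=2))  # ['a', 'b']
```

Although "a_copy" is exactly as relevant as "b", its redundancy with the already-selected "a" cancels its score, so the second pick is "b".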