{"title":"A dynamic system model of time-varying subjective quality of video streams over HTTP","authors":"Chao Chen, L. Choi, G. Veciana, C. Caramanis, R. Heath, A. Bovik","doi":"10.1109/ICASSP.2013.6638329","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638329","url":null,"abstract":"Newly developed HTTP-based video streaming technology enables flexible rate adaptation in varying channel conditions. The users' Quality of Experience (QoE) of rate-adaptive HTTP video streams, however, is not well understood. Therefore, designing QoE-optimized rate-adaptive video streaming algorithms remains a challenging task. An important aspect of understanding and modeling QoE is being able to predict the up-to-the-moment subjective quality of video as it is played. We propose a dynamic system model to predict the time-varying subjective quality (TVSQ) of rate-adaptive videos that are transported over HTTP. For this purpose, we built a video database and measured TVSQ via a subjective study. A dynamic system model is developed using the database and the measured human data. We show that the proposed model can effectively predict the TVSQ of rate-adaptive videos in an online manner, which is necessary for QoE-optimized online rate adaptation in HTTP-based video streaming.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116669331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"N-gram analysis for sleeping cell detection in LTE networks","authors":"Fedor Chernogorov, T. Ristaniemi, Kimmo Brigatti, Sergey Chernov","doi":"10.1109/ICASSP.2013.6638499","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638499","url":null,"abstract":"Sleeping cell detection in a wireless network means finding the cells that are not working properly for various reasons. Research in the area has mostly focused on cell outage detection, e.g. due to hardware failures at the base station antennas or non-optimal network planning. In this paper we extend the research into a more challenging setting that is overlooked in the literature: the case where no outages occur in the network. The essence of the proposed method for detecting problematic cells is to analyze the sequences of events reported by the mobile terminals to the serving base stations. The suggested n-gram analysis includes dimensionality reduction and classification of the data and ultimately provides a set of abnormal users, which in turn reveals the location of the problematic cell. We verify the proposed framework with simulated LTE network data, using the minimization of drive testing (MDT) functionality to gather the training and testing data sets.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"8 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116852651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised discovery of linguistic structure including two-level acoustic patterns using three cascaded stages of iterative optimization","authors":"Cheng-Tao Chung, Chun-an Chan, Lin-Shan Lee","doi":"10.1109/ICASSP.2013.6639239","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6639239","url":null,"abstract":"Techniques for unsupervised discovery of acoustic patterns are becoming increasingly attractive, because huge quantities of speech data are becoming available but manual annotations remain hard to acquire. In this paper, we propose an approach for unsupervised discovery of linguistic structure for the target spoken language given raw speech data. This linguistic structure includes two-level (subword-like and word-like) acoustic patterns, the lexicon of word-like patterns in terms of subword-like patterns and the N-gram language model based on word-like patterns. All patterns, models, and parameters can be automatically learned from the unlabelled speech corpus. This is achieved by an initialization step followed by three cascaded stages for acoustic, linguistic, and lexical iterative optimization. The lexicon of word-like patterns defines the allowed consecutive sequences of HMMs for subword-like patterns. In each iteration, model training and decoding produce updated labels from which the lexicon and HMMs can be further updated. In this way, model parameters and decoded labels are respectively optimized in each iteration, and the knowledge about the linguistic structure is learned gradually layer after layer. The proposed approach was tested in preliminary experiments on a corpus of Mandarin broadcast news, including a task of spoken term detection with performance compared to a parallel test using models trained in a supervised way. Results show that the proposed system not only yields reasonable performance on its own, but is also complementary to existing large vocabulary ASR systems.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116991492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SSIM-based adaptive quantization in HEVC","authors":"Chuohao Yeo, Hui Li Tan, Y. H. Tan","doi":"10.1109/ICASSP.2013.6637940","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6637940","url":null,"abstract":"HEVC is an emerging video coding standard that can achieve significant compression gains compared to H.264/AVC due to the inclusion of numerous new coding tools. In particular, it allows for a flexible quadtree based block partitioning of each coding tree unit (CTU) and an ability to switch quantization parameters (QP) on a sub-CTU level. In this paper, we present an approach for selecting quantization parameters for each block of pixels on the basis of optimizing the SSIM of the entire picture. Our simulation results show that when SSIM is the quality metric, the proposed approach is able to give average BD-Rate gains of 5.5% to 7.4% compared to using a constant QP per picture while having a negligible increase in encoding runtime. In addition, our proposed method also significantly outperforms the MPEG-2 TM5 adaptive quantization algorithm implemented in the HEVC reference software.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"7 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121005242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of creaky voice from contextual factors","authors":"Thomas Drugman, John Kane, T. Raitio, C. Gobl","doi":"10.1109/ICASSP.2013.6639216","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6639216","url":null,"abstract":"Creaky voice, also referred to as vocal fry, is a voice quality frequently produced in many languages, in both read and conversational speech. To sound more natural, speech synthesisers should be able to generate speech in all its expressive diversity. This includes a proper use of creaky voice. The goal of this paper is two-fold. First, we analyse how contextual factors can be informative for the prediction of creaky voice use. It is observed that a few contextual factors related to speech production preceding a silence or a pause are of particular interest. This study validates that creaky voice plays a crucial syntactic role, allowing for a better structuring of phrases. In a second experiment, we investigate the prediction of creakiness from contextual factors based on HMMs. Four methods are compared on a US English and a Finnish speaker. It is shown that the best prediction technique achieves a promising performance comparable to that obtained with the creaky voice detection algorithm on which the HMMs were trained.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121125057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UCS-NT: An unbiased compressive sensing framework for Network Tomography","authors":"H. Mahyar, H. Rabiee, Z. S. Hashemifar","doi":"10.1109/ICASSP.2013.6638518","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638518","url":null,"abstract":"This paper addresses the problem of recovering sparse link vectors under network topological constraints, a problem motivated by network inference and tomography applications. We propose a novel framework called UCS-NT in the context of compressive sensing for sparse recovery in networks. In order to efficiently recover sparse specification of link vectors, we construct a feasible measurement matrix using this framework through connected paths. It is theoretically shown that only O(k log(n)) path measurements are sufficient for uniquely recovering any k-sparse link vector. Moreover, extensive simulations demonstrate that this framework converges to an accurate solution for a wide class of networks.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127255556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transient modeling for overlap-add sinusoidal model of speech","authors":"Slava Shechtman","doi":"10.1109/ICASSP.2013.6639261","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6639261","url":null,"abstract":"Speech sinusoidal modeling has been successfully applied to a broad range of speech analysis, synthesis and modification tasks. For the most part it reproduces high-quality speech; however, for speech transients (e.g. plosives, glottal stops) it suffers from reduced fidelity due to the lack of intra-frame modeling of irregularities. Various extensions have been proposed for the stationary sinusoidal model to cope with this problem. One simple and well-known approach is to incorporate an intra-frame magnitude envelope into the sinusoidal model. This has usually been done with an iterative analysis-by-synthesis procedure. In this paper we derive an optimal analytic solution for this problem. We show that this solution yields a significantly better model fit than the known analysis-by-synthesis approach.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127265419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed multi-hypothesis coding of depth maps using texture motion information and optical flow","authors":"Matteo Salmistraro, M. Zamarin, L. L. Rakêt, Søren Forchhammer","doi":"10.1109/ICASSP.2013.6637939","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6637939","url":null,"abstract":"Distributed Video Coding (DVC) is a video coding paradigm allowing a shift of complexity from the encoder to the decoder. Depth maps are images enabling the calculation of the distance of an object from the camera, which can be used in multiview coding in order to generate virtual views, but also in single view coding for motion detection or image segmentation. In this work, we address the problem of depth map video DVC encoding in a single-view scenario. We exploit the motion of the corresponding texture video, which is highly correlated with the depth maps. In order to extract the motion information, a block-based method and an optical flow-based method are employed. Finally, we fuse the proposed Side Informations using a multi-hypothesis DVC decoder, which allows us to exploit the strengths of all the proposed methods at the same time.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127290724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted sum rate maximization for cognitive MISO broadcast channel: Large system analysis","authors":"Y. He, S. Dey","doi":"10.1109/ICASSP.2013.6638587","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638587","url":null,"abstract":"This paper considers the ergodic weighted sum rate (WSR) maximization problem for an underlay cognitive radio MISO broadcast channel, where a secondary network, consisting of a base-station with M transmit antennas and K single-antenna secondary users (SUs), is allowed to share the same spectrum with a primary user (PU), under an average transmit sum power (ATTP) constraint Pav and an average interference power (AIP) constraint on the PU. We show that the ATTP constraint is always active, and as Pav → ∞, the ergodic WSR approaches infinity similar to the conventional non-CR network case. A low-complexity suboptimal beamforming scheme (called partially-projected regularized zero-forcing beamforming, 'PP-RZFBF') with a closed-form beamformer is proposed. Due to the non-convexity of the PP-RZFBF scheme, a large system analysis is conducted in the limit as M and K approach infinity with a fixed finite ratio r = K/M. We derive deterministic limiting approximations for the PP-RZFBF problem which enable us to determine asymptotically optimal beamformers for PP-RZFBF. Numerical simulations illustrate that the asymptotically optimal beamformers turn out to be quite effective even for small M, K.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124996662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-complexity and high-performance non-coherent cell identification detection schemes for OFDM-based systems","authors":"Ying-Tsung Lin, Yi-Hsiang Wang, Sau-Gee Chen, Chih-Liang Chen","doi":"10.1109/ICASSP.2013.6638596","DOIUrl":"https://doi.org/10.1109/ICASSP.2013.6638596","url":null,"abstract":"This work proposes two low-complexity and high-performance cell ID detection schemes for cellular communication systems. The first one, called real-correlation multiple differential detection (RMDD), is derived from our previous work on cell ID detection, the CERCD method, and requires far fewer complex multiplications while maintaining the same performance. Although the CERCD algorithm is more robust than existing cell detection methods in AWGN and multipath channel conditions, its performance can still be improved. As such, the second scheme, called multiple differential detection (MDD), is proposed to improve on the CERCD method. Simulation results show that MDD has much better performance in frequency-selective channels. The performance and computational complexity of the proposed schemes are also evaluated and analyzed under different channel environments to demonstrate their effectiveness.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125023239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}