2013 IEEE International Conference on Acoustics, Speech and Signal Processing: Latest Papers

A dynamic system model of time-varying subjective quality of video streams over HTTP
2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6638329
Chao Chen, L. Choi, G. Veciana, C. Caramanis, R. Heath, A. Bovik
Abstract: Newly developed HTTP-based video streaming technology enables flexible rate-adaptation in varying channel conditions. The users' Quality of Experience (QoE) of rate-adaptive HTTP video streams, however, is not well understood. Therefore, designing QoE-optimized rate-adaptive video streaming algorithms remains a challenging task. An important aspect of understanding and modeling QoE is to be able to predict the up-to-the-moment subjective quality of video as it is played. We propose a dynamic system model to predict the time-varying subjective quality (TVSQ) of rate-adaptive videos that are transported over HTTP. For this purpose, we built a video database and measured TVSQ via a subjective study. A dynamic system model is developed using the database and the measured human data. We show that the proposed model can effectively predict the TVSQ of rate-adaptive videos in an online manner, which is necessary to be able to conduct QoE-optimized online rate-adaptation for HTTP-based video streaming.
Citations: 31
N-gram analysis for sleeping cell detection in LTE networks
2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6638499
Fedor Chernogorov, T. Ristaniemi, Kimmo Brigatti, Sergey Chernov
Abstract: Sleeping cell detection in a wireless network means finding the cells that are not working properly for various reasons. The research in the area has mostly focused on cell outage detection, e.g. due to hardware failures at the base station antennas or non-optimal network planning. In this paper we extend the research into a more challenging setting which is overlooked in the literature: the case where no outages occur in the network. The essence of the proposed method for detection of problematic cells is to analyze the sequences of the events reported by the mobile terminals to the serving base stations. The suggested n-gram analysis includes dimensionality reduction and classification of the data and ultimately provides a set of abnormal users, which in turn reveal the location of the problematic cell. We verify the proposed framework with simulated LTE network data, using the minimization of drive testing (MDT) functionality to gather the training and testing data sets.
Citations: 21
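The abstract above analyzes sequences of events reported by mobile terminals using n-grams. As a minimal sketch of that first step only, the snippet below counts n-grams in per-user event sequences and lists the ones seen for one user but not another. The event names and the two example users are hypothetical, and the dimensionality reduction and classification stages mentioned in the abstract are not reproduced here.

```python
from collections import Counter


def ngram_counts(events, n):
    """Count every run of n consecutive events in one user's event sequence."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


# Hypothetical event names, chosen for illustration only.
normal_user = ["A2", "A3", "HO_COMMAND", "HO_COMPLETE",
               "A2", "A3", "HO_COMMAND", "HO_COMPLETE"]
suspect_user = ["A2", "A3", "HO_COMMAND", "RLF",
                "A2", "A3", "HO_COMMAND", "RLF"]

normal_bigrams = ngram_counts(normal_user, 2)
suspect_bigrams = ngram_counts(suspect_user, 2)

# Bigrams observed for the suspect user but never for the normal user.
unusual = set(suspect_bigrams) - set(normal_bigrams)
print(sorted(unusual))
```

In a fuller pipeline, the per-user count vectors would presumably serve as the features passed on to the later stages; that wiring is an assumption here rather than something the abstract specifies.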
Unsupervised discovery of linguistic structure including two-level acoustic patterns using three cascaded stages of iterative optimization
2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6639239
Cheng-Tao Chung, Chun-an Chan, Lin-Shan Lee
Abstract: Techniques for unsupervised discovery of acoustic patterns are becoming increasingly attractive, because huge quantities of speech data are becoming available but manual annotations remain hard to acquire. In this paper, we propose an approach for unsupervised discovery of linguistic structure for the target spoken language given raw speech data. This linguistic structure includes two-level (subword-like and word-like) acoustic patterns, the lexicon of word-like patterns in terms of subword-like patterns and the N-gram language model based on word-like patterns. All patterns, models, and parameters can be automatically learned from the unlabelled speech corpus. This is achieved by an initialization step followed by three cascaded stages for acoustic, linguistic, and lexical iterative optimization. The lexicon of word-like patterns defines the allowed consecutive sequences of HMMs for subword-like patterns. In each iteration, model training and decoding produce updated labels from which the lexicon and HMMs can be further updated. In this way, model parameters and decoded labels are respectively optimized in each iteration, and the knowledge about the linguistic structure is learned gradually layer after layer. The proposed approach was tested in preliminary experiments on a corpus of Mandarin broadcast news, including a task of spoken term detection with performance compared to a parallel test using models trained in a supervised way. Results show that the proposed system not only yields reasonable performance on its own, but is also complementary to existing large vocabulary ASR systems.
Citations: 28
SSIM-based adaptive quantization in HEVC
2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6637940
Chuohao Yeo, Hui Li Tan, Y. H. Tan
Abstract: HEVC is an emerging video coding standard that can achieve significant compression gains compared to H.264/AVC due to the inclusion of numerous new coding tools. In particular, it allows for a flexible quadtree-based block partitioning of each coding tree unit (CTU) and an ability to switch quantization parameters (QP) on a sub-CTU level. In this paper, we present an approach for selecting quantization parameters for each block of pixels on the basis of optimizing the SSIM of the entire picture. Our simulation results show that when SSIM is the quality metric, the proposed approach is able to give average BD-Rate gains of 5.5% to 7.4% compared to using a constant QP per picture while having a negligible increase in encoding runtime. In addition, our proposed method also significantly outperforms the MPEG-2 TM5 adaptive quantization algorithm implemented in the HEVC reference software.
Citations: 24
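The abstract above uses SSIM as the quality metric that drives quantization parameter selection. For readers unfamiliar with the metric, here is a minimal sketch of the standard SSIM index computed over a single block of pixel values, assuming the commonly used constants K1 = 0.01, K2 = 0.03 and an 8-bit dynamic range. It omits the sliding Gaussian window of the usual full-image SSIM, the pixel values are made up, and it does not reproduce the paper's QP selection procedure.

```python
def ssim_block(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Standard SSIM index between two equally sized blocks (flat lists of pixels)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants from the usual SSIM definition.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 8-pixel block and a coarsely quantized version of it.
original = [52, 55, 61, 59, 79, 61, 76, 61]
quantized = [48, 56, 64, 56, 80, 64, 80, 64]

print(ssim_block(original, original))   # identical blocks score (numerically) 1
print(ssim_block(original, quantized))  # distortion lowers the score below 1
```

A coarser quantizer generally pushes the score further below 1, which is the kind of trade-off a block-level QP choice would weigh against bit rate.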
Prediction of creaky voice from contextual factors
2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6639216
Thomas Drugman, John Kane, T. Raitio, C. Gobl
Abstract: Creaky voice, also referred to as vocal fry, is a voice quality frequently produced in many languages, in both read and conversational speech. To enhance their naturalness, speech synthesisers should be able to generate speech in all its expressive diversity. This includes a proper use of creaky voice. The goal of this paper is two-fold. Firstly, we analyse how contextual factors can be informative for the prediction of creaky voice use. It is observed that a few contextual factors related to speech production preceding a silence or a pause are of particular interest. This study validates that creaky voice plays a crucial syntactic role, allowing for a better structuring of phrases. In a second experiment, we investigate the prediction of creakiness from contextual factors based on HMMs. Four methods are compared on a US English and a Finnish speaker. It is shown that the best prediction technique achieves a promising performance comparable to that obtained with the creaky voice detection algorithm on which the HMMs were trained.
Citations: 10
UCS-NT: An unbiased compressive sensing framework for Network Tomography
2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6638518
H. Mahyar, H. Rabiee, Z. S. Hashemifar
Abstract: This paper addresses the problem of recovering sparse link vectors under network topological constraints, a problem motivated by network inference and tomography applications. We propose a novel framework called UCS-NT in the context of compressive sensing for sparse recovery in networks. In order to efficiently recover sparse specification of link vectors, we construct a feasible measurement matrix using this framework through connected paths. It is theoretically shown that only O(k log(n)) path measurements are sufficient for uniquely recovering any k-sparse link vector. Moreover, extensive simulations demonstrate that this framework would converge to an accurate solution for a wide class of networks.
Citations: 15
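To give a feel for the O(k log(n)) bound quoted in the abstract above, the short sketch below evaluates the growth term k·ln(n) for a few network sizes and compares it with n, the number of links. It is only an illustration of the scaling: the constant hidden in the O(·) notation is set to 1 and the natural logarithm is used, both of which are assumptions, so the printed values are not actual measurement counts from the paper.

```python
import math


def path_measurement_scale(n_links, k_sparse):
    """Return k * ln(n), the growth term in O(k log n) with the hidden constant set to 1."""
    return k_sparse * math.log(n_links)


# Sparsity k = 5 is an arbitrary illustrative choice.
for n_links in (100, 1_000, 10_000):
    scale = path_measurement_scale(n_links, k_sparse=5)
    print(f"n={n_links:>6}  k*ln(n)={scale:6.1f}  fraction of n={scale / n_links:.4f}")
```

The fraction shrinks as n grows, which is why a logarithmic number of path measurements is attractive for large networks with few anomalous links.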
Transient modeling for overlap-add sinusoidal model of speech
2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6639261
Slava Shechtman
Abstract: Speech sinusoidal modeling has been successfully applied to a broad range of speech analysis, synthesis and modification tasks. For the most part, it reproduces high-quality speech; however, for speech transients (e.g. plosives, glottal stops) it suffers from reduced fidelity due to the lack of intra-frame modeling of irregularities. Various extensions have been proposed for the stationary sinusoidal model to cope with this problem. One simple and well-known approach is to incorporate an intra-frame magnitude envelope into the sinusoidal model. This has typically been done with an iterative analysis-by-synthesis procedure. In this paper we derive an optimal analytic solution for this problem. We show that this solution yields a significantly better model fit than the known analysis-by-synthesis approach.
Citations: 2
Distributed multi-hypothesis coding of depth maps using texture motion information and optical flow
2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6637939
Matteo Salmistraro, M. Zamarin, L. L. Rakêt, Søren Forchhammer
Abstract: Distributed Video Coding (DVC) is a video coding paradigm allowing a shift of complexity from the encoder to the decoder. Depth maps are images enabling the calculation of the distance of an object from the camera, which can be used in multiview coding in order to generate virtual views, but also in single view coding for motion detection or image segmentation. In this work, we address the problem of depth map video DVC encoding in a single-view scenario. We exploit the motion of the corresponding texture video, which is highly correlated with the depth maps. In order to extract the motion information, a block-based method and an optical flow-based method are employed. Finally, we fuse the proposed Side Information estimates using a multi-hypothesis DVC decoder, which allows us to exploit the strengths of all the proposed methods at the same time.
Citations: 13
Weighted sum rate maximization for cognitive MISO broadcast channel: Large system analysis
2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6638587
Y. He, S. Dey
Abstract: This paper considers the ergodic weighted sum rate (WSR) maximization problem for an underlay cognitive radio MISO broadcast channel, where a secondary network, consisting of a base-station with M transmit antennas and K single-antenna secondary users (SUs), is allowed to share the same spectrum with a primary user (PU), under an average transmit sum power (ATTP) constraint Pav and an average interference power (AIP) constraint on the PU. We show that the ATTP constraint is always active, and as Pav → ∞, the ergodic WSR approaches infinity, as in the conventional non-CR network case. A low-complexity suboptimal beamforming scheme (called partially-projected regularized zero-forcing beamforming, "PP-RZFBF") with a closed-form beamformer is proposed. Due to the non-convexity of the PP-RZFBF scheme, a large system analysis is conducted in the limit as M and K approach infinity with a fixed finite ratio r = K/M. We derive deterministic limiting approximations for the PP-RZFBF problem which enable us to determine asymptotically optimal beamformers for PP-RZFBF. Numerical simulations illustrate that the asymptotically optimal beamformers turn out to be quite effective even for small M, K.
Citations: 4
Low-complexity and high-performance non-coherent cell identification detection schemes for OFDM-based systems
2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI: 10.1109/ICASSP.2013.6638596
Ying-Tsung Lin, Yi-Hsiang Wang, Sau-Gee Chen, Chih-Liang Chen
Abstract: This work proposes two low-complexity and high-performance cell ID detection schemes for cellular communication systems. The first one, called real-correlation multiple differential detection (RMDD), is derived from our previous cell ID detection method, CERCD; it requires far fewer complex multiplications while maintaining the same performance. Although the CERCD algorithm is more robust than existing cell detection methods in AWGN and multipath channel conditions, its performance can still be further improved. As such, the second scheme, called multiple differential detection (MDD), is proposed to improve on the CERCD method. Simulation results show that MDD has much better performance in frequency-selective channels. The performance and computational complexity of the proposed schemes are also evaluated and analyzed under different channel environments to demonstrate their effectiveness.
Citations: 0