2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

A Prosodic Mandarin Text-to-Speech System Based on Tacotron
Chuxiong Zhang, S. Zhang, Haibin Zhong
DOI: 10.1109/APSIPAASC47483.2019.9023283
Abstract: Tacotron performs well in English speech synthesis and automatically aligns two arbitrary sequences from different domains. However, to introduce Tacotron into Mandarin Chinese text-to-speech (TTS), a prosody system is needed to generate more natural speech. This paper proposes a practical method for incorporating prosodic annotation into Tacotron training for a Mandarin Chinese synthesis system. A prosody model that predicts prosodic boundaries from the given text serves as the front end in our approach, followed by a Tacotron synthesis system trained on a well-labeled TTS database containing the prosodic annotations. Subjective evaluation of prosody shows that the synthesis system performs better when the prosody model is added as a front end to Tacotron.
Citations: 7
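The prosody front end described above essentially turns predicted break levels into extra symbols that Tacotron's encoder can consume. Below is a minimal, hypothetical Python sketch of that idea: the placeholder `predict_boundaries` model and the Mandarin break-level tokens (#1 to #4) are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: injecting predicted prosodic boundaries (Mandarin #1-#4
# break levels) into the input token sequence of a Tacotron-style front end.
# `predict_boundaries` stands in for the paper's prosody model (not shown here).

from typing import List

BOUNDARY_TOKENS = {1: "#1", 2: "#2", 3: "#3", 4: "#4"}  # prosodic break levels

def predict_boundaries(syllables: List[str]) -> List[int]:
    """Placeholder for a learned prosody model: returns a break level (0 = none)
    after each syllable. A real system would use a sequence labeller here."""
    return [0] * (len(syllables) - 1) + [4]  # trivially mark only the utterance end

def annotate_with_prosody(syllables: List[str]) -> List[str]:
    """Interleave boundary tokens with syllables so the synthesizer's encoder
    sees prosodic structure as part of its input alphabet."""
    breaks = predict_boundaries(syllables)
    tokens = []
    for syl, lvl in zip(syllables, breaks):
        tokens.append(syl)
        if lvl > 0:
            tokens.append(BOUNDARY_TOKENS[lvl])
    return tokens

if __name__ == "__main__":
    # "ni3 hao3 shi4 jie4" -> ['ni3', 'hao3', 'shi4', 'jie4', '#4']
    print(annotate_with_prosody(["ni3", "hao3", "shi4", "jie4"]))
```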
Infrared Pedestrian Detection with Converted Temperature Map
Yifan Zhao, Jingchun Cheng, Wei Zhou, Chunxi Zhang, Xiong Pan
DOI: 10.1109/APSIPAASC47483.2019.9023228
Abstract: Infrared pedestrian detection aims to detect persons in outdoor thermal images and offers a unique advantage in dark environments or bad weather compared with daytime visible (RGB) images. Most current methods treat infrared detection the same way as visible-image detection, e.g., regarding the infrared image as a special gray-scale visible image. In this paper, we tackle the problem with more emphasis on the underlying temperature information in infrared images. We build an image-to-temperature transformation formula based on infrared image formation theory, which converts an infrared image into a temperature map given a prior on the pedestrian pixel-temperature value. The detection process follows two stages. In the first stage, a common detector that treats the infrared image as a gray-scale visible image provides primary detection results and a pedestrian position prior (the highest-confidence pedestrian detection box in each image). In the second stage, we convert infrared images into the corresponding temperature maps and train a temperature net for detection. The final results combine the primary detections and the temperature net outputs, detecting pedestrians using characteristics from both the image and temperature domains. We show that the converted temperature image is less affected by environmental factors and that its detector is strongly complementary to the primary detector. Extensive experiments and analysis on two public infrared datasets, OTCBVS and FLIR, demonstrate the effectiveness of incorporating temperature maps.
Citations: 15
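As a rough illustration of the second stage, the sketch below converts an infrared intensity image into an approximate temperature map by anchoring a linear calibration on the highest-confidence pedestrian box from stage one. The paper derives its transformation from infrared image formation theory; the linear mapping, the 36 °C body-temperature prior, and the ambient-temperature value here are simplifying assumptions.

```python
# Hedged sketch: converting an infrared intensity image into a temperature map
# using a pedestrian pixel-temperature prior. The paper derives its mapping from
# infrared image formation theory; the linear calibration below and the 36 degC
# skin-temperature prior are simplifying assumptions for illustration only.

import numpy as np

def to_temperature_map(ir_image: np.ndarray,
                       pedestrian_box: tuple,
                       pedestrian_temp_c: float = 36.0,
                       ambient_temp_c: float = 15.0) -> np.ndarray:
    """Map raw IR intensities to approximate temperatures (degrees Celsius).

    ir_image:       2-D array of raw infrared intensities.
    pedestrian_box: (x1, y1, x2, y2) highest-confidence detection from stage 1,
                    used as the prior that those pixels are near body temperature.
    """
    x1, y1, x2, y2 = pedestrian_box
    ped_intensity = float(np.median(ir_image[y1:y2, x1:x2]))  # robust pedestrian level
    bg_intensity = float(np.median(ir_image))                 # scene background level
    if abs(ped_intensity - bg_intensity) < 1e-6:
        return np.full_like(ir_image, ambient_temp_c, dtype=np.float32)
    # Linear calibration through the two anchor points (background, pedestrian).
    scale = (pedestrian_temp_c - ambient_temp_c) / (ped_intensity - bg_intensity)
    return ambient_temp_c + scale * (ir_image.astype(np.float32) - bg_intensity)

if __name__ == "__main__":
    fake_ir = np.random.randint(20, 200, size=(240, 320)).astype(np.float32)
    temp_map = to_temperature_map(fake_ir, pedestrian_box=(100, 60, 140, 180))
    print(temp_map.min(), temp_map.max())
```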
A Study on Low-resource Language Identification
Zhaodi Qi, Yong Ma, M. Gu
DOI: 10.1109/APSIPAASC47483.2019.9023075
Abstract: Modern language identification (LID) systems require a large amount of data to train language-discriminative models, either statistical (e.g., i-vector) or neural (e.g., x-vector). Unfortunately, most languages in the world have very limited data resources, which results in limited performance on those languages. In this study, two approaches are investigated for the LID task on low-resource languages. The first is data augmentation, which enlarges the data set by incorporating various distortions into the original data; the second is multilingual bottleneck feature extraction, which extracts multiple sets of bottleneck features (BNF) from speech recognition systems of multiple languages. Experiments on both i-vector and x-vector models demonstrate that the two approaches are effective and obtain promising results on both in-domain and out-of-domain data.
Citations: 3
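To make the data-augmentation idea concrete, here is a small NumPy sketch of one typical distortion: mixing noise into clean speech at a target SNR. The paper does not specify its exact distortions, noise sources, or SNR range, so the choices below are illustrative assumptions only.

```python
# Illustrative sketch of one data-augmentation distortion (additive noise at a
# target SNR). The paper does not specify which distortions or SNR levels it
# uses, so the choices below are assumptions for demonstration.

import numpy as np

def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `speech` so the result has the requested SNR in dB."""
    if len(noise) < len(speech):
        # Tile the noise clip so it covers the whole utterance.
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise to hit the target SNR: SNR = 10 * log10(Ps / Pn_scaled).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)   # 1 s of dummy "speech" at 16 kHz
    babble = rng.standard_normal(8000)   # shorter dummy noise clip
    augmented = add_noise_at_snr(clean, babble, snr_db=10.0)
    print(augmented.shape)
```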
Audio Integrated Active Noise Control System with Auto Gain Controller
Kenta Iwai, T. Nishiura
DOI: 10.1109/APSIPAASC47483.2019.9023148
Abstract: This paper proposes an audio integrated active noise control (ANC) system with an auto gain controller. ANC is a technique for reducing unwanted noise such as factory noise and engine noise. In general, an ANC system cannot completely cancel the unwanted noise because of its operating principle. To address this, the audio integrated ANC (AIANC) system has been proposed, which uses an additional audio signal to mask the residual noise, called the error signal. The AIANC system can also be used for telecommunication in noisy environments, where the voice is treated as the audio signal. However, the conventional AIANC system cannot adjust the power of the audio signal to that of the error signal, so the audio signal may be far louder or quieter than the error signal. To solve this problem, an AIANC system with an auto gain controller is proposed. The controller adjusts the power of the audio signal to that of the error signal, so the audio signal is emitted with the same power as the error signal. Simulation results show that the proposed AIANC system reduces the unwanted noise and matches the power of the audio signal to that of the error signal.
Citations: 2
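The core of the auto gain controller is matching the masking audio's power to the error signal's power. The sketch below implements that idea frame by frame with first-order gain smoothing; the frame length and smoothing factor are assumptions, and the paper's controller may differ in detail.

```python
# Hedged sketch of the auto-gain idea: per frame, scale the audio signal so its
# power matches the residual (error) signal's power. The frame length and the
# smoothing factor are assumptions; the paper's controller may differ in detail.

import numpy as np

def auto_gain(audio: np.ndarray, error: np.ndarray,
              frame_len: int = 256, smooth: float = 0.9) -> np.ndarray:
    """Return the audio signal with a slowly varying gain that tracks the
    error-signal power, so the masking audio is neither too loud nor too soft."""
    out = np.copy(audio)
    gain = 1.0
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        a = audio[start:start + frame_len]
        e = error[start:start + frame_len]
        p_a = np.mean(a ** 2) + 1e-12
        p_e = np.mean(e ** 2) + 1e-12
        target = np.sqrt(p_e / p_a)                    # gain that equalizes the powers
        gain = smooth * gain + (1 - smooth) * target   # first-order smoothing
        out[start:start + frame_len] = gain * a
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    audio = 0.1 * rng.standard_normal(16000)   # quiet music/voice signal
    error = 0.5 * rng.standard_normal(16000)   # louder residual noise
    adjusted = auto_gain(audio, error)
    print(np.mean(adjusted ** 2), np.mean(error ** 2))
```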
A Loss With Mixed Penalty for Speech Enhancement Generative Adversarial Network
Jie Cao, Yaofeng Zhou, Hong Yu, Xiaoxu Li, Dan Wang, Zhanyu Ma
DOI: 10.1109/APSIPAASC47483.2019.9023273
Abstract: Speech enhancement based on generative adversarial networks (GANs) can overcome the problems of many classical speech enhancement methods, such as relying on first-order statistics of the signals and ignoring the phase mismatch between the noisy and clean signals. However, GANs are hard to train and suffer from vanishing gradients, which can lead to poor generated samples. In this paper, we propose a relativistic average least-squares loss function with a mixed penalty term for a speech enhancement generative adversarial network. The mixed penalty term minimizes the distance between generated and clean samples more effectively. Experimental results on the Valentini 2016 and Valentini 2017 datasets show that the proposed loss makes GAN training more stable and achieves good performance in both objective and subjective evaluation.
Citations: 0
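For reference, a relativistic average least-squares (RaLS) adversarial loss can be written down compactly in PyTorch; the sketch below pairs it with a simple L1 term between enhanced and clean waveforms to stand in for the mixed penalty. The paper's exact penalty composition and weights are not reproduced here; `lambda_pen = 100` is an illustrative assumption.

```python
# Hedged PyTorch sketch of a relativistic average least-squares (RaLS) adversarial
# loss plus a simple L1 "penalty" between enhanced and clean waveforms. The exact
# mixed-penalty term and its weight (lambda_pen) used in the paper are not given
# here; lambda_pen = 100 is an illustrative assumption.

import torch

def rals_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Relativistic average least-squares loss for the discriminator."""
    return (torch.mean((d_real - d_fake.mean() - 1.0) ** 2)
            + torch.mean((d_fake - d_real.mean() + 1.0) ** 2)) / 2

def rals_g_loss(d_real: torch.Tensor, d_fake: torch.Tensor,
                enhanced: torch.Tensor, clean: torch.Tensor,
                lambda_pen: float = 100.0) -> torch.Tensor:
    """Generator loss: RaLS adversarial term plus an L1 closeness penalty."""
    adv = (torch.mean((d_fake - d_real.mean() - 1.0) ** 2)
           + torch.mean((d_real - d_fake.mean() + 1.0) ** 2)) / 2
    penalty = torch.mean(torch.abs(enhanced - clean))
    return adv + lambda_pen * penalty

if __name__ == "__main__":
    d_real = torch.randn(8, 1)        # discriminator scores on clean samples
    d_fake = torch.randn(8, 1)        # discriminator scores on enhanced samples
    enhanced = torch.randn(8, 16384)  # generator output waveforms
    clean = torch.randn(8, 16384)     # clean target waveforms
    print(rals_d_loss(d_real, d_fake).item(),
          rals_g_loss(d_real, d_fake, enhanced, clean).item())
```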
Multiple-Operation Image Anti-Forensics with WGAN-GP Framework
Jianyuan Wu, Zheng Wang, Hui Zeng, Xiangui Kang
DOI: 10.1109/APSIPAASC47483.2019.9023173
Abstract: A challenging task in multimedia security is concealing or eliminating the traces left by a chain of multiple manipulation operations, i.e., multiple-operation anti-forensics. Existing anti-forensic works concentrate on one specific manipulation, referred to as single-operation anti-forensics. In this work, we propose using the improved Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to model image anti-forensics as an image-to-image translation problem and obtain optimized anti-forensic models for multiple operations. The experimental results demonstrate that our multiple-operation anti-forensic scheme successfully deceives state-of-the-art forensic algorithms without significantly degrading image quality, and even enhances quality in most cases. To the best of our knowledge, this is the first attempt to explore the problem of multiple-operation anti-forensics.
Citations: 6
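The gradient-penalty term that gives WGAN-GP its name is straightforward to sketch. The PyTorch snippet below penalizes the critic's gradient norm on random interpolations between real and generated images; the toy critic and `lambda_gp = 10` (the common default) are assumptions rather than the paper's actual architecture or setting.

```python
# Hedged sketch of the WGAN-GP gradient penalty used to stabilize training.
# The critic below is a stand-in; lambda_gp = 10 follows the common default and
# is an assumption, not necessarily the value used in the paper.

import torch
import torch.nn as nn

def gradient_penalty(critic: nn.Module, real: torch.Tensor, fake: torch.Tensor,
                     lambda_gp: float = 10.0) -> torch.Tensor:
    """Penalize deviation of the critic's gradient norm from 1 on random
    interpolations between real and fake samples."""
    batch = real.size(0)
    eps = torch.rand(batch, 1, 1, 1, device=real.device)      # per-sample mix ratio
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    grad_norm = grads.reshape(batch, -1).norm(2, dim=1)
    return lambda_gp * torch.mean((grad_norm - 1.0) ** 2)

if __name__ == "__main__":
    critic = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))  # toy critic
    real = torch.randn(4, 3, 64, 64)
    fake = torch.randn(4, 3, 64, 64)
    print(gradient_penalty(critic, real, fake).item())
```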
Lightweight models for weather identification
Congcong Wang, Pengyu Liu, Ke-bin Jia, Siwei Chen
DOI: 10.1109/APSIPAASC47483.2019.9023242
Abstract: At present, the recognition of weather phenomena mainly depends on weather sensors and weather radar. However, large-scale deployment of meteorological observation equipment for intensive weather monitoring is difficult because it is expensive and hard to maintain. Convolutional neural networks (CNNs) can also be used to identify weather phenomena, but existing methods require substantial computing power, making them difficult to deploy in practice. Therefore, designing a lightweight model that can run on small devices with weak computing power is crucial for intensive weather monitoring. In this paper, we study the shortcomings of several existing lightweight models and, by comparing their disadvantages, propose a new lightweight model. In addition, because existing weather datasets are too small to meet real monitoring needs, we produced a dataset with a more complex variety of weather phenomena. Experiments show that the proposed method saves more than 25 times the memory usage with only a 1.55% accuracy loss compared with the best CNN method, and achieves state-of-the-art performance among lightweight models.
Citations: 1
Speech representation based on tensor factor analysis and its application to speaker recognition and language identification
D. Saito, So Suzuki, N. Minematsu
DOI: 10.1109/APSIPAASC47483.2019.9023128
Abstract: This paper proposes a novel approach to speech representation for both speaker recognition and language identification that characterizes the entire feature space with a tensor. In conventional studies of both tasks, the i-vector is commonly used as the state-of-the-art representation; i-vector extraction can be regarded as the projection of an utterance-based GMM supervector onto a low-dimensional space. In this paper, to explicitly model the correlation among the mean vectors of a GMM, an utterance is modeled not as a GMM-based supervector but as a matrix, and the entire set of utterances is modeled as a tensor. By applying tensor factor analysis, we obtain a new representation for an input utterance. Experimental evaluations on speaker recognition and language identification show that the proposed approach is effective, especially for the speaker recognition task.
Citations: 0
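To illustrate the general idea of modeling each utterance as a matrix and the whole set as a 3-way tensor, the NumPy sketch below performs a truncated higher-order SVD (a Tucker-style decomposition) and reads off a low-dimensional factor vector per utterance. This sketches generic tensor factorization on toy data; it is not the paper's specific tensor factor analysis model.

```python
# Hedged illustration: stack per-utterance mean-vector matrices (mixtures x dims)
# into a 3-way tensor and apply a truncated higher-order SVD (Tucker-style) so
# each utterance gets a low-dimensional factor vector. Toy random data only;
# this is not the paper's exact tensor factor analysis model.

import numpy as np

n_utts, n_mix, n_dim = 50, 16, 20
rng = np.random.default_rng(0)
X = rng.standard_normal((n_utts, n_mix, n_dim))   # utterances x mixtures x dims

def mode_unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Matricize the tensor along one mode (mode-n unfolding)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

ranks = (10, 8, 8)
factors = []
for mode, r in enumerate(ranks):
    # Leading left singular vectors of each unfolding give the mode factors.
    u, _, _ = np.linalg.svd(mode_unfold(X, mode), full_matrices=False)
    factors.append(u[:, :r])

# Tucker core: project the tensor onto the three factor bases.
core = np.einsum('ijk,ia,jb,kc->abc', X, factors[0], factors[1], factors[2])

utterance_factors = factors[0]                    # one 10-dim row per utterance
print(core.shape, utterance_factors.shape)        # (10, 8, 8) (50, 10)
```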
Improving the Spectra Recovering of Bone-Conducted Speech via Structural SIMilarity Loss Function
Changyan Zheng, Jibin Yang, Xiongwei Zhang, Meng Sun, Kun Yao
DOI: 10.1109/APSIPAASC47483.2019.9023226
Abstract: Bone-conducted (BC) speech is immune to background noise but suffers from low speech quality due to the severe loss of high-frequency components. The key to BC speech enhancement is restoring the missing parts of the spectra. However, even with advanced deep neural networks (DNNs), some of the recovered components still lack the expected spectro-temporal structures. The mean square error (MSE) loss function is the typical choice for supervised DNN training, but it only measures the point-wise distance between spectro-temporal bins and cannot evaluate structural similarity. In this paper, the Structural SIMilarity (SSIM) loss function, which originated in image quality assessment, is proposed for training the spectral mapping model in BC speech enhancement; to the best of our knowledge, this is the first time SSIM has been deployed in DNN-based speech signal processing tasks. Experimental results show that, compared with MSE, SSIM achieves better objective results and produces spectra whose spectro-temporal structures are more similar to the target. Some SSIM hyper-parameters are adjusted to account for the differences between natural images and magnitude spectrograms, and optimal choices are suggested. In addition, the effects of the three components of SSIM are analyzed individually to support further study of this loss function in other speech signal processing tasks.
Citations: 4
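The SSIM index itself is standard and easy to compute over local windows of two magnitude spectrograms, as in the NumPy/SciPy sketch below. The window size and C1/C2 constants are the usual image-quality defaults; the paper adjusts such hyper-parameters for spectrograms, and those adjusted values are not shown here.

```python
# Hedged sketch of the SSIM index between two magnitude spectrograms, computed
# over local windows with SciPy's uniform filter. Window size and the C1/C2
# constants are the common image-quality defaults; the paper's adjusted
# hyper-parameters for spectrograms are not reproduced here.

import numpy as np
from scipy.ndimage import uniform_filter

def ssim_map(x: np.ndarray, y: np.ndarray, win: int = 7,
             c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> np.ndarray:
    """Local SSIM between two equally scaled spectrograms (values in [0, 1])."""
    mu_x = uniform_filter(x, win)
    mu_y = uniform_filter(y, win)
    var_x = uniform_filter(x * x, win) - mu_x ** 2
    var_y = uniform_filter(y * y, win) - mu_y ** 2
    cov_xy = uniform_filter(x * y, win) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def ssim_loss(x: np.ndarray, y: np.ndarray) -> float:
    """1 - mean SSIM, so that lower is better (usable as a training loss)."""
    return 1.0 - float(np.mean(ssim_map(x, y)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((128, 80))   # e.g. frames x mel bins, scaled to [0, 1]
    recovered = np.clip(target + 0.05 * rng.standard_normal(target.shape), 0, 1)
    print(ssim_loss(target, recovered))
```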
Question Mark Prediction By Bert
Yunqi Cai, Dong Wang
DOI: 10.1109/APSIPAASC47483.2019.9023090
Abstract: Punctuation restoration is important for automatic speech recognition and its downstream applications, e.g., speech translation. Despite continuous progress on punctuation restoration, discriminating between question marks and periods remains very hard. This difficulty can largely be attributed to the fact that interrogative and narrative sentences are mostly characterized and distinguished by long-distance syntactic and semantic dependencies, which cannot be well modeled by existing models (e.g., RNNs or n-grams). In this paper we propose to solve this problem with the self-attention mechanism of the BERT model. Our experiments demonstrate that, compared with the best baseline, the new approach improves the F1 score of question mark prediction from 30% to 90%.
Citations: 9
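A minimal way to put BERT's self-attention to work on this task is a sentence-level classifier over final punctuation marks, as sketched below with Hugging Face Transformers. The `bert-base-chinese` checkpoint, the three-class label set, and the sequence-level (rather than token-level) formulation are assumptions for illustration; the model would still need fine-tuning on punctuation-labeled transcripts, and the paper's exact setup may differ.

```python
# Hedged sketch: scoring an unpunctuated sentence with a BERT classifier whose
# labels are sentence-final punctuation marks. The checkpoint name, the 3-class
# label set, and the sequence-level formulation are assumptions for illustration;
# the paper's exact model and training setup may differ.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

LABELS = ["period", "question_mark", "exclamation_mark"]  # assumed label set

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(LABELS))  # needs fine-tuning on punctuation data
model.eval()

def predict_final_punct(sentence: str) -> str:
    """Return the most likely sentence-final punctuation for an ASR transcript."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

if __name__ == "__main__":
    print(predict_final_punct("你明天有空吗"))  # "Are you free tomorrow" (no punctuation)
```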