{"title":"A Study of Perceptual Quality Assessment for Stereoscopic Image Retargeting","authors":"Zhenqi Fu, Yan Yang, F. Shao, Xinghao Ding","doi":"10.1109/APSIPAASC47483.2019.9023009","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023009","url":null,"abstract":"Subjective and objective perceptual quality assessment for stereoscopic retargeted images is a fundamentally important issue in stereoscopic image retargeting (SIR), and it has not yet been deeply investigated. Here, a stereoscopic image retargeting quality assessment (SIRQA) database is proposed to study the perceptual quality of different stereoscopic retargeted images. To construct the database, we collect 720 stereoscopic retargeted images generated by eight representative SIR methods. The perceptual quality (mean opinion score, MOS) of each stereoscopic retargeted image is subjectively rated by 30 viewers. For objective assessment, several publicly available quality evaluation metrics are tested on the database. Experimental results show that there is large room for improving the accuracy of objective quality assessment in SIRQA by comprehensively considering geometric distortion, content loss, and stereoscopic perceptual quality.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114763826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Reconstruction from Local Descriptors Using Conditional Adversarial Networks","authors":"Haiwei Wu, Jiantao Zhou, Yuanman Li","doi":"10.1109/APSIPAASC47483.2019.9023323","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023323","url":null,"abstract":"Many applications rely on local descriptors extracted around a collection of interest points. Recently, the security of local descriptors has been attracting increasing attention. In this paper, we study the possibility of reconstructing an image from these descriptors, and propose a coarse-to-fine framework for image reconstruction. By resorting to our gradually reconstructing network architecture, a novel multiscale feature map generation algorithm, and strategically designed loss functions, our proposed algorithm can recover images with very high perceptual quality, even when only partial descriptors are provided. Extensive experimental results are reported to show its superiority over existing algorithms. Our study implies that local descriptors contain surprisingly rich information about the original image. Users should pay more attention to sensitive information leakage when using local descriptors.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128049424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image Haze Removal By Adaptive CycleGAN","authors":"Yi-Fan Chen, A. Patel, Chia-Ping Chen","doi":"10.1109/APSIPAASC47483.2019.9023296","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023296","url":null,"abstract":"We introduce a machine-learning method to remove fog and haze from images. Our model is based on CycleGAN, an ingenious image-to-image translation model, which can be applied to the dehazing task. The datasets that we use for training and testing are created according to the atmospheric scattering model. By changing the adversarial loss from cross-entropy loss to hinge loss, and the reconstruction loss from MAE loss to perceptual loss, we improve the SSIM performance measure from 0.828 to 0.841 on the NYU dataset. On the Middlebury stereo datasets, we achieve an SSIM value of 0.811, which is significantly better than the baseline CycleGAN model.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128069814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beam Steering of Portable Parametric Array Loudspeaker","authors":"Kyosuke Nakagawa, Chuang Shi, Y. Kajikawa","doi":"10.1109/APSIPAASC47483.2019.9023116","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023116","url":null,"abstract":"Portable devices such as smartphones and tablet PCs have become increasingly sophisticated and widespread, and opportunities for outdoor use are consequently increasing. When portable devices are used in public areas, a personal audio system is required to avoid spreading sound into the vicinity. We have previously proposed a portable parametric array loudspeaker that can realize personal audio without earphones or headphones. In this system, parametric array loudspeakers are mounted on two edges of a tablet PC and can radiate highly directional stereo sound to the user. However, the radiated sound beams may not stay focused on the user's ears when the user's head moves. In this paper, we examine the phased array technique to steer the sound beam based on the user's head position. Experimental results demonstrate that the sound beam angle can be appropriately steered by the phased array technique.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125988099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Encrypted JPEG image retrieval using histograms of transformed coefficients","authors":"Peiya Li, Zhenhui Situ","doi":"10.1109/APSIPAASC47483.2019.9023179","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023179","url":null,"abstract":"This work proposes an encrypted JPEG image retrieval mechanism based on histograms of transformed coefficients. In this scheme, a JPEG image is encrypted during its compression process by using other orthogonal transforms, rather than the 8×8 DCT, for block transformation. The encrypted images are then transferred to and stored in the cloud server. When receiving an encrypted query image from an authorized user, the server calculates the histograms of transformed coefficients located at different frequency positions. By computing the distance between the histograms of the encrypted query image and the database cipherimages, encrypted images with plaintext content similar to the query image are returned to the authorized user for decryption. Experiments show that our scheme can provide an effective cipherimage retrieval service while ensuring format compliance and compression friendliness.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115799268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech Demodulation-based Techniques for Replay and Presentation Attack Detection","authors":"Madhu R. Kamble, Aditya Krishna Sai Pulikonda, Maddala Venkata Siva Krishna, Ankur T. Patil, R. Acharya, H. Patil","doi":"10.1109/APSIPAASC47483.2019.9023046","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023046","url":null,"abstract":"Spoofing is one of the threats that bypasses voice biometrics and gains access to the system. In particular, the Automatic Speaker Verification (ASV) system is vulnerable to various kinds of spoofing attacks. This paper extends our earlier work: the combination of different speech demodulation techniques, namely the Hilbert Transform (HT), the Energy Separation Algorithm (ESA), and its variable-length version (VESA), is investigated for the replay Spoof Speech Detection (SSD) task. In particular, the feature sets are developed using the Instantaneous Amplitude and Instantaneous Frequency (IA-IF) components of narrowband filtered speech signals obtained from a linearly-spaced Gabor filterbank. We observed the relative effectiveness of these demodulation techniques on two spoof speech databases, i.e., the BTAS 2016 and ASVspoof 2017 version 2.0 challenge databases, which focus on presentation and replay attacks, respectively. The different demodulation techniques gave comparable results on both databases, with small variations in % Equal Error Rate (EER). For VESA, we found that a Dependency Index (DI) of 2 gave relatively better performance than other DI values on both databases for the SSD task. All the demodulation technique-based feature sets gave lower % EER than the baseline system on both databases.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132222022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Late Reverberation Power Spectral Density Aware Approach to Speech Dereverberation Based on Deep Neural Networks","authors":"Yuanlei Qi, Feiran Yang, Jun Yang","doi":"10.1109/APSIPAASC47483.2019.9023202","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023202","url":null,"abstract":"In recent years, a variety of speech dereverberation algorithms based on deep neural networks (DNNs) have been proposed. These algorithms usually adopt anechoic speech as their target output. Consequently, speech distortion might occur, which impairs speech intelligibility. As a matter of fact, early reflections can increase the strength of the direct-path sound and therefore have a positive impact on speech intelligibility. In traditional speech dereverberation methods, early reflections are generally retained together with the direct-path sound. Based on these observations, we propose to adopt both the direct-path sound and early reflections as the target DNN output in this paper. Moreover, we propose a late reverberation power spectral density (PSD) aware training strategy to further suppress the late reverberation. Experimental results demonstrate that the proposed DNN framework achieves significant improvement in objective measures even under mismatched conditions.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132529966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer Learning for Punctuation Prediction","authors":"Karan Makhija, Thi-Nga Ho, Chng Eng Siong","doi":"10.1109/APSIPAASC47483.2019.9023200","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023200","url":null,"abstract":"The output of most Automatic Speech Recognition (ASR) systems is a continuous sequence of words without proper punctuation. This decreases human readability and degrades the performance of downstream natural language processing tasks on ASR text. We treat punctuation prediction as a sequence tagging task and propose an architecture that uses pre-trained BERT embeddings. Our model significantly improves the state of the art on the IWSLT dataset. We achieve an overall F1 of 81.4% on the joint prediction of period, comma, and question mark.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134352616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generic Video-Based Motion Capture Data Retrieval","authors":"Zifei Jiang, Zhen Li, Wei Li, Xue-qing Li, Jingliang Peng","doi":"10.1109/APSIPAASC47483.2019.9023336","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023336","url":null,"abstract":"In this work we propose a novel and generic scheme for retrieval of motion capture (MoCap) data given a video query. We reconstruct skeleton animations from video clips by a convolutional neural network for 3-dimensional human pose estimation to narrow the gap between videos and MoCap data. A statistical motion signature is computed to extract both morphological and kinematic characteristics from the skeleton animations and the MoCap sequences. This also ensures that the proposed scheme works on MoCap data with arbitrary skeleton structures. The retrieval is achieved by computing and sorting the distances between the motion signature of the query and those of the MoCap sequences, which are pre-computed and stored in the MoCap database. For experimental evaluation, we record a video dataset and capture a MoCap dataset with different performers, and conduct video-based MoCap data retrieval on them. Experimental results demonstrate the effectiveness of the proposed scheme.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130754840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient quantization of vocoded speech parameters without degradation","authors":"M. Morise, Genta Miyashita","doi":"10.1109/APSIPAASC47483.2019.9023279","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023279","url":null,"abstract":"In a statistical parametric speech synthesis (SPSS) system with a vocoder, the dimensions of the speech parameters need to be reduced, and many SPSS systems have used companded speech parameters. This paper introduces quantization algorithms for three speech parameters: the fundamental frequency (fo), the spectral envelope, and the aperiodicity. In full-band speech (speech with a sampling frequency above 40 kHz), the dimensions of the spectral envelope and the aperiodicity can be reduced to 50 and 5, respectively, based on previous studies. This paper compares speech synthesized from the quantized parameters with speech synthesized from the uncoded parameters. Efficient quantization would be useful for studies that use graphics processing unit (GPU) computing, because recent GPUs support 16-bit floating-point computation. We conducted two subjective evaluations. The first determined the appropriate number of quantization bits for each speech parameter: 9 bits for fo, 13 bits for the spectral envelope, and 3 bits for the aperiodicity. The second verified the effectiveness of our proposed coding. Since data chunks are generally multiples of eight bits, we employed 16 quantization bits for fo, 16 for the spectral envelope, and 8 for the aperiodicity in this evaluation. The results showed that our proposed algorithm achieved almost the same sound quality as the uncoded speech parameters.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131108391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}