2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers

Visual Saliency Detection Algorithm in Compressed HEVC Domain
Rui Bai, Wei Zhou, Guanwen Zhang, Henglu Wei
{"title":"Visual Saliency Detection Algorithm in Compressed HEVC Domain","authors":"Rui Bai, Wei Zhou, Guanwen Zhang, Henglu Wei","doi":"10.23919/APSIPA.2018.8659565","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659565","url":null,"abstract":"Saliency detection has been widely used to predict human fixation. In this paper, a visual saliency detection algorithm in the compressed HEVC domain is proposed, which consists of three parts: static saliency detection, dynamic saliency detection, and competitive fusion. First, a Gaussian model is used to filter out the background of the static features, which are extracted by down-sampling and DCT. Second, the motion vectors are used to represent the dynamic feature, and the dynamic saliency is calculated by filtering out the background of the dynamic feature. Finally, the competitive fusion model is used to adaptively combine the characteristics of the static and dynamic saliency maps. Experimental results show that the proposed method is superior to classic state-of-the-art saliency detection methods, with an average AUC increase of 0.05 and an average KL divergence decrease of 0.17. The average detection time for one frame is 2.3 seconds.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114300222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Active Speech Obscuration with Speaker-dependent Human Speech-like Noise for Speech Privacy
Yoshitaka Ohshio, Haruka Adachi, Kenta Iwai, T. Nishiura, Y. Yamashita
{"title":"Active Speech Obscuration with Speaker-dependent Human Speech-like Noise for Speech Privacy","authors":"Yoshitaka Ohshio, Haruka Adachi, Kenta Iwai, T. Nishiura, Y. Yamashita","doi":"10.23919/APSIPA.2018.8659754","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659754","url":null,"abstract":"This paper introduces a new active speech obscuration method with speaker-dependent human speech-like noise (HSLN) for speech privacy. Recently, speech privacy has come to be regarded as an important issue in open public spaces such as hospitals, pharmacies, and banks. To protect speech privacy, speech obscuration methods utilizing HSLN have been studied. HSLNs are designed by superposing various speech signals, and speech obscuration is achieved when the target speech and the HSLN are heard at the same time. Conventionally, HSLN is designed with the pitch of the target speech as the sole speaker-dependent characteristic. However, additional speaker-dependent characteristics are required because the performance of speech obscuration is still insufficient. Therefore, we propose a speaker-dependent HSLN design method for effective speech obscuration that uses the third formant frequency of the target speech in addition to pitch as speaker-dependent characteristics. The third formant frequency is related to voice quality, which depends on the shape and length of the vocal tract. It follows that the proposed method can effectively mask the target speech with an HSLN that accounts for the pitch and third formant frequency, both of which are analyzed from the speech. Experimental results demonstrate the effectiveness of the proposed method.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116675977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Wavelet Scattering Transform for Variability Reduction in Cortical Potentials Evoked by Pitch Matched Electro-acoustic Stimulation in Unilateral Cochlear Implant Patients
M. Heydarzadeh, Sara Akbarzadeh, Chin-Tuan Tan
{"title":"Wavelet Scattering Transform for Variability Reduction in Cortical Potentials Evoked by Pitch Matched Electro-acoustic Stimulation in Unilateral Cochlear Implant Patients","authors":"M. Heydarzadeh, Sara Akbarzadeh, Chin-Tuan Tan","doi":"10.23919/APSIPA.2018.8659659","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659659","url":null,"abstract":"A cochlear implant (CI) restores the hearing sensation in profoundly deaf patients by directly stimulating the auditory nerve with electric pulses using an array of tonotopically inserted electrodes. Basal electrodes stimulate in response to high input frequencies, while apical electrodes stimulate in response to low input frequencies. The problem with this electrical stimulation, particularly in unilaterally implanted users who have residual hearing in the contralateral ear, lies in the frequency mismatch between the characteristic frequency of the auditory nerve and the input signal. In this paper, we revisit our previously proposed mechanism for tuning an intra-cochlear electrode to its pitch-matched frequency using single-channel EEG [1]. We apply the wavelet scattering transform to extract a deformation invariant from the EEG signal recorded from each of 10 CI subjects while they were listening to pitch-matched electro-acoustic stimulation. Results show that the wavelet scattering transform is able to capture the variability introduced by different subjects and is a more robust alternative for revealing the underlying neurophysiological responses to this perceptual event.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115734372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reward Only Training of Encoder-Decoder Digit Recognition Systems Based on Policy Gradient Methods
Yilong Peng, Hayato Shibata, T. Shinozaki
{"title":"Reward Only Training of Encoder-Decoder Digit Recognition Systems Based on Policy Gradient Methods","authors":"Yilong Peng, Hayato Shibata, T. Shinozaki","doi":"10.23919/APSIPA.2018.8659527","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659527","url":null,"abstract":"Zero resource speech recognition is getting attention for engineering as well as scientific purposes. With existing unsupervised learning frameworks that use only speech input, however, it is impossible to associate automatically found linguistic units with spellings and concepts. In this paper, we propose an approach that uses a scalar reward that is assumed to be given for each decoding result of an utterance. While the approach is straightforward with reinforcement learning, the difficulty is to obtain convergence without the help of supervised learning. Focusing on encoder-decoder based speech recognition, we explore several neural network architectures, optimization methods, and reward definitions, seeking a suitable configuration for policy gradient reinforcement learning. Experiments were performed using connected digit utterances from the TIDIGITS corpus as training and evaluation sets. Although the task is challenging, we show that learning a connected digit recognition system is possible, achieving a digit error rate of 13.6%. The success largely depends on the configuration, and we reveal that the appropriate conditions differ considerably from those of supervised training.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116233046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Alternating Binary Classifier and Graph Learning from Partial Labels
Cheng Yang, Gene Cheung, V. Stanković
{"title":"Alternating Binary Classifier and Graph Learning from Partial Labels","authors":"Cheng Yang, Gene Cheung, V. Stanković","doi":"10.23919/APSIPA.2018.8659448","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659448","url":null,"abstract":"Semi-supervised binary classifier learning is a fundamental machine learning task where only partial binary labels are observed, and labels of the remaining data need to be interpolated. Leveraging advances in graph signal processing (GSP), binary classifier learning has recently been posed as a signal restoration problem regularized using a graph smoothness prior, where the undirected graph consists of a set of vertices and a set of weighted edges connecting vertices with similar features. In this paper, we improve the performance of such a graph-based classifier by simultaneously optimizing the feature weights used in the construction of the similarity graph. Specifically, we first interpolate missing labels by formulating a boolean quadratic program with a graph signal smoothness objective and then relaxing it to a convex semi-definite program, solvable in polynomial time. Next, we optimize the feature weights used for construction of the similarity graph by reusing the smoothness objective but with a convex set constraint for the weight vector. The re-posed convex but non-differentiable problem is solved via an iterative proximal gradient descent algorithm. The two steps are solved alternately until convergence. Experimental results show that our alternating classifier / graph learning algorithm outperforms existing graph-based methods and support vector machines with various kernels. (The work is partly funded by the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 734331.)","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114988156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
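The graph smoothness prior mentioned in the abstract of the entry above can be illustrated with a minimal sketch. It is not the authors' algorithm: the Gaussian kernel, the parameter `sigma`, and the helper names `similarity_graph` and `smoothness` are assumptions chosen for illustration. The sketch builds a similarity graph from weighted feature distances, forms the combinatorial Laplacian L = D - W, and evaluates the quadratic form x^T L x, which is small when vertices with similar features share a label.

```python
import numpy as np

def similarity_graph(features, feature_weights, sigma=1.0):
    """Adjacency matrix from a Gaussian kernel on weighted squared distances.

    The kernel form and sigma are illustrative assumptions, not taken from the paper.
    """
    diff = features[:, None, :] - features[None, :, :]
    dist2 = np.sum(feature_weights * diff ** 2, axis=-1)
    W = np.exp(-dist2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def smoothness(W, x):
    """Graph signal smoothness x^T L x with L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return float(x @ L @ x)

# Two tight clusters of points in a 2-D feature space (toy data).
features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
feature_weights = np.array([1.0, 1.0])
W = similarity_graph(features, feature_weights)

consistent = np.array([1.0, 1.0, -1.0, -1.0])  # labels follow the clusters
mixed = np.array([1.0, -1.0, 1.0, -1.0])       # labels cut through the clusters

smooth_consistent = smoothness(W, consistent)
smooth_mixed = smoothness(W, mixed)
print(smooth_consistent < smooth_mixed)
```

A labeling that respects the cluster structure scores a much lower smoothness value than one that splits each cluster, which is why the quadratic form can serve as a regularizer when interpolating missing labels. Changing `feature_weights` changes the graph and hence the score, which is the quantity the abstract describes optimizing in its second step.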
Unsupervised Singing Voice Separation Using Gammatone Auditory Filterbank and Constraint Robust Principal Component Analysis
Feng Li, M. Akagi
{"title":"Unsupervised Singing Voice Separation Using Gammatone Auditory Filterbank and Constraint Robust Principal Component Analysis","authors":"Feng Li, M. Akagi","doi":"10.23919/APSIPA.2018.8659640","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659640","url":null,"abstract":"This paper presents an unsupervised singing voice separation algorithm that uses an extension of robust principal component analysis (RPCA) with a rank-1 constraint (CRPCA), applied to a cochleagram based on a gammatone auditory filterbank. Unlike conventional algorithms that focus on spectrogram analysis or its variants, we develop an extension of RPCA on the cochleagram, an alternative time-frequency representation based on the gammatone auditory filterbank. We also apply time-frequency masking to improve the separated low-rank and sparse matrices obtained with the CRPCA method. Evaluation results demonstrate that the proposed algorithm achieves better separation performance on the MIR-1K dataset.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115117276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
An Effective Tensor Completion Method Based on Multi-linear Tensor Ring Decomposition
Jinshi Yu, Guoxu Zhou, Qibin Zhao, Kan Xie
{"title":"An Effective Tensor Completion Method Based on Multi-linear Tensor Ring Decomposition","authors":"Jinshi Yu, Guoxu Zhou, Qibin Zhao, Kan Xie","doi":"10.23919/APSIPA.2018.8659492","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659492","url":null,"abstract":"Considering that a balanced unfolding scheme helps to capture global information for tensor completion, and building on the recently proposed tensor ring decomposition, this paper proposes a weighted multilinear tensor ring decomposition model for tensor completion, called MTRD. Utilizing the circular dimensional permutation invariance of tensor ring decomposition, a very balanced matricization scheme, $< k, d >$-unfolding, is employed in MTRD. To evaluate MTRD, it is applied to both synthetic data and image tensor data, and the experimental results show that MTRD is able to achieve the desired relative square error in much less time than the compared methods, i.e., TMac-TT and TR-ALS. The results of image completion also show that MTRD outperforms the compared methods in relative square error. Specifically, TMac-TT and TR-ALS fail to reach the same relative square error as MTRD, and TR-ALS outperforms TMac-TT but requires a large amount of running time. To sum up, MTRD is more applicable than the compared methods.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115519173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
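The circular dimensional permutation invariance that the abstract of the entry above relies on can be checked numerically with a small sketch. This shows only the basic tensor ring format, not the MTRD model itself; the helper name `tr_to_tensor` and the ranks and dimensions are hypothetical choices for illustration. Each core has shape (r_k, n_k, r_{k+1}) with the last rank equal to the first, and an entry of the full tensor is the trace of the product of the corresponding core slices.

```python
import numpy as np

def tr_to_tensor(cores):
    """Contract tensor-ring cores of shape (r_k, n_k, r_{k+1}) into the full tensor."""
    result = cores[0]
    for core in cores[1:]:
        # Chain the shared rank index between consecutive cores.
        result = np.tensordot(result, core, axes=([result.ndim - 1], [0]))
    # Result has shape (r_1, n_1, ..., n_d, r_1); the trace closes the ring.
    return np.trace(result, axis1=0, axis2=result.ndim - 1)

rng = np.random.default_rng(0)
ranks = [2, 3, 4]   # hypothetical TR ranks
dims = [5, 6, 7]    # hypothetical mode sizes
cores = [
    rng.standard_normal((ranks[k], dims[k], ranks[(k + 1) % 3]))
    for k in range(3)
]

full = tr_to_tensor(cores)

# Circularly shifting the cores yields the same tensor with its modes
# circularly shifted, because the trace is invariant under cyclic permutation.
shifted = tr_to_tensor(cores[1:] + cores[:1])
print(np.allclose(shifted, np.transpose(full, (1, 2, 0))))
```

Because any mode can be rotated to the front without changing the decomposition, an unfolding can start at any mode k, which is the property the abstract cites as the basis for its balanced matricization scheme.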
Microphone Position Realignment by Extrapolation of Virtual Microphone
R. Jinzai, K. Yamaoka, Mitsuo Matsumoto, Takeshi Yamada, S. Makino
{"title":"Microphone Position Realignment by Extrapolation of Virtual Microphone","authors":"R. Jinzai, K. Yamaoka, Mitsuo Matsumoto, Takeshi Yamada, S. Makino","doi":"10.23919/APSIPA.2018.8659728","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659728","url":null,"abstract":"In this paper, microphone realignment by phase extrapolation using the virtual microphone technique is proposed for reproducing binaural signals with adequate interaural time differences (ITDs) for a listener. For a sound source in the horizontal plane, ITDs are a major cue for localizing a sound image. Since ITDs are not considered for headphone listening in conventional amplitude panning in multichannel recording, sound images are localized inside the head (lateralization). A microphone array is applicable to recording signals with time differences corresponding to the directions of sound sources. Since the microphones in such an array are closely positioned, the time differences are inappropriate as ITDs for localizing sound images for the sources. In this paper, phase extrapolation using the virtual microphone technique is applied to the virtual realignment of a microphone in such an array for restoring ITDs. In the experiments, two speech signals are used as sound sources located at the leftmost and rightmost positions from the viewpoint of two real microphones positioned 2.83 cm apart, and the phase of the signal of a virtually realigned microphone is extrapolated to eight times the phase difference between the two real microphones. Time differences between the signals of one of the real microphones and the realigned one are observed to be -500 μs for the source on the left and 500 μs for the source on the right. Furthermore, the interaural cross correlations of the two signals suggest that sound images will be perceived on both the left and right of a listener. With this method, prior information on the number of sources and their directions of arrival is expected to be unnecessary, and adjustment for individual differences is expected to be easy.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123046519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
A Lung Disease Classification Based on Feature Fusion Convolutional Neural Network with X-ray Image Enhancement
Yue Cheng, Jinchao Feng, Ke-bin Jia
{"title":"A Lung Disease Classification Based on Feature Fusion Convolutional Neural Network with X-ray Image Enhancement","authors":"Yue Cheng, Jinchao Feng, Ke-bin Jia","doi":"10.23919/APSIPA.2018.8659700","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659700","url":null,"abstract":"With the explosive growth in the number of patients with lung diseases, automatically detecting diseases and obtaining accurate diagnoses from X-ray medical images has become a new research focus in computer science and artificial intelligence, as it saves the significant cost of manual labeling and classification. However, the quality of common radiographs is not satisfactory for most tasks, and traditional methods are inadequate for dealing with massive numbers of images. Therefore, we present a feature fusion convolutional neural network (CNN) model to detect pneumothorax from chest X-ray images. First, the preprocessed image samples are enhanced by two methods. Then, a feature fusion CNN model is introduced to combine the Gabor features with the enhanced information extracted from the images and to perform the final classification. Comprehensive qualitative and quantitative experiments demonstrate that our proposed model achieves better results in multi-angle views.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122043279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Privacy-Preserving SVM Computing in the Encrypted Domain
Takahiro Maekawa, Ayana Kawamura, Yuma Kinoshita, H. Kiya
{"title":"Privacy-Preserving SVM Computing in the Encrypted Domain","authors":"Takahiro Maekawa, Ayana Kawamura, Yuma Kinoshita, H. Kiya","doi":"10.23919/APSIPA.2018.8659529","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659529","url":null,"abstract":"A privacy-preserving Support Vector Machine (SVM) computing scheme is proposed in this paper. Cloud computing has been spreading in many fields. However, cloud computing has some serious issues for end users, such as unauthorized use and leaks of data, and privacy compromise. We focus on templates protected by a block scrambling-based encryption scheme, and consider some properties of the protected templates for secure SVM computing, where templates mean features extracted from data. The proposed scheme enables us not only to protect templates, but also to achieve the same performance as that of unprotected templates under some useful kernel functions. Moreover, it can be directly carried out by using well-known SVM algorithms, without preparing any algorithms specialized for secure SVM computing. In an experiment, the proposed scheme is applied to a face-based authentication algorithm with SVM classifiers to confirm its effectiveness.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114063091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14