2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

Cross-Domain Speaker Recognition using Cycle-Consistent Adversarial Networks
Y. Liu, Bairong Zhuang, Zhiyu Li, T. Shinozaki
{"title":"Cross-Domain Speaker Recognition using Cycle-Consistent Adversarial Networks","authors":"Y. Liu, Bairong Zhuang, Zhiyu Li, T. Shinozaki","doi":"10.1109/APSIPAASC47483.2019.9023042","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023042","url":null,"abstract":"Speaker recognition systems often suffer from severe performance degradation due to the difference between training and evaluation data, which is called domain mismatch problem. In this paper, we apply adversarial strategies in deep learning techniques and propose a method using cycle-consistent adversarial networks for i-vector domain adaptation. This method performs an i-vector domain transformation from the source domain to the target domain to reduce the domain mismatch. It uses a cycle structure that reduces the negative influence of losing speaker information in i-vector during the transformation and makes it possible to use unpaired dataset for training. The experimental results show that the proposed adaptation method improves recognition performance of a conventional i-vector and PLDA based speaker recognition system by reducing the domain mismatch between the training and the evaluation sets.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126767495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
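The core of the method is a cycle-consistency constraint on two domain-mapping generators. Below is a minimal PyTorch sketch of that constraint for fixed-length i-vectors; the 400-dimensional vectors, MLP generators, and layer sizes are illustrative assumptions, and the adversarial losses and discriminators of the full CycleGAN objective are omitted.

```python
import torch
import torch.nn as nn

class Mapper(nn.Module):
    """MLP generator that maps i-vectors between domains (hypothetical sizes)."""
    def __init__(self, dim=400, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )
    def forward(self, x):
        return self.net(x)

g_s2t = Mapper()  # source -> target domain
g_t2s = Mapper()  # target -> source domain
l1 = nn.L1Loss()

x_src = torch.randn(32, 400)  # unpaired source-domain i-vectors
x_tgt = torch.randn(32, 400)  # unpaired target-domain i-vectors

# Cycle consistency: mapping to the other domain and back should
# reconstruct the input, discouraging loss of speaker information
# and allowing training on unpaired data.
cycle_loss = l1(g_t2s(g_s2t(x_src)), x_src) + l1(g_s2t(g_t2s(x_tgt)), x_tgt)
# In full CycleGAN training this term is combined with adversarial
# losses from two domain discriminators (omitted here).
```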
Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network
Yi Zhou, Xiaohai Tian, Rohan Kumar Das, Haizhou Li
{"title":"Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network","authors":"Yi Zhou, Xiaohai Tian, Rohan Kumar Das, Haizhou Li","doi":"10.1109/APSIPAASC47483.2019.9023277","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023277","url":null,"abstract":"Among various voice conversion (VC) techniques, average modeling approach has achieved good performance as it benefits from training data of multiple speakers, therefore, reducing the reliance on training data from the target speaker. Many existing average modeling approaches rely on the use of i-vector to represent the speaker identity for model adaptation. As such i-vector is extracted in a separate process, it is not optimized to achieve the best voice conversion quality for the average model. To address this problem, we propose a low dimensional trainable speaker embedding network that augments the primary VC network for joint training. We validate the effectiveness of the proposed idea by performing a many-to-many cross-lingual VC, which is one of the most challenging tasks in VC. We compare the i-vector scheme with the speaker embedding network in the experiments. It is found that the proposed system effectively improves the speech quality and speaker similarity.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126514989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
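The key design point is that the speaker representation is a trainable lookup table optimized jointly with the conversion network, rather than a separately extracted i-vector. A minimal PyTorch sketch of that idea follows; the GRU backbone, feature dimensions, and embedding size are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class VCWithSpeakerEmbedding(nn.Module):
    """Average-model VC network with a jointly trained speaker lookup table.
    Layer sizes and feature dimensions are illustrative, not from the paper."""
    def __init__(self, n_speakers, feat_dim=80, spk_dim=16, hidden=256):
        super().__init__()
        self.spk_table = nn.Embedding(n_speakers, spk_dim)  # trainable, low-dim
        self.rnn = nn.GRU(feat_dim + spk_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, content_feats, speaker_id):
        # content_feats: (B, T, feat_dim); speaker_id: (B,)
        spk = self.spk_table(speaker_id)                    # (B, spk_dim)
        spk = spk.unsqueeze(1).expand(-1, content_feats.size(1), -1)
        h, _ = self.rnn(torch.cat([content_feats, spk], dim=-1))
        return self.out(h)

model = VCWithSpeakerEmbedding(n_speakers=10)
y = model(torch.randn(4, 100, 80), torch.tensor([0, 1, 2, 3]))
# Backpropagating a reconstruction loss updates the embedding table
# together with the conversion network, unlike a fixed i-vector.
```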
Integrating Action-aware Features for Saliency Prediction via Weakly Supervised Learning
Jiaqi Feng, Shuai Li, Yunfeng Sui, Lingtong Meng, Ce Zhu
{"title":"Integrating Action-aware Features for Saliency Prediction via Weakly Supervised Learning","authors":"Jiaqi Feng, Shuai Li, Yunfeng Sui, Lingtong Meng, Ce Zhu","doi":"10.1109/APSIPAASC47483.2019.9023127","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023127","url":null,"abstract":"Deep learning has been widely studied for saliency prediction. Despite the great performance improvement introduced by deep saliency models, some high-level concepts that contribute to the saliency prediction, such as text, objects of gaze and action, locations of motion, and expected locations of people, have not been explicitly considered. This paper investigates the objects of action and motion, and proposes to use action-aware features to compensate deep saliency models. The action-aware features are generated via weakly supervised learning using an extra action classification network trained with existing image based action datasets. Then a feature fusion module is developed to integrate the action-aware features for saliency prediction. Experiments show that the proposed saliency model with the action-aware features achieves better performance on three public benchmark datasets. More experiments are further conducted to analyze the effectiveness of the action-aware features in saliency prediction. To the best of our knowledge, this study is the first attempt on explicitly integrating objects of action and motion concept into deep saliency models.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126621189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
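The fusion step can be pictured as concatenating spatially aligned saliency and action feature maps and mixing them with a 1x1 convolution. The sketch below shows one plausible such module in PyTorch; the channel counts and the sigmoid read-out head are assumptions.

```python
import torch
import torch.nn as nn

class ActionAwareFusion(nn.Module):
    """Fuses saliency-backbone features with action-aware features via
    channel concatenation and a 1x1 convolution (channel counts assumed)."""
    def __init__(self, sal_ch=256, act_ch=256, out_ch=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(sal_ch + act_ch, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(out_ch, 1, kernel_size=1)  # saliency map

    def forward(self, sal_feat, act_feat):
        # Both feature maps are assumed spatially aligned: (B, C, H, W).
        fused = self.fuse(torch.cat([sal_feat, act_feat], dim=1))
        return torch.sigmoid(self.head(fused))

m = ActionAwareFusion()
sal_map = m(torch.randn(2, 256, 28, 28), torch.randn(2, 256, 28, 28))
```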
Anonymization of Gait Silhouette Video by Perturbing Its Phase and Shape Components
Yuki Hirose, Kazuaki Nakamura, Naoko Nitta, N. Babaguchi
{"title":"Anonymization of Gait Silhouette Video by Perturbing Its Phase and Shape Components","authors":"Yuki Hirose, Kazuaki Nakamura, Naoko Nitta, N. Babaguchi","doi":"10.1109/APSIPAASC47483.2019.9023196","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023196","url":null,"abstract":"Nowadays there are a lot of videos containing walking people on the web (e.g. YouTube). These videos can cause a privacy issue because the walking people can be identified by silhouette-based gait recognition systems which have been rapidly advanced in recent years. To solve the issue, in this paper, we propose a method for anonymizing human gait silhouettes. A gait silhouette consists of a static component including the body shape and a dynamic component including postures. We refer to the former and the latter as a shape component and a phase component, respectively. The proposed method anonymizes given gait silhouettes as follows: First, each of the given silhouettes is decomposed into its shape and phase components. Next, both components are separately perturbed. Finally, a new gait silhouette is generated from the perturbed components. Owing to the perturbation, the original silhouettes become less informative in the static aspect as well as the dynamic aspect, by which the gait recognition performance is seriously degraded. In our experimental results, the accuracy was actually degraded from 100% to 30% or less, without yielding any unnatural appearance in the output anonymized gait silhouettes.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114203333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
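To make the decompose-perturb-recompose pipeline concrete, here is a toy NumPy version in which the shape component is simply the temporal mean silhouette and the phase component is the frame-wise residual; the paper's actual decomposition and perturbation are more principled, so treat this strictly as an illustration of the pipeline's structure.

```python
import numpy as np

def anonymize_gait(silhouettes, shape_noise=0.1, phase_shift=2, rng=None):
    """Toy decompose-perturb-recompose scheme for a silhouette sequence.
    silhouettes: (T, H, W) array in [0, 1]. Here 'shape' is the temporal
    mean and 'phase' is the frame-wise residual; both are crude stand-ins
    for the paper's components."""
    rng = np.random.default_rng() if rng is None else rng
    shape = silhouettes.mean(axis=0)            # static body-shape component
    phase = silhouettes - shape                 # dynamic posture component
    shape = shape + rng.normal(0, shape_noise, shape.shape)  # perturb shape
    phase = np.roll(phase, rng.integers(-phase_shift, phase_shift + 1),
                    axis=0)                     # perturb gait timing
    return ((shape + phase) > 0.5).astype(np.uint8)

anon = anonymize_gait(np.random.randint(0, 2, (30, 64, 44)).astype(float))
```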
Automatic Handwriting Verification and Suspect Identification for Chinese Characters Using Space and Frequency Domain Features
Wei-Cheng Liao, Jian-Jiun Ding
{"title":"Automatic Handwriting Verification and Suspect Identification for Chinese Characters Using Space and Frequency Domain Features","authors":"Wei-Cheng Liao, Jian-Jiun Ding","doi":"10.1109/APSIPAASC47483.2019.9023114","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023114","url":null,"abstract":"Automatic handwriting verification is to identify whether the script was written by a person himself or forged. Compared to related works about handwriting verification, the proposed algorithm adopts the features in both the time domain and the frequency domain. Moreover, in addition to distinguishing the forged manuscript from the genuine one, the proposed algorithm can also identify the suspect. The proposed algorithm is robust to writing instruments. In addition to the information of the luminance of the script, we also adopt the energy distribution on the 2-D frequency domain, the Pearson product-moment correlation coefficient (PPMCC) with genuine scripts, and vital information on characterized script points. Simulations show that the proposed method outperforms many advanced methods, including the deep-learning based method and manual identification by human beings. The proposed algorithm can well identify the script even if it is forged after several times of practice.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"187 1-6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120932332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
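A rough NumPy illustration of combining space-domain and frequency-domain descriptors with a PPMCC comparison is given below; the specific features used here (stroke-density profiles and a normalized 2-D spectrum) are hypothetical stand-ins for the paper's descriptors, not a reproduction of them.

```python
import numpy as np

def handwriting_features(img):
    """Space- and frequency-domain descriptors for a grayscale script image
    normalized to [0, 1]. Feature choices are illustrative stand-ins."""
    ink = 1.0 - img                               # luminance -> ink density
    col_profile = ink.mean(axis=0)                # space domain: column profile
    row_profile = ink.mean(axis=1)                # space domain: row profile
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(ink))) ** 2
    spectrum /= spectrum.sum()                    # 2-D energy distribution
    return np.concatenate([col_profile, row_profile, spectrum.ravel()])

def ppmcc(questioned, genuine):
    """Pearson product-moment correlation between two feature vectors."""
    return np.corrcoef(questioned, genuine)[0, 1]

a = handwriting_features(np.random.rand(64, 64))
b = handwriting_features(np.random.rand(64, 64))
score = ppmcc(a, b)  # high correlation with genuine scripts supports authenticity
```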
Hybrid Convolutional Recurrent Neural Networks Outperform CNN and RNN in Task-state EEG Detection for Parkinson's Disease
Xinjie Shi, Tianqi Wang, Lan Wang, Hanjun Liu, N. Yan
{"title":"Hybrid Convolutional Recurrent Neural Networks Outperform CNN and RNN in Task-state EEG Detection for Parkinson's Disease","authors":"Xinjie Shi, Tianqi Wang, Lan Wang, Hanjun Liu, N. Yan","doi":"10.1109/APSIPAASC47483.2019.9023190","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023190","url":null,"abstract":"In hospitals, brain-related disorders such as Parkinson's disease (PD) could be diagnosed by analyzing electroencephalograms (EEG). However, conventional EEG-based diagnosis for PD relies on handcrafted feature extraction, which is laborious and time-consuming. With the emergence of deep learning, automated analysis of EEG signals can be realized by exploring the inherent information in data, and outputting the results of classification from the hidden layer. In the present study, four deep learning algorithm architectures, including two convention deep learning models (convolutional neural network, CNN; and recurrent neural network, RNN) and two hybrid convolutional recurrent neural networks (2D-CNN-RNN and 3D-CNN-RNN), were designed to detect PD based on task-state EEG signals. Our results showed that the hybrid models outperformed conventional ones (fivefold average accuracy: 3D-CNN-RNN 82.89%, 2D-CNN-RNN 81.13%, CNN 80.89%, and RNN 76.00%) as they combine the strong modeling power of CNN in temporal feature extraction, and the advantage of RNN in processing sequential information. This study represents the an attempt to use hybrid convolutional recurrent neural networks in classifying PD and normal take-state EEG signals, which carries important implications to the clinical practice.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"195 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121079260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
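The hybrid architecture applies a CNN to each short EEG window and an RNN across the window sequence. A compact PyTorch sketch of a 2D-CNN-RNN of this kind follows; the electrode count, window length, and layer sizes are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CNNRNN(nn.Module):
    """2D-CNN front end per EEG window followed by a GRU over windows.
    Electrode counts and layer sizes are illustrative assumptions."""
    def __init__(self, hidden=64, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # one 32-dim vector per window
        )
        self.rnn = nn.GRU(32, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (B, S, electrodes, win_len), with S windows per trial
        b, s, e, t = x.shape
        f = self.cnn(x.reshape(b * s, 1, e, t)).reshape(b, s, 32)
        h, _ = self.rnn(f)                      # sequential info across windows
        return self.fc(h[:, -1])                # classify from the last state

logits = CNNRNN()(torch.randn(4, 10, 32, 128))  # PD vs. control logits
```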
Attribute Estimation Using Multi-CNNs from Hand Images
Yi-Chun Lin, Yusei Suzuki, Hiroya Kawai, Koichi Ito, Hwann-Tzong Chen, T. Aoki
{"title":"Attribute Estimation Using Multi-CNNs from Hand Images","authors":"Yi-Chun Lin, Yusei Suzuki, Hiroya Kawai, Koichi Ito, Hwann-Tzong Chen, T. Aoki","doi":"10.1109/APSIPAASC47483.2019.9023260","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023260","url":null,"abstract":"The human hand is one of the primary biometric traits in person authentication. A hand image also includes a lot of attribute information such as gender, age, skin color, accessory, and etc. Most conventional methods for hand-based biometric recognition rely on one distinctive attribute like palmprint and fingerprint. The other attributes as gender, age, skin color and accessory known as soft biometrics are expected to help identify individuals but are rarely used for identification. This paper proposes an attribute estimation method using multi-convolutional neural network (CNN) from hand images. We specially design new multi-CNN architectures dedicated to estimating multiple attributes from hand images. We train and test our models using 11K Hands, which consists of more than 10,000 images with 7 attributes and ID. The experimental results demonstrate that the proposed method exhibits the efficient performance on attribute estimation.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121095429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
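One way to realize a multi-CNN design is to give each attribute its own small CNN branch. The PyTorch sketch below does exactly that; the attribute set, branch depth, and binary heads are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class HandAttributeNet(nn.Module):
    """One small CNN branch per attribute, with no weight sharing, as a
    stand-in for the paper's multi-CNN design (attribute set assumed)."""
    def __init__(self, attributes=("gender", "skin_color", "accessory")):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, 2),               # binary logits per attribute
            )
        self.branches = nn.ModuleDict({a: branch() for a in attributes})

    def forward(self, img):
        # img: (B, 3, H, W) hand image; returns one logit pair per attribute
        return {a: net(img) for a, net in self.branches.items()}

preds = HandAttributeNet()(torch.randn(2, 3, 128, 128))
```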
Spherical Position Dependent Rate-Distortion Optimization for 360-degree Video Coding
Yuyang Liu, Hongwei Guo, Ce Zhu, Yipeng Liu
{"title":"Spherical Position Dependent Rate-Distortion Optimization for 360-degree Video Coding","authors":"Yuyang Liu, Hongwei Guo, Ce Zhu, Yipeng Liu","doi":"10.1109/APSIPAASC47483.2019.9023222","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023222","url":null,"abstract":"360-degree video in spherical format cannot be well handled by the conventional video coding tools. Currently, most of the 360-degree video coding methods first project the spherical video content onto a 2-dimensional plane and then compress the projected video using a conventional video codec. However, the projection conversion process will cause an irreversible conversion error, which indicates that the reconstruction quality of the projected video cannot fully represent that of the spherical video. In view of this, this paper proposes a spherical position dependent rate-distortion optimization (RDO) approach for 360-degree video coding. During the RDO process, spherical reconstruction quality is taken into consideration and calculated according to the spherical position of the pixels in each coding unit (CU). Furthermore, the Lagrangian multiplier and quantization parameter are adjusted accordingly. The proposed method is implemented on HEVC reference software HM-16.7. Experimental results show that the proposed method can achieve better coding performance, compared with HM-16.7.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116640809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
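For an equirectangular projection, the spherical weight of a pixel depends only on its row, following the cosine weighting familiar from WS-PSNR. The sketch below computes such weights and scales the Lagrangian multiplier by a CU's mean weight; this inverse scaling is one plausible realization of position-dependent RDO, and the paper's exact adjustment rule for lambda and QP may differ.

```python
import numpy as np

def erp_weights(height, width):
    """Spherical area weights for an equirectangular frame (WS-PSNR style):
    rows near the poles cover less solid angle than rows at the equator."""
    j = np.arange(height)
    w_row = np.cos((j + 0.5 - height / 2.0) * np.pi / height)
    return np.tile(w_row[:, None], (1, width))

def cu_lambda(base_lambda, weights, y0, x0, cu_size):
    """Scale the Lagrangian multiplier by the CU's mean spherical weight,
    so less-weighted (polar) regions receive fewer bits."""
    w = weights[y0:y0 + cu_size, x0:x0 + cu_size].mean()
    return base_lambda / max(w, 1e-6)

w = erp_weights(1920, 3840)
lam = cu_lambda(base_lambda=50.0, weights=w, y0=0, x0=0, cu_size=64)
```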
Comparing Native Chinese Listeners' Speech Reception Thresholds for Mandarin and English Consonants
Jian Gong, Ya‐ju Yu, William Bellamy, Feng Wang, Xiaoli Ji, Zhenzhen Yang
{"title":"Comparing Native Chinese Listeners' Speech Reception Thresholds for Mandarin and English Consonants","authors":"Jian Gong, Ya‐ju Yu, William Bellamy, Feng Wang, Xiaoli Ji, Zhenzhen Yang","doi":"10.1109/APSIPAASC47483.2019.9023246","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023246","url":null,"abstract":"The presence of noise can greatly affect listeners' speech perception. Previous studies have demonstrated that nonnative listeners' speech perception performance is reduced more than natives' in noise conditions. Most previous studies have focused on the effects of different noise types on non-native speech perception, and using a fixed signal to noise ratio level in different perception tasks. However, the masking effect of noise may be different for individual speech sounds, therefore leaving an incomplete picture of non-native speech perception in noise conditions. The current study applies an adaptive procedure to dynamically adjust the signal to noise ratio to measure listeners' Speech Reception Threshold (SRT) in noise conditions. More specifically, a group of native Chinese listeners' SRTs for Mandarin and English consonants in Speech Shaped Noise were measured and compared. The results showed that Chinese listeners' mean SRT for Mandarin consonants was 3.6dB lower than that for English consonants, indicating a general native language advantage. However, detailed analysis has revealed the mean SRT for the 5 most noise-tolerable consonants in Mandarin was 2.6dB higher than that in English. This result suggests that non-native speech perception in noise conditions may not always be more difficult than native ones. The acoustic features of different sounds could affect their intelligibility in noise conditions.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116651613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
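An adaptive SRT measurement typically runs a staircase that lowers the SNR after a correct response and raises it after an error, estimating the threshold from the reversal points. The sketch below implements a simple 1-up/1-down staircase converging on roughly 50% intelligibility; the paper's exact step rule and stopping criterion may differ.

```python
import math
import random

def measure_srt(respond, start_snr=10.0, step=2.0, n_reversals=8):
    """1-up/1-down adaptive staircase: respond(snr) returns True if the
    listener identified the consonant. The SRT estimate is the mean SNR
    over the reversal points (a common, simplified convention)."""
    snr, going_down, reversals = start_snr, None, []
    while len(reversals) < n_reversals:
        correct = respond(snr)
        if going_down is not None and correct != going_down:
            reversals.append(snr)      # direction changed: record a reversal
        going_down = correct
        snr += -step if correct else step  # harder when right, easier when wrong
    return sum(reversals) / len(reversals)

# Simulated listener: a toy logistic psychometric function centered at -4 dB.
srt = measure_srt(lambda snr: random.random() < 1 / (1 + math.exp(-(snr + 4))))
print(round(srt, 1))  # estimated SRT in dB SNR
```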
Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection
Liangqi Liu, Zhiyong Wu, Runnan Li, Jia Jia, H. Meng
{"title":"Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection","authors":"Liangqi Liu, Zhiyong Wu, Runnan Li, Jia Jia, H. Meng","doi":"10.1109/APSIPAASC47483.2019.9023243","DOIUrl":"https://doi.org/10.1109/APSIPAASC47483.2019.9023243","url":null,"abstract":"In speech interaction scenarios, speech emphasis plays an important role in conveying the underlying intention of the speaker. For better understanding of user intention and further enhancing user experience, techniques are employed to automatically detect emphasis from the user's input speech in human-computer interaction systems. However, even for state-of-the-art approaches, challenges still exist: 1) the various vocal characteristics and expressions of spoken language; 2) the long-range temporal dependencies in the speech utterance. Inspired by human perception mechanism, in this paper, we propose a novel attention-based emphasis detection architecture to address the above challenges. In the proposed approach, convolution bank is utilized to extract informative patterns of different dependency scope and learn various expressions of emphasis, and multi-head self-attention mechanism is utilized to detect local prominence in speech with the consideration of global contextual dependencies. Experimental results have shown the superior performance of the proposed approach, with 2.62% to 3.54% improvement on F1-measure compared with state-of-the-art approaches.","PeriodicalId":145222,"journal":{"name":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123868897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
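The two components compose naturally: a bank of 1-D convolutions with kernel sizes 1 through K captures patterns at several dependency scopes, and multi-head self-attention then relates every frame to the global context. The PyTorch sketch below wires them together for frame-level emphasis scoring; the dimensions, head count, and read-out head are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvBankSelfAttention(nn.Module):
    """Convolution bank (kernel sizes 1..K) followed by multi-head
    self-attention; sizes are illustrative assumptions."""
    def __init__(self, in_dim=80, bank_ch=32, K=8, heads=4):
        super().__init__()
        self.bank = nn.ModuleList([
            nn.Conv1d(in_dim, bank_ch, k, padding=k // 2)
            for k in range(1, K + 1)
        ])
        d_model = bank_ch * K
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)       # per-frame emphasis logit

    def forward(self, x):
        # x: (B, T, in_dim) acoustic features
        c = x.transpose(1, 2)                   # (B, in_dim, T)
        # Each kernel size captures a different local dependency scope;
        # trim to T since even kernels yield one extra frame.
        feats = [conv(c)[:, :, :x.size(1)] for conv in self.bank]
        f = torch.cat(feats, dim=1).transpose(1, 2)  # (B, T, bank_ch*K)
        # Self-attention relates every frame to the global context.
        h, _ = self.attn(f, f, f)
        return self.head(h).squeeze(-1)         # (B, T) emphasis scores

scores = ConvBankSelfAttention()(torch.randn(2, 120, 80))
```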