2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

A Lightweight and Robust Face Recognition Network on Noisy Condition
Lulu Guo, H. Bai, Yao Zhao
Abstract: Recently, deep learning has achieved a significant breakthrough in face recognition research. State-of-the-art convolutional neural network (CNN) models continue to improve recognition accuracy. However, large CNN models are difficult to deploy on mobile phones or embedded devices with limited computation resources and memory. At the same time, these face recognition networks perform poorly in complex environments with noise, shadow, varying illumination and so on. To address these problems, we propose a lightweight and robust face recognition network (LD-MobileFaceNet) that improves on the traditional MobileFaceNet in noisy environments. In this paper, an efficient and flexible denoising block is proposed as an independent module that can be applied in MobileFaceNet. The proposed denoising block uses the non-local means algorithm to denoise features extracted by convolutional layers. With the residual connection and the 1 × 1 convolution, it retains more information and can be combined with any layer in MobileFaceNet. Furthermore, we use fewer bottleneck layers and replace PReLU with the swish nonlinearity to compensate for the loss of accuracy. The experimental results demonstrate that LD-MobileFaceNet with swish is 21.35% more accurate on the noisy LFW dataset while reducing parameters by 25% compared to MobileFaceNet.
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023149
Published: 2019-11-01
Citations: 1
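The LD-MobileFaceNet abstract above names three ingredients: a non-local means style denoising step on features, a residual connection around it, and the swish nonlinearity in place of PReLU. The following is only a minimal sketch of those ideas in plain Python, not the authors' implementation. The function names, the dot-product similarity, and the scalar `proj_weight` standing in for the 1 × 1 convolution are all assumptions made for illustration.

```python
import math


def swish(x: float) -> float:
    """Swish nonlinearity: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def prelu(x: float, slope: float = 0.25) -> float:
    """PReLU with a fixed (normally learned) negative slope."""
    return x if x > 0 else slope * x


def nonlocal_denoise(features, proj_weight=1.0):
    """Non-local weighted averaging with a residual connection.

    features: list of positions, each a list of channel values.
    proj_weight: hypothetical scalar stand-in for the 1 x 1 convolution.
    """
    out = []
    for xi in features:
        # Similarity of this position to every position (dot product).
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted mean of all positions: the "denoised" feature.
        mean = [
            sum(w * xj[c] for w, xj in zip(weights, features))
            for c in range(len(xi))
        ]
        # Residual connection keeps the original information.
        out.append([x + proj_weight * m for x, m in zip(xi, mean)])
    return out
```

Setting `proj_weight=0.0` reduces the block to the identity, which is one reason a residual design can be dropped into an existing network without disturbing it at initialization.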
SCRA: A Hybrid Deterministic Routing Algorithm for Aging-Resilient Network-on-Chip
Bowen Zhang, Huaxi Gu, Ruiqi Guo
Abstract: Network-on-Chip (NoC) has been proposed as a promising interconnection solution for its high network bandwidth, low communication energy consumption and good parallel transmission capability. However, future many-core processors will face aging problems such as negative bias temperature instability (NBTI), hot-carrier injection (HCI) and electromigration (EM). These aging problems cause switching delay and critical path degradation under imbalanced loads, which leads to poor system reliability. In this paper, a deterministic aging-resilient hybrid routing algorithm called SCRA (source-based configuration router algorithm) is proposed to evenly distribute packet flow over the entire network and relieve the aging problems in NoC. In SCRA, a flow distribution model is used to achieve the best uniformity of network communications by combining the complementary characteristics of the XY and YX routing algorithms. Simulation and analysis results show that SCRA achieves better uniformity and increased longevity while ensuring accessibility, and achieves acceptable network communication performance compared with single dimension-order routing algorithms.
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023140
Published: 2019-11-01
Citations: 0
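The SCRA abstract above builds on XY and YX dimension-order routing in a 2D mesh. As a rough illustration of why the two are complementary, the sketch below computes both paths for one source and destination pair. It does not reproduce SCRA's flow distribution model; the `hybrid_route` helper and its `use_xy` flag are hypothetical placeholders for whatever per-source configuration the paper defines.

```python
def _walk(path, pos, target, axis):
    """Step along one axis until the coordinate matches the target."""
    pos = list(pos)
    while pos[axis] != target:
        pos[axis] += 1 if target > pos[axis] else -1
        path.append(tuple(pos))
    return tuple(pos)


def xy_route(src, dst):
    """Dimension-order route: travel along X first, then along Y."""
    path = [tuple(src)]
    pos = _walk(path, src, dst[0], axis=0)
    _walk(path, pos, dst[1], axis=1)
    return path


def yx_route(src, dst):
    """Dimension-order route: travel along Y first, then along X."""
    path = [tuple(src)]
    pos = _walk(path, src, dst[1], axis=1)
    _walk(path, pos, dst[0], axis=0)
    return path


def hybrid_route(src, dst, use_xy):
    """Hypothetical per-source choice between the two deterministic routes."""
    return xy_route(src, dst) if use_xy else yx_route(src, dst)
```

For the same pair of nodes, the two routes share only their endpoints whenever both coordinates differ, so assigning some sources to XY and others to YX spreads traffic across different links while each individual route stays deterministic.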
Consideration of a Selecting Frame of Finger-Spelled Words from Backhand View
P. Chophuk, Kanjana Pattanaworapan, K. Chamnongthai
Abstract: When recognizing finger alphabets from backhand-view sign video, there are many redundant video frames between consecutive letters and among the frames of a single letter. These redundant frames degrade finger alphabet recognition and should be removed. This paper proposes a method to select the significant video frames for each letter of finger-spelled words so as to obtain more information from the backhand view. In this method, a finger-spelled word video is divided into frames, each frame is converted to a binary image by automatic thresholding, and each binary image is converted to a contour frame. Then, the located centroid is used as the center of the contour image frame to calculate the distances to all boundaries of the image frame. After that, the distances of each frame are represented as a signature signal that identifies the frame, and these values are used with the frame selection equation to select a significant frame. Finally, the 1D signature signal is extracted from the selected frames as their feature. To evaluate the proposed method, 6 samples of finger-spelled words in American Sign Language (ASL) are used to select significant frames, and Hidden Markov Models (HMMs) are used to classify the words. The accuracy of the proposed method is approximately 97.5%.
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023155
Published: 2019-11-01
Citations: 1
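The finger-spelling abstract above describes a 1D signature built from distances between a centroid and the contour boundary. A minimal sketch of that step is given below. It assumes the contour is already available as a list of (x, y) boundary points and that the centroid is the mean of those points; the paper's thresholding, contour extraction, and frame selection equation are not reproduced.

```python
import math


def centroid_signature(contour):
    """Return the centroid of the contour points and the 1D distance signature.

    contour: list of (x, y) boundary points, assumed to be in traversal order.
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    # One distance per boundary point, in contour order.
    signature = [math.hypot(x - cx, y - cy) for x, y in contour]
    return (cx, cy), signature
```

Because the signature is measured from the centroid, it does not change when the hand shifts position within the frame, which makes signatures from different frames directly comparable.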
Dynamic-attention based Encoder-decoder model for Speaker Extraction with Anchor speech
Hao Li, Xueliang Zhang, Guanglai Gao
Abstract: Speech plays an important role in human-computer interaction. In many real applications, an annoying problem is that speech is often degraded by interfering noise. Extracting target speech from background interference is a meaningful and challenging task, especially when the interference is also a human voice. This work addresses the problem of extracting a target speaker from an interfering speaker with a short piece of anchor speech, which is used to obtain the target speaker identity. We propose an encoder-decoder neural network architecture. Specifically, the encoder transforms the anchor speech into an embedding that represents the identity of the target speaker. The decoder utilizes the speaker identity to extract the target speech from the mixture. To make the speaker identity acoustic-related, a dynamic-attention mechanism is utilized to build a time-varying embedding for each frame of the mixture. Systematic evaluation indicates that our approach improves the quality of speaker extraction.
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023204
Published: 2019-11-01
Citations: 1
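The speaker-extraction abstract above mentions a time-varying embedding built per mixture frame by attention over the anchor speech. The sketch below shows one generic way this could look, using plain dot-product attention. The function names and the choice of dot-product scoring are assumptions for illustration only; the paper's actual encoder, scoring function, and decoder are not described in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_embedding(mixture_frames, anchor_frames):
    """Build one speaker embedding per mixture frame.

    Each mixture frame acts as a query; the anchor frames act as both
    keys and values, so the embedding changes from frame to frame.
    """
    dim = len(anchor_frames[0])
    embeddings = []
    for query in mixture_frames:
        scores = [sum(q * k for q, k in zip(query, key)) for key in anchor_frames]
        weights = softmax(scores)
        embeddings.append([
            sum(w * key[c] for w, key in zip(weights, anchor_frames))
            for c in range(dim)
        ])
    return embeddings
```

A fixed embedding would average the anchor frames once; here each mixture frame weights the anchor frames that are acoustically closest to it, which is the sense in which the identity becomes acoustic-related.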
Effect of Relative Frequency of Lexical Meanings on Accessing Lexical Ambiguities: Evidence from the Coordinator ‘and’
Xiaoqun Dong, Xueqin Zhao
Abstract: Lexical ambiguity is a common phenomenon in English. Research on the resolution of lexical ambiguity began in the 1970s and has produced several theories on how comprehenders settle on a single meaning [12], [21], [28], [30]. Many studies have investigated the effects of relative meaning frequency and other factors on lexical ambiguity resolution [27], [29], [36], but their research subjects are mainly content words. Whether relative meaning frequency affects access to coordinators remains unclear. The present study takes the coordinator ‘and’ as the research subject to explore the effect of relative meaning frequency on lexical access via a lexical decision task, and further investigates whether related meanings of ‘and’ lead to confusion in lexical access. In the experiment, 21 participants who are advanced Chinese EFL learners were asked to choose one of two meanings for an ‘and’ connecting two clauses in a complex sentence, and accuracy and reaction time (RT) were collected. It was found that relative meaning frequency did influence access to the meanings of the coordinator ‘and’: the higher the relative meaning frequency, the shorter the response time. The relatedness between meanings also led to confusion in lexical access. These results confirm the effect of relative meaning frequency on accessing the meanings of coordinators and reveal the importance of distinguishing related meanings.
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023064
Published: 2019-11-01
Citations: 0
Experimental investigation on the efficacy of Affine-DTW in the quality of voice conversion
Gaku Kotani, Hitoshi Suda, D. Saito, N. Minematsu
Abstract: In this paper, the performance of Affine-DTW, which performs appropriate time alignment between source and target features in voice conversion (VC), is experimentally and thoroughly investigated. In traditional VC, parallel data are often required to train a mapping model between source and target features. While VC with non-parallel data has also been studied to avoid collecting parallel data, the quality of its converted speech is still inferior to that of traditional VC with parallel data. One approach to further progress in VC is to exploit both parallel and non-parallel data, the former of which is pre-stored and the latter of which is assumed to be easily collected. In this case, it is still worthwhile to study time-alignment techniques to obtain appropriate alignment of parallel data. Affine-DTW is a technique in which dynamic time warping (DTW) and coarse conversion based on affine transformation are performed iteratively. In Affine-DTW, the time alignment and the parameters of the affine transformation can be calculated analytically, so it can be easily adopted as pre-processing in VC. However, the influence of the obtained alignments on the performance of trained models has not been well investigated experimentally. Hence, this paper investigates the performance of Affine-DTW in terms of quality improvement of converted speech in traditional VC methods based on Gaussian mixture models, non-negative matrix factorization and neural networks. Experimental results show that Affine-DTW obtains appropriate alignments and that models trained on these alignments improve the naturalness of converted speech in subjective assessments.
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023107
Published: 2019-11-01
Citations: 1
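The Affine-DTW abstract above relies on dynamic time warping as its alignment step. For readers unfamiliar with it, here is a minimal sketch of classic DTW on two 1D sequences with absolute difference as the local cost. It covers only the plain DTW half; the iterative affine transformation that Affine-DTW alternates with is not shown, and real voice conversion would align multidimensional spectral features rather than scalars.

```python
def dtw(a, b):
    """Classic dynamic time warping between two 1D sequences.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    aligning a[i] with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance a only
                cost[i][j - 1],      # advance b only
            )
    # Backtrack from the end to recover the alignment path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return cost[n][m], path
```

The returned path is what a VC pipeline would use to pair source and target frames before training a mapping model.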
Allpass Modeling of Phase Spectrum of Speech Signals for Formant Tracking
K. Vijayan, K. Murty, Haizhou Li
Abstract: Formant tracking is a very important task in speech applications. Most current formant tracking methods rely on peak picking from the linear prediction (LP) spectrum of speech, which suffers from merged or spurious peaks in LP spectra, resulting in unreliable formant candidates. In this paper, we present the significance of the phase spectrum of speech in refining the formant candidates from LP analysis. The short-time phase spectrum of speech is modeled as the phase response of an allpass (AP) system, where the coefficients of the AP system are initialized with LP coefficients and estimated with an iterative procedure. This technique refines the initial formants from LP analysis using the phase spectrum of speech through an AP analysis, thereby fusing information from the magnitude and phase spectra. The group delay of the resultant AP system exhibits unambiguous peaks at formants and delivers reliable formant candidates. The formant trajectories obtained by selecting formants from these candidates are reported to be more accurate than those obtained from LP analysis. The fused information from magnitude and phase spectra yields relative improvements of 25%, 15% and 18% in tracking accuracy of the first, second and third formants, respectively, over those from the magnitude spectrum alone.
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023271
Published: 2019-11-01
Citations: 2
Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters
Yuxuan Xi, Pengcheng Li, Yan Song, Yiheng Jiang, Lirong Dai
Abstract: Despite considerable recent progress in deep learning methods for speech emotion recognition (SER), performance is severely restricted by the lack of large-scale labeled speech emotion corpora. For instance, it is difficult to employ complex neural network architectures such as ResNet, which, accompanied by large-scale corpora like VoxCeleb and NIST SRE, have proven to perform well for the related speaker verification (SV) task. In this paper, a novel domain adaptation method is proposed for the SER task, which aims to transfer related information from a speaker corpus to an emotion corpus. Specifically, a residual adapter architecture is designed for the SER task, where ResNet acts as a universal model for general information extraction. An adapter module then trains a limited number of additional parameters to focus on modeling the deviation for the specific SER task. To evaluate the effectiveness of the proposed method, we conduct extensive evaluations on the benchmark IEMOCAP and CHEAVD 2.0 corpora. Results show significant improvement, with overall results in each task outperforming or matching state-of-the-art methods.
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023339
Published: 2019-11-01
Citations: 13
Modeling Content Interaction in Information Diffusion with Pre-trained Sentence Embedding
Qinyuan Ye, Yuejiang Li, Yan Chen, H. V. Zhao
Abstract: Social networks have become indispensable parts of our daily life, and therefore understanding the process of information diffusion over social networks is a meaningful research topic. Usually, multiple pieces of information do not spread in isolation; rather, they interact with each other throughout the diffusion process. This paper aims to quantify these interactions by modeling users' forwarding behavior after reading a series of pieces of information. Inspired by several successful components prevalent in recent deep learning research, i.e., the long short-term memory (LSTM) network and bidirectional encoder representations from transformers (BERT), we designed the IMM Enhanced model and the InfoLSTM model. In our experiments on a real-world Weibo dataset, both models significantly outperform baselines such as the prior IMM model and IP model, with the IMM Enhanced model improving by 23.52% and the InfoLSTM model improving by 32.56% in F1 score (absolute value) compared to the baseline IMM model. In addition, we visualize the dataset and the parameters learned in the IMM Enhanced model, which further enables us to discuss the relationship between text similarity and information diffusion interaction with case studies.
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023215
Published: 2019-11-01
Citations: 0
Efficient Decentralized Tracing Protocol for Fingerprinting System with Index Table
M. Kuribayashi, N. Funabiki
Abstract: To reduce the burden on a trusted center, a decentralized fingerprinting system has been proposed that delegates authority to an authorized server so that the center does not participate in the tracing protocol. As a fingerprinting code is used to retain collusion resistance, the correlation score for each user must be calculated to identify illegal users from a pirated copy. Considering the secrecy of the code parameters, the computation must be executed by a seller in an encrypted domain to realize the decentralized tracing protocol. This requires high computational cost as well as high communication cost between the center and a seller, because an encrypted database (DB) is necessary for the computation. In this paper, we propose a method to reduce such costs by using the ElGamal cryptosystem over an elliptic curve instead of the Paillier cryptosystem used in the conventional scheme. Our experimental results indicate that the time consumption becomes almost 100 times shorter and the size of the encrypted DB is reduced by a factor of 7/32 at the 112-bit security level. The encrypted DB is further compressed by introducing an index table.
DOI: https://doi.org/10.1109/APSIPAASC47483.2019.9023302
Published: 2019-11-01
Citations: 1
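The fingerprinting abstract above depends on computing correlation scores in an encrypted domain, which requires an additively homomorphic cryptosystem. The toy sketch below illustrates that property with "exponential" ElGamal in a small multiplicative group modulo a prime. This is a deliberate substitution: the paper uses ElGamal over an elliptic curve, and the parameters here (`P = 23`, `G = 5`, the secret key, and the fixed randomness values) are illustrative assumptions with no security whatsoever.

```python
P = 23        # toy prime modulus (assumption; insecure by design)
G = 5         # generator of the multiplicative group mod 23
SECRET = 6    # toy private key
PUBLIC = pow(G, SECRET, P)


def encrypt(message, r):
    """Encrypt a small integer; r is the per-ciphertext randomness."""
    return pow(G, r, P), (pow(G, message, P) * pow(PUBLIC, r, P)) % P


def add(ct1, ct2):
    """Multiplying ciphertexts componentwise adds the plaintexts."""
    return (ct1[0] * ct2[0]) % P, (ct1[1] * ct2[1]) % P


def decrypt(ct):
    """Recover the plaintext by removing the mask and brute-forcing the log."""
    c1, c2 = ct
    # Inverse of c1**SECRET via Fermat's little theorem.
    unmasked = (c2 * pow(c1, P - 1 - SECRET, P)) % P
    for m in range(P - 1):
        if pow(G, m, P) == unmasked:
            return m
    raise ValueError("plaintext out of range")
```

A party holding only ciphertexts can therefore accumulate a sum, such as a correlation score, without learning the individual terms; only the key holder can decrypt the total. Decryption needs a small discrete logarithm, so this style of scheme suits scores with a bounded range.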