2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) — Latest Publications

Dynamic Attention Loss for Small-Sample Image Classification
Jie Cao, Yinping Qiu, Dongliang Chang, Xiaoxu Li, Zhanyu Ma
DOI: 10.1109/APSIPAASC47483.2019.9023268 | Published: November 2019
Abstract: Convolutional Neural Networks (CNNs) have been successfully used in various image classification tasks and have gradually become one of the most powerful machine learning approaches. To improve model generalization and performance on small-sample image classification, a new trend is to learn discriminative features via CNNs. The idea of this paper is to decrease the confusion between categories in order to extract discriminative features and enlarge inter-class variance, especially for classes with indistinguishable features. We propose a loss function termed Dynamic Attention Loss (DAL), which introduces a confusion-rate-weighted soft label (target) as the controller of similarity measurement between categories, dynamically paying attention to samples, especially those classified wrongly during training. Experimental results demonstrate that, compared with Cross-Entropy Loss and Focal Loss, the proposed DAL achieves better performance on the LabelMe and Caltech101 datasets.
Citations: 2
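The abstract above does not give the exact DAL formula, so the following is only a minimal sketch of a confusion-rate-weighted soft-label cross-entropy in PyTorch; the `confusion_rate` matrix, the mixing weight `alpha`, and their update schedule are assumptions, not the authors' definitions.

```python
import torch
import torch.nn.functional as F

def dynamic_attention_loss(logits, targets, confusion_rate, alpha=0.1):
    """Sketch of a confusion-rate-weighted soft-label cross-entropy.

    logits:         (N, C) raw class scores
    targets:        (N,)   integer class labels
    confusion_rate: (C, C) row-normalized confusion rates from recent epochs
    alpha:          label mass moved from the hard label onto confusable classes
    """
    num_classes = logits.size(1)
    hard = F.one_hot(targets, num_classes).float()
    # Soften each label toward the classes its true class is often confused with.
    soft = (1.0 - alpha) * hard + alpha * confusion_rate[targets]
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft * log_probs).sum(dim=1).mean()
```

In use, `confusion_rate` would be re-estimated from the validation confusion matrix each epoch, so classes that are frequently misclassified receive more attention as training proceeds.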
Median based Multi-label Prediction by Inflating Emotions with Dyads for Visual Sentiment Analysis
Tetsuya Asakawa, Masaki Aono
DOI: 10.1109/APSIPAASC47483.2019.9023303 | Published: November 2019
Abstract: Visual sentiment analysis investigates sentiment estimation from images and has been an interesting and challenging research problem. Most studies have focused on estimating a few specific sentiments and their intensities; multi-label sentiment estimation from images has not been sufficiently investigated. The purpose of this research is to accurately estimate sentiments as a multi-label, multi-class problem from images that evoke multiple different emotions simultaneously. We first inflate the six emotions defined by the Emotion6 dataset into 13 emotions (which we call 'Transf13') by means of emotional dyads. We then perform multi-label sentiment analysis on the emotion-inflated dataset with a combined deep neural network model whose inputs come from both hand-crafted features (e.g., BoVW (Bag of Visual Words) features) and CNN features. We also introduce a median-based multi-label prediction algorithm, in which we assume that each emotion has a probability distribution: after training the network, we predict that an emotion is evoked by a given unknown image if the intensity of the emotion is larger than the median of the corresponding emotion. Experimental results demonstrate that our model outperforms existing state-of-the-art algorithms in terms of subset accuracy.
Citations: 2
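A minimal sketch of the median-thresholding rule described in the entry above, assuming the per-emotion medians are taken over the model's predicted intensities on a reference (training) set; array names and the reference set are assumptions.

```python
import numpy as np

def median_threshold_predict(train_intensities, test_intensities):
    """Sketch of median-based multi-label prediction.

    train_intensities: (N_train, E) predicted per-emotion intensities
    test_intensities:  (N_test,  E) intensities for unseen images
    Returns a boolean (N_test, E) matrix: emotion e is predicted for an image
    if its intensity exceeds the median intensity of emotion e.
    """
    medians = np.median(train_intensities, axis=0)  # one threshold per emotion
    return test_intensities > medians
```

With the 13 inflated emotions ("Transf13"), each output row would mark which of the 13 emotions are predicted as evoked by the corresponding image.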
Dynamic Adjustment of Railway Emergency Plan Based on Utility Risk Entropy
Qian Ren, Zhenhai Zhang
DOI: 10.1109/APSIPAASC47483.2019.9023044 | Published: November 2019
Abstract: Raising the speed of high-speed railway trains provides great convenience for travel, but once an emergency occurs, the consequences can be severe. Because of the uncertainties in how a railway emergency develops, emergency decision-making often needs to be adjusted according to changes in the state of the incident, that is, adjusted dynamically. To support such dynamic adjustment of railway emergency plans, the emergency decision-making process is divided into several stages according to key nodes. At each stage, the perceived utility value of each combination of response scheme and scenario is obtained, and the utility risk function is derived by combining the utility value with its occurrence probability. Considering the utility risk of the same scheme under different scenarios, the utility risk entropy of each emergency response scheme is computed, and the best scheme at the current moment is selected. Finally, an example is given to verify the effectiveness of the proposed method.
Citations: 0
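The abstract does not spell out the utility risk or entropy formulas, so the sketch below is only one plausible reading: risk as probability-weighted disutility per scenario, and entropy over each scheme's normalized risk distribution. The risk measure, normalization, and selection rule are all assumptions.

```python
import numpy as np

def utility_risk_entropy(utility, prob):
    """Sketch of scheme comparison by utility risk entropy (assumed formulation).

    utility: (S, K) perceived utility of scheme s under scenario k
    prob:    (K,)   occurrence probability of each scenario (sums to 1)
    Returns one entropy value per scheme, measuring how evenly its
    probability-weighted risk is spread over the scenarios.
    """
    risk = prob * (utility.max() - utility)              # assumed risk measure
    p = risk / (risk.sum(axis=1, keepdims=True) + 1e-12)  # normalize per scheme
    return -(p * np.log(p + 1e-12)).sum(axis=1)           # risk entropy per scheme
```

The paper's actual selection rule at each decision stage may combine this entropy with the total risk in a different way.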
Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks
Jingdong Li, Hui Zhang, Xueliang Zhang, Changliang Li
DOI: 10.1109/APSIPAASC47483.2019.9023013 | Published: November 2019
Abstract: In recent decades, neural-network-based methods have significantly improved the performance of speech enhancement. Most of them estimate a time-frequency (T-F) representation of the target speech directly or indirectly, and then resynthesize the waveform from the estimated T-F representation. In this work, we propose the temporal convolutional recurrent network (TCRN), an end-to-end model that directly maps a noisy waveform to a clean waveform. The TCRN, which combines convolutional and recurrent neural networks, is able to efficiently and effectively leverage both short-term and long-term information. The architecture iteratively downsamples and upsamples the speech during forward propagation. We show that our model improves performance compared with existing convolutional recurrent networks, and we present several key techniques to stabilize the training process. The experimental results show that our model consistently outperforms existing speech enhancement approaches in terms of speech intelligibility and quality.
Citations: 7
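A minimal sketch of the waveform-in, waveform-out structure described above: strided convolutions downsample, a recurrent layer models long-term context, and transposed convolutions upsample. Layer counts, kernel sizes, and channel widths are illustrative, not the authors' configuration.

```python
import torch
import torch.nn as nn

class TCRNSketch(nn.Module):
    """Illustrative temporal convolutional recurrent network for enhancement."""

    def __init__(self, channels=64):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=8, stride=4, padding=2), nn.ReLU(),
        )
        self.rnn = nn.LSTM(channels, channels, batch_first=True)
        self.up = nn.Sequential(
            nn.ConvTranspose1d(channels, channels, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=8, stride=4, padding=2),
        )

    def forward(self, noisy):                  # noisy: (batch, 1, samples)
        h = self.down(noisy)                   # downsampled feature sequence
        h, _ = self.rnn(h.transpose(1, 2))     # recurrent modeling over time
        return self.up(h.transpose(1, 2))      # upsample back to a waveform
```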
Dilated-Gated Convolutional Neural Network with A New Loss Function on Sound Event Detection
Ke-Xin He, Weiqiang Zhang, Jia Liu, Yao Liu
DOI: 10.1109/APSIPAASC47483.2019.9023308 | Published: November 2019
Abstract: In this paper, we propose a new method for rare sound event detection. Compared with a conventional Convolutional Recurrent Neural Network (CRNN), we devise a Dilated-Gated Convolutional Neural Network (DGCNN) to improve detection accuracy as well as computational efficiency. Furthermore, we propose a new loss function: since frame-level predictions are post-processed to obtain the final prediction, consecutive false-alarm frames lead to more insertion errors than a single false-alarm frame, so we add a discriminative penalty term to the loss function to reduce insertion errors. Our method is tested on the dataset of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge Task 2. Our model achieves an F-score of 91.3% and an error rate of 0.16 on the evaluation dataset, while the baseline achieves an F-score of 87.5% and an error rate of 0.23.
Citations: 1
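The exact penalty term is not given in the abstract; the sketch below shows one hedged way to penalize runs of false alarms on top of a frame-level binary cross-entropy. The product-of-adjacent-frames penalty and the weight `lam` are assumptions.

```python
import torch

def sed_loss_with_consecutive_fa_penalty(probs, labels, lam=0.5):
    """Sketch of frame-level BCE plus a penalty on consecutive false alarms.

    probs:  (batch, T) frame-level event probabilities
    labels: (batch, T) binary frame labels (float)
    """
    bce = torch.nn.functional.binary_cross_entropy(probs, labels)
    fa = probs * (1.0 - labels)            # false-alarm "mass" per frame
    # Penalize adjacent frames that are both false alarms, since runs of false
    # alarms create extra insertion errors after the post-processing step.
    consecutive = fa[:, 1:] * fa[:, :-1]
    return bce + lam * consecutive.mean()
```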
Automatic Ontology Population Using Deep Learning for Triple Extraction
Ming-Hsiang Su, Chung-Hsien Wu, Po-Chen Shih
DOI: 10.1109/APSIPAASC47483.2019.9023113 | Published: November 2019
Abstract: An ontology is a representation of knowledge in a form from which computers can derive meaning. The purpose of this work is to automatically populate an ontology using deep neural networks, updating it with new facts from an input knowledge resource. For automatic ontology population, a bi-LSTM-based term extraction model built on character embeddings is proposed to extract terms from a sentence; the extracted terms are regarded as the concepts of the ontology. A multi-layer perceptron network then decides the predicates between pairs of extracted concepts. The two concepts (one serving as subject and the other as object), along with the predicate, form a triple. The number of occurrences of the dependency relations between the concepts and the predicates is estimated, and predicates with low occurrence frequency are filtered out to obtain precise triples for ontology population. For evaluation of the proposed method, we collected 46,646 sentences from OntoNotes 5.0 for training and testing the bi-LSTM-based term extraction model, and 404,951 triples from ConceptNet 5 for training and testing the multi-layer-perceptron-based triple extraction model. From the experimental results, the proposed method could extract triples from documents with 74.59% accuracy for ontology population.
Citations: 5
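A minimal sketch of the final filtering step described above: keep only candidate triples that occur frequently enough. The threshold `min_count` and the exact counting unit are assumptions.

```python
from collections import Counter

def filter_triples(candidate_triples, min_count=3):
    """Keep (subject, predicate, object) triples that occur at least
    `min_count` times among the extracted candidates."""
    counts = Counter(candidate_triples)
    return [triple for triple, c in counts.items() if c >= min_count]

# Upstream (not shown): a character-embedding bi-LSTM tags term spans in each
# sentence, and a multi-layer perceptron predicts the predicate for every pair
# of extracted terms; the surviving triples populate the ontology.
```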
Robust Camera Model Identification Based on Richer Convolutional Feature Network
Zeyu Zou, Yunxia Liu, Wen-Na Zhang, Yuehui Chen, Yun-Li Zang, Yang Yang, Bonnie Ngai-Fong Law
DOI: 10.1109/APSIPAASC47483.2019.9023334 | Published: November 2019
Abstract: Based on convolutional neural networks (CNNs), the problem of robust patch-level camera model identification is studied in this paper. First, an effective feature representation is proposed by concatenating the output of a multiscale residual prediction module with the original RGB images. Motivated by the exploration of multi-scale characteristics, the multiscale residual prediction module automatically learns residual images so that the subsequent CNN is not dominated by scene content, and color channel information is integrated for enhanced diversity of CNN inputs. Second, a modified richer convolutional feature network is presented for robust camera model identification by fully exploiting the learnt features. Finally, the effectiveness of the proposed method is verified by extensive experimental results at the patch level, which is more difficult than image-level evaluation.
Citations: 2
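A hedged sketch of the input construction described above: predict residual (high-pass) images at several scales and concatenate them with the RGB patch before the main CNN. Kernel sizes and the residual-as-prediction-error formulation are assumptions.

```python
import torch
import torch.nn as nn

class MultiscaleResidualInput(nn.Module):
    """Concatenate learned multiscale residual images with the RGB patch."""

    def __init__(self, scales=(3, 5, 7)):
        super().__init__()
        self.predictors = nn.ModuleList(
            [nn.Conv2d(3, 3, kernel_size=k, padding=k // 2) for k in scales]
        )

    def forward(self, rgb):                         # rgb: (batch, 3, H, W)
        # Residual = patch minus its learned prediction, suppressing scene content.
        residuals = [rgb - predict(rgb) for predict in self.predictors]
        return torch.cat([rgb] + residuals, dim=1)  # (batch, 3 + 3*len(scales), H, W)
```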
Prosodic Realization of Focus in English by Bidialectal Mandarin Speakers
Jiajing Zhang, Ying Chen, Jie Cui
DOI: 10.1109/APSIPAASC47483.2019.9023060 | Published: November 2019
Abstract: This study was designed to explore the prosodic patterns of focus in English by bidialectal Mandarin speakers. One learner group speaks Nanjing Mandarin as its first dialect (D1) and standard Mandarin as its second dialect (D2); the other learner group speaks Changchun Mandarin as D1 and standard Mandarin as D2. This paper compares their prosodic realization of focus in English in a production experiment. Results indicate that both Changchun and Nanjing bidialectal speakers produced clear in-focus expansion of duration, pitch and intensity, and post-focus compression (PFC) of pitch and intensity, yet were not able to acquire native-like patterns of PFC in English. Although the two groups' D1s are different dialects of Mandarin, they produced statistically similar patterns of prosodic focus in L2 English. These findings provide further support for the claim that PFC cannot be easily transferred cross-linguistically [11], [14], [15], [17], [18], despite its existence in both dialects of the learners' L1 and in their L2.
Citations: 1
Can We Simulate Generative Process of Acoustic Modeling Data? Towards Data Restoration for Acoustic Modeling
Ryo Masumura, Yusuke Ijima, Satoshi Kobashikawa, T. Oba, Y. Aono
DOI: 10.1109/APSIPAASC47483.2019.9023184 | Published: November 2019
Abstract: In this paper, we present an initial study on data restoration for acoustic modeling in automatic speech recognition (ASR). In the ASR field, speech log data collected during practical services include customers' personal information, so the log data must often be preserved in segregated storage areas. Our motivation is to permanently and flexibly utilize the log data for acoustic modeling even though the data cannot be moved from those segregated storage areas. Our key idea is to construct portable models that simulate the generative process of acoustic modeling data, so that the acoustic modeling data can be artificially restored. This paper therefore proposes novel generative models, called acoustic modeling data restorers (AMDRs), which can randomly sample triplets of a phonetic state sequence, an acoustic feature sequence, and utterance attribute information even when the original data is not directly accessible. To precisely model the generative process of the acoustic modeling data, we introduce neural language modeling to generate the phonetic state sequences and neural speech synthesis to generate the acoustic feature sequences. Experiments using Japanese speech data sets reveal how close the restored acoustic data is to the original data in terms of ASR performance.
Citations: 1
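A minimal sketch of the restoration idea as described: keep trained generative components instead of the raw log data and re-sample training triplets from them. All three callables are placeholders for the components named in the abstract, not real APIs.

```python
def restore_acoustic_data(attribute_sampler, neural_lm, neural_tts, n_utterances):
    """Re-sample (phonetic states, acoustic features, attributes) triplets
    from portable generative models instead of the original log data."""
    restored = []
    for _ in range(n_utterances):
        attributes = attribute_sampler()                 # e.g. speaker/style info
        phonetic_states = neural_lm.sample(attributes)   # neural language model
        acoustic_features = neural_tts.sample(phonetic_states, attributes)
        restored.append((phonetic_states, acoustic_features, attributes))
    return restored
```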
SINGAN: Singing Voice Conversion with Generative Adversarial Networks
Berrak Sisman, K. Vijayan, M. Dong, Haizhou Li
DOI: 10.1109/APSIPAASC47483.2019.9023162 | Published: November 2019
Abstract: Singing voice conversion (SVC) is the task of converting a source singer's voice to sound like that of a target singer without changing the lyrical content. So far, most voice conversion studies have focused on speech voice conversion, which differs from singing voice conversion. We note that singing conveys both lexical and emotional information through words and tones; it is one of the most expressive components in music and a means of entertainment as well as self-expression. In this paper, we propose a novel singing voice conversion framework based on Generative Adversarial Networks (GANs). The proposed GAN-based conversion framework, which we call SINGAN, consists of two neural networks: a discriminator to distinguish natural from converted singing voice, and a generator to deceive the discriminator. With the GAN, we minimize the differences between the distributions of the original target parameters and the generated singing parameters. To the best of our knowledge, this is the first framework that uses generative adversarial networks for singing voice conversion. In experiments, we show that the proposed method effectively converts singing voices and outperforms the baseline approach.
Citations: 28
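A hedged sketch of the generator/discriminator training objective implied above, operating on spectral parameter sequences. The loss weighting and the L1 reconstruction term (which assumes time-aligned parallel source/target data) are assumptions, not the authors' exact objective.

```python
import torch

def singan_losses(generator, discriminator, src_feats, tgt_feats):
    """Adversarial losses for converting source-singer features toward the target."""
    converted = generator(src_feats)
    bce = torch.nn.functional.binary_cross_entropy_with_logits

    # Discriminator: separate real target features from converted ones.
    d_real = discriminator(tgt_feats)
    d_fake = discriminator(converted.detach())
    d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))

    # Generator: fool the discriminator, plus an assumed reconstruction term.
    d_conv = discriminator(converted)
    g_loss = bce(d_conv, torch.ones_like(d_conv)) \
        + torch.nn.functional.l1_loss(converted, tgt_feats)
    return d_loss, g_loss
```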