2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers

Domain Adaptation and Language Conditioning to Improve Phonetic Posteriorgram Based Cross-Lingual Voice Conversion
Pin-Chieh Hsu, N. Minematsu, D. Saito
{"title":"Domain Adaptation and Language Conditioning to Improve Phonetic Posteriorgram Based Cross-Lingual Voice Conversion","authors":"Pin-Chieh Hsu, N. Minematsu, D. Saito","doi":"10.23919/APSIPAASC55919.2022.9979918","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979918","url":null,"abstract":"In this work, we examine two methods for improving phonetic posteriorgram (PPG) based cross-lingual voice conversion (CLVC). Previous research usually utilized a speaker encoder to characterize speaker identity; however, the speaker embedding learned by the previous model tends to be language-dependent, degrading the quality of converted speech. Therefore, we propose using the technique of domain-adversarial training. With this approach, the speaker embeddings in different languages can be adapted into the same distribution to form a language-independent speaker embedding space. The other approach we propose is to employ external language conditioning to help our model disentangle the language information from the speaker embedding. In our experiments, both methods are evaluated on a Japanese-English bilingual database. Besides subjective evaluation, two automatic objective assessment systems are adopted to assess the quality and speaker similarity of converted utterances.
According to the experimental results, the two proposed methods can generate speaker embedding with reduced language dependency and improve the naturalness and speaker similarity of converted speeches.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114278561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Physiological study on the effect of game events in response to player's laughter
Mikito Fukuda, Y. Arimoto
{"title":"Physiological study on the effect of game events in response to player's laughter","authors":"Mikito Fukuda, Y. Arimoto","doi":"10.23919/APSIPAASC55919.2022.9979868","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979868","url":null,"abstract":"To investigate whether a computer's automatic responses to our emotional expressions influence our cognitive and emotional involvement in a virtual world, this study measured players' physiological reactions to game events presented in response to their spontaneous laughter. Participants played virtual games under two conditions in our experiments, and their electrocardiogram, electrodermal activity, and facial electromyography (corrugator supercilii muscle and zygomaticus major muscle) were recorded during the games. The experiment consisted of two conditions, namely an advantageous event condition and a disadvantageous event condition. In the advantageous event condition, the system responded to the player's laughter with an event that benefitted the player. In the disadvantageous event condition, the system responded to the player's laughter with an event that annoyed the player. A three-way analysis of variance was performed on these physiological signals to test the hypothesis that there is time-series variation in physiological responses between event types and event durations. As a result, a significantly slower heart rate was observed after the presentation of an event in both the advantageous and disadvantageous event conditions. This result suggests that the players paid more attention to the game when any event was generated in response to their laughter. Moreover, both types of events triggered by the player's laughter increased electrodermal activity and corrugator supercilii muscle activity. In particular, disadvantageous events activated the corrugator supercilii muscle more than advantageous events did.
These results suggest that players were more emotionally engaged in the game when they encountered troublesome or fortunate situations while laughing.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128655295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Leveraging Pre-Trained Acoustic Feature Extractor For Affective Vocal Bursts Tasks
Bagus Tris Atmaja, A. Sasou
{"title":"Leveraging Pre-Trained Acoustic Feature Extractor For Affective Vocal Bursts Tasks","authors":"Bagus Tris Atmaja, A. Sasou","doi":"10.23919/APSIPAASC55919.2022.9980083","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980083","url":null,"abstract":"Understanding human emotions is a challenge for computers. Research on speech emotion recognition has been progressing steadily. Besides speech, affective information may also lie in short vocal bursts (e.g., a cry when sad). In this study, we evaluated a recent self-supervised learning model to extract acoustic embeddings for affective vocal bursts tasks. Four tasks are investigated, covering both regression and classification problems. Using similar architectures, we found that a pre-trained model is more effective than the baseline methods. The study is further extended to evaluate the effect of different numbers of seeds, patience values, and batch sizes on the performance of the four tasks.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"326 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129445227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Optimal Deep Multi-Route Self-Attention for Single Image Super-Resolution
Nisawan Ngambenjavichaikul, Sovann Chen, S. Aramvith
{"title":"Optimal Deep Multi-Route Self-Attention for Single Image Super-Resolution","authors":"Nisawan Ngambenjavichaikul, Sovann Chen, S. Aramvith","doi":"10.23919/APSIPAASC55919.2022.9979962","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9979962","url":null,"abstract":"Image restoration, such as single image super-resolution (SISR), is a long-established low-level vision problem that aims to reconstruct high-resolution (HR) images from their low-resolution (LR) input counterparts. While state-of-the-art image super-resolution models are based on the well-known convolutional neural network (CNN), many self-attention-based or transformer-based approaches have been explored and have shown promising performance on vision problems. A powerful baseline model based on the Swin Transformer adopts the shifted window approach. It enhances the capability by restricting the model to compute the self-attention function only on non-overlapping local windows while enabling cross-window relations. However, the architecture design is manually fixed. Therefore, the results do not achieve optimal performance. This paper presents an optimal deep multi-route self-attention network for single image super-resolution (ODMR-SASR). The genetic algorithm (GA) is introduced to discover the optimal number of filters and layers.
Experimental results demonstrate that the proposed optimization technique can produce a progressive SR image quality.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"46 24","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113974158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Clustering of advertising images using electroencephalogram
Ingon Chanpornpakdi, Motoi Noda, Toshihisa Tanaka, Yuval Harpaz, A. Geva
{"title":"Clustering of advertising images using electroencephalogram","authors":"Ingon Chanpornpakdi, Motoi Noda, Toshihisa Tanaka, Yuval Harpaz, A. Geva","doi":"10.23919/APSIPAASC55919.2022.9980161","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980161","url":null,"abstract":"Packaging and advertisements of brands affect customers' decision-making on purchasing products and could lead to business loss. Hence, neuromarketing, the application of neuroscience in the marketing field, has been introduced with the aim of understanding customers' cognitive functions toward advertisements or products. Our study focused on identifying how the brain responds to, and perceives, different types of advertising images of the same brand using electroencephalogram (EEG). We performed an experiment using 33 different Coca-Cola advertising images in an RSVP (rapid serial visual presentation) task on 23 participants. A seven-channel dry EEG headset was used to record the visual event-related potential (ERP), specifically the P300, the positive peak found 300 to 700 ms after image onset, to compare the perception response. We applied k-means and hierarchical clustering to the obtained EEG data, and achieved the best clustering for three clusters, yielding different P300 amplitudes and latencies. The typical Coca-Cola ads, red in color with Coca-Cola text on the ads, induced a faster and larger response, implying better perception than the unconventional or black-colored ads. We conclude that ERP clustering may be a useful tool for neuromarketing.
However, the relationship between the EEG-based cluster and the image-based cluster should be further investigated to confirm the suggestion.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132213783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
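The k-means step mentioned in the abstract above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two normalized features per image (P300 amplitude and latency), the toy values, and the deterministic initialization are not taken from the paper, which does not state its feature representation in the abstract.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100):
    """Plain k-means (Lloyd's algorithm) on a list of feature tuples."""
    # Deterministic initialization for this sketch: the first k points
    # serve as the starting centroids (real pipelines usually randomize).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical per-image features: (normalized P300 amplitude, normalized latency).
features = [
    (0.90, 0.10), (0.50, 0.50), (0.10, 0.90),
    (1.00, 0.20), (0.55, 0.45), (0.20, 1.00),
    (0.80, 0.15),
]
centroids, labels = kmeans(features, k=3)
print(labels)
```

With these toy values, images with a large, early response fall into one cluster and images with a small, late response into another. In practice the features would need to be standardized first so that latency (in ms) does not dominate amplitude.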
Design and Control of a Muscle-skeleton Robot Elbow based on Reinforcement Learning
Jianyin Fan, Haoran Xu, Yuwei Du, Jing Jin, Qiang Wang
{"title":"Design and Control of a Muscle-skeleton Robot Elbow based on Reinforcement Learning","authors":"Jianyin Fan, Haoran Xu, Yuwei Du, Jing Jin, Qiang Wang","doi":"10.23919/APSIPAASC55919.2022.9980219","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980219","url":null,"abstract":"The muscle-skeleton body structure and learning ability allow natural creatures to adapt to the complex environment. These can also make robots more adaptive in human-robot interaction scenarios. In this work, we implement a humanoid muscle-skeleton robot elbow joint actuated by two antagonistic pneumatic artificial muscles (PAMs). A reinforcement learning algorithm based on soft actor-critic (SAC) is adopted to learn the control policy of the proposed elbow joint. Lower action space and hindsight experience replay (HER) further reduce training time, and the temperature factor is fixed during the training process for small steady-state error. An elbow model is implemented in the simulation to verify the training procedure for our real robot elbow platform. The experimental results show that the RL learning procedure can learn control policies in the robot elbow prototype, and the steady-state error is within 0.64% after 1 s of control time.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"343 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134202147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-Branch Network for Few-shot Learning
Kai Ren, Zijie Guo, Zhimin Zhang, Rui Zhu, Xiaoxu Li
{"title":"Multi-Branch Network for Few-shot Learning","authors":"Kai Ren, Zijie Guo, Zhimin Zhang, Rui Zhu, Xiaoxu Li","doi":"10.23919/APSIPAASC55919.2022.9980160","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980160","url":null,"abstract":"Few-shot learning aims to provide precise predictions for unseen data by learning from only one or a few labelled samples of each class. However, it often suffers from the overfitting problem because of insufficient training data. In this paper, we propose a novel metric-based few-shot learning method, multi-branch network (MBN), with a new data augmentation module to improve the generalization ability of the model. Specifically, we generate different types of noise-contaminated data through multiple branches in the network to simulate real-world scenarios in which noisy images are obtained. Following this novel data augmentation module, the feature embedding and similarities between the support and query samples are learned simultaneously through the embedding and metric modules, respectively. Moreover, to consider more details in the feature maps, we propose to utilize the average-pooling layer in the metric module rather than the commonly adopted max-pooling layer. The network is trained end to end with the Kullback-Leibler (KL) divergence, to minimize the difference between the distributions of the ground truths and predictions.
Extensive experiments on Stanford-Dogs, Stanford-Cars, CUB-200-2011 and mini-ImageNet in the 1-shot and 5-shot tasks demonstrate the superior classification performance of MBN.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131498888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
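The KL-divergence training objective named in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the 3-way episode, the similarity scores, and the helper names `kl_divergence` and `softmax` are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions over the same classes."""
    if len(target) != len(predicted):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p, q in zip(target, predicted):
        if p > 0.0:  # terms with p == 0 contribute nothing
            total += p * math.log(p / max(q, eps))
    return total


# Hypothetical 3-way episode: the query sample belongs to class 0.
target = [1.0, 0.0, 0.0]        # one-hot ground truth
similarities = [2.0, 0.5, 0.1]  # assumed outputs of a metric module
loss = kl_divergence(target, softmax(similarities))
print(round(loss, 4))
```

For a one-hot ground truth, this loss reduces to the negative log-probability of the correct class, that is, the usual cross-entropy loss.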
Sound Reproduction with a Circular Loudspeaker Array Using Differential Beamforming Method
Yankai Zhang, Jiayi Mao, Yefeng Cai, C. Ye
{"title":"Sound Reproduction with a Circular Loudspeaker Array Using Differential Beamforming Method","authors":"Yankai Zhang, Jiayi Mao, Yefeng Cai, C. Ye","doi":"10.23919/APSIPAASC55919.2022.9980128","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980128","url":null,"abstract":"This paper proposes an approach to obtain a frequency-invariant, symmetric beampattern using a compact circular loudspeaker array. The Jacobi-Anger expansion method is used to approximate the target beampattern. The simulated performance of the same circular loudspeaker array is compared with and without a rigid baffle. The analytical solution for the weights and the simulation results show that the circular loudspeaker array with a rigid baffle can overcome the null problem confronting the array without a rigid baffle. The minimum-norm filter is used to improve the robustness of the system and maintain the frequency-invariant beampattern over the frequency range of interest.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"20 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131775067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
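For readers unfamiliar with the Jacobi-Anger expansion named in the abstract above, the standard identity is given below. The symbols are assumptions chosen for illustration and are not taken from the paper: z would typically be the product of wavenumber and array radius, \phi the angle between a loudspeaker position and the observation direction, and N a truncation order.

```latex
% Jacobi-Anger expansion; J_n is the Bessel function of the first kind of order n.
% The truncation order N is an assumption of this sketch.
e^{\,j z \cos\phi}
  \;=\; \sum_{n=-\infty}^{\infty} j^{\,n} J_n(z)\, e^{\,j n \phi}
  \;\approx\; \sum_{n=-N}^{N} j^{\,n} J_n(z)\, e^{\,j n \phi}
```

In this form each circular harmonic is scaled by J_n(z), which vanishes at the zeros of the Bessel function. This is presumably related to the null problem that the abstract attributes to the array without a rigid baffle.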
A Multiframe Super-resolution Pipeline for Sub-image-typed Light Field Data
Chien-Han Hsu, Yi-Hsien Lin, Yen-Po Lin, Yi-Chang Lu
{"title":"A Multiframe Super-resolution Pipeline for Sub-image-typed Light Field Data","authors":"Chien-Han Hsu, Yi-Hsien Lin, Yen-Po Lin, Yi-Chang Lu","doi":"10.23919/APSIPAASC55919.2022.9980305","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980305","url":null,"abstract":"Due to the trade-off between spatial and angular resolutions in light field cameras, the obtained resolutions of synthesized 2D images are often far less than those captured by conventional digital cameras using the same image sensor. This work proposes a complete digital image processing pipeline for hand-held light field cameras to generate high-resolution all-in-focus 2D images. The flow contains refined disparity estimation, digital refocusing, and super-resolution stages in which the characteristics of light fields are considered. We adopt the efficient first-order primal-dual algorithm as our optimization tool. The results show that the proposed approach gives better image quality when compared to other existing super-resolution methods.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129393268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Table Structure Recognition Based on Grid Shape Graph
Eunji Lee, Junhyeong Kwon, Haeyoon Yang, Jaewoo Park, Soonyoung Lee, H. Koo, N. Cho
{"title":"Table Structure Recognition Based on Grid Shape Graph","authors":"Eunji Lee, Junhyeong Kwon, Haeyoon Yang, Jaewoo Park, Soonyoung Lee, H. Koo, N. Cho","doi":"10.23919/APSIPAASC55919.2022.9980172","DOIUrl":"https://doi.org/10.23919/APSIPAASC55919.2022.9980172","url":null,"abstract":"Since tables in documents provide important information in compact form, table understanding has been an essential topic in document image processing. Researchers have represented table structures in various formats for table understanding, such as a simple grid structure, a graph with text/cell boxes as nodes, or a sequence of HTML tokens. However, these approaches have difficulty handling regularities (e.g., global row and column information) and spanning cells simultaneously. In this paper, we propose a new table recognition method based on a grid shape graph and present grid localization and grid element grouping networks. This approach is designed to exploit the grid structure and deal with spanning cells. To convert the grid structure into a cell structure, we only have to test adjacent pairs of grid elements, enabling efficient inference. In addition, we have discovered that predicting row/column-based relationships between grid elements improves cell-based connectivity estimation performance.
We demonstrate the effectiveness of the proposed method through experiments on three benchmark datasets.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130872479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1