2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers (Page 2)

Domain Adaptation and Language Conditioning to Improve Phonetic Posteriorgram Based Cross-Lingual Voice Conversion

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9979918

Pin-Chieh Hsu, N. Minematsu, D. Saito

Abstract: In this work, we examine two methods for improving phonetic posteriorgram (PPG) based cross-lingual voice conversion (CLVC). Previous research usually used a speaker encoder to characterize speaker identity; however, the speaker embedding learned by such models tends to be language-dependent, which degrades the quality of the converted speech. We therefore propose using domain-adversarial training. With this approach, speaker embeddings from different languages are adapted to the same distribution, forming a language-independent speaker embedding space. The second approach is to employ external language conditioning to help the model disentangle language information from the speaker embedding. In our experiments, both methods are evaluated on a Japanese-English bilingual database. Besides subjective evaluation, two automatic objective assessment systems are used to assess the quality and speaker similarity of converted utterances. According to the experimental results, both proposed methods generate speaker embeddings with reduced language dependency and improve the naturalness and speaker similarity of the converted speech.

Citations: 0
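
The abstract of the voice conversion entry above names domain-adversarial training but does not say how it is implemented. One common way to realize it is a gradient reversal layer placed between the speaker encoder and a language classifier; whether the paper uses this exact mechanism is an assumption here. The following is a minimal, framework-free sketch. The class name, the weight `lam`, and the toy vectors are all hypothetical.

```python
class GradientReversal:
    """Identity in the forward pass; flips and scales gradients in the backward pass."""

    def __init__(self, lam=1.0):
        # lam is a hypothetical weight for the adversarial signal.
        self.lam = lam

    def forward(self, speaker_embedding):
        # The language classifier sees the embedding unchanged.
        return list(speaker_embedding)

    def backward(self, grad_from_language_classifier):
        # The encoder receives the negated gradient, so it is pushed to make
        # the language harder to predict from the speaker embedding.
        return [-self.lam * g for g in grad_from_language_classifier]


grl = GradientReversal(lam=0.5)
embedding = [0.2, -1.0, 0.7]                      # toy speaker embedding
features_for_classifier = grl.forward(embedding)  # same values as the input
grad_to_encoder = grl.backward([1.0, -2.0, 4.0])  # toy classifier gradient
```

In an autodiff framework the same idea is usually written as a custom function with an identity forward pass and a negated backward pass.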

Physiological study on the effect of game events in response to player's laughter

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9979868

Mikito Fukuda, Y. Arimoto

Abstract: To investigate whether a computer's automatic responses to our emotional expressions influence our cognitive and emotional involvement in a virtual world, this study measured players' physiological reactions to game events presented in response to their spontaneous laughter. Participants played virtual games under two conditions, and their electrocardiogram, electrodermal activity, and facial electromyography (corrugator supercilii and zygomaticus major muscles) were recorded during play. The two conditions were an advantageous event condition and a disadvantageous event condition. In the advantageous event condition, the system responded to the player's laughter with an event that benefitted the player. In the disadvantageous event condition, the system responded to the player's laughter with an event that annoyed the player. A three-way analysis of variance was performed on these physiological signals to test the hypothesis that physiological responses vary over time across event types and event durations. A significantly slower heart rate was observed after the presentation of an event in both conditions. This result suggests that players paid more attention to the game whenever an event was generated in response to their laughter. Moreover, both types of events increased electrodermal activity and corrugator supercilii activity. In particular, disadvantageous events activated the corrugator supercilii more than advantageous events did. These results suggest that players were more emotionally engaged in the game when they encountered troublesome or fortunate situations while laughing.

Citations: 0

Leveraging Pre-Trained Acoustic Feature Extractor For Affective Vocal Bursts Tasks

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980083

Bagus Tris Atmaja, A. Sasou

Citations: 1

Optimal Deep Multi-Route Self-Attention for Single Image Super-Resolution

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9979962

Nisawan Ngambenjavichaikul, Sovann Chen, S. Aramvith

Abstract: Image restoration, such as single image super-resolution (SISR), is a long-established low-level vision problem that aims to reconstruct high-resolution (HR) images from their low-resolution (LR) counterparts. While state-of-the-art image super-resolution models are based on the well-known convolutional neural network (CNN), many self-attention-based or transformer-based approaches have been explored and have shown promising performance on vision problems. A powerful baseline model based on the Swin Transformer adopts the shifted window approach. It improves capability by restricting self-attention computation to non-overlapping local windows while still enabling cross-window connections. However, the architecture design is fixed manually, so the results do not reach optimal performance. This paper presents an optimal deep multi-route self-attention network for single image super-resolution (ODMR-SASR). A genetic algorithm (GA) is introduced to discover the optimal number of filters and layers. Experimental results demonstrate that the proposed optimization technique improves SR image quality.

Citations: 0

Clustering of advertising images using electroencephalogram

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980161

Ingon Chanpornpakdi, Motoi Noda, Toshihisa Tanaka, Yuval Harpaz, A. Geva

Abstract: The packaging and advertisements of brands affect customers' purchasing decisions and could lead to business loss. Hence, neuromarketing, the application of neuroscience to marketing, has been introduced to understand customers' cognitive responses to advertisements or products. Our study focused on identifying how the brain responds to different types of advertising images of the same brand, using electroencephalogram (EEG). We performed an experiment with 23 participants using 33 different Coca-Cola advertising images in a rapid serial visual presentation (RSVP) task. A seven-channel dry EEG headset was used to record the visual event-related potential (ERP), specifically the P300, a positive peak found 300 to 700 ms after image onset, to compare perceptual responses. We applied k-means and hierarchical clustering to the EEG data and obtained the best clustering with three clusters, which showed different P300 amplitudes and latencies. Typical Coca-Cola ads, red with Coca-Cola text, induced a faster and larger response, implying better perception than unconventional or black ads. We conclude that ERP clustering may be a useful tool for neuromarketing. However, the relationship between the EEG-based clusters and image-based clusters should be investigated further to confirm this suggestion.

Citations: 0
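
The EEG clustering entry above reports k-means with three clusters on P300 responses. A rough sketch of plain k-means on two features per image (P300 amplitude and latency) is given below. The feature pairs are made-up, already standardized toy values rather than data from the study, and the choice of initial centroids is an assumption.

```python
def kmeans(points, init_centroids, max_iters=50):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (amplitude z-score, latency z-score) pairs, one per image.
p300_features = [(1.0, -1.0), (1.2, -0.8), (0.0, 0.1),
                 (-0.1, -0.1), (-1.1, 1.0), (-0.9, 1.2)]
initial = [p300_features[0], p300_features[2], p300_features[4]]
labels, centroids = kmeans(p300_features, initial)
```

Standardizing amplitude and latency first matters because their raw units (microvolts and milliseconds) differ by orders of magnitude, which would otherwise let latency dominate the distance.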

Design and Control of a Muscle-skeleton Robot Elbow based on Reinforcement Learning

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980219

Jianyin Fan, Haoran Xu, Yuwei Du, Jing Jin, Qiang Wang

Citations: 0

Multi-Branch Network for Few-shot Learning

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980160

Kai Ren, Zijie Guo, Zhimin Zhang, Rui Zhu, Xiaoxu Li

Abstract: Few-shot learning aims to provide precise predictions for unseen data by learning from only one or a few labelled samples of each class. However, it often suffers from overfitting because of insufficient training data. In this paper, we propose a novel metric-based few-shot learning method, the multi-branch network (MBN), with a new data augmentation module to improve the generalization ability of the model. Specifically, we generate different types of noise-contaminated data through multiple branches in the network to simulate real-world scenarios in which noisy images are obtained. Following this data augmentation module, the feature embedding and the similarities between the support and query samples are learned simultaneously through the embedding and metric modules, respectively. Moreover, to consider more details in the feature maps, we propose using an average-pooling layer in the metric module rather than the commonly adopted max-pooling layer. The network is trained end to end with the Kullback-Leibler (KL) divergence to minimize the difference between the distributions of the ground truths and the predictions. Extensive experiments on Stanford Dogs, Stanford Cars, CUB-200-2011 and mini-ImageNet in the 1-shot and 5-shot tasks demonstrate the superior classification performance of MBN.

Citations: 0
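
The multi-branch network entry above states that training minimizes the KL divergence between the ground-truth and predicted distributions. As a small illustration of that objective (not the authors' implementation), the sketch below computes KL(target || prediction) for one query sample. The similarity scores, the one-hot target, and the `eps` smoothing constant are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; zero-probability terms of p are skipped."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


target = [0.0, 1.0, 0.0]               # one-hot ground truth over 3 classes
predicted = softmax([0.5, 2.0, -1.0])  # hypothetical query-to-class similarities
loss = kl_divergence(target, predicted)
```

With a one-hot target the KL divergence reduces to the negative log-probability of the true class, so in that case it matches the usual cross-entropy loss.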

Sound Reproduction with a Circular Loudspeaker Array Using Differential Beamforming Method

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980128

Yankai Zhang, Jiayi Mao, Yefeng Cai, C. Ye

Citations: 1

A Multiframe Super-resolution Pipeline for Sub-image-typed Light Field Data

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980305

Chien-Han Hsu, Yi-Hsien Lin, Yen-Po Lin, Yi-Chang Lu

Citations: 0

Table Structure Recognition Based on Grid Shape Graph

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Pub Date : 2022-11-07 DOI: 10.23919/APSIPAASC55919.2022.9980172

Eunji Lee, Junhyeong Kwon, Haeyoon Yang, Jaewoo Park, Soonyoung Lee, H. Koo, N. Cho

Abstract: Since tables in documents provide important information in compact form, table understanding has been an essential topic in document image processing. Researchers have represented table structures in various formats for table understanding, such as a simple grid structure, a graph with text/cell boxes as nodes, or a sequence of HTML tokens. However, these approaches have difficulty handling regularities (e.g., global row and column information) and spanning cells simultaneously. In this paper, we propose a new table recognition method based on a grid shape graph and present grid localization and grid element grouping networks. This approach is designed to exploit the grid structure and deal with spanning cells. To convert the grid structure into a cell structure, we only have to test adjacent pairs of grid elements, enabling efficient inference. In addition, we have found that predicting row/column-based relationships between grid elements improves cell-based connectivity estimation performance. We demonstrate the effectiveness of the proposed method through experiments on three benchmark datasets.

Citations: 1
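
The table recognition entry above notes that converting a grid structure into a cell structure only requires testing adjacent pairs of grid elements. One way to picture this is a union-find pass over right and down neighbours, sketched below. In the paper the pairwise decision comes from a grouping network; here the `same_cell` predicate, the function name, and the 2x3 example are hypothetical stand-ins.

```python
def group_grid_cells(n_rows, n_cols, same_cell):
    """Merge adjacent grid elements into cells using union-find.

    same_cell(a, b) is a stand-in predicate that says whether two adjacent
    grid elements, given as (row, col) pairs, belong to the same table cell.
    """
    parent = list(range(n_rows * n_cols))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Only right and down neighbours are tested, so the number of pairwise
    # checks grows linearly with the number of grid elements.
    for r in range(n_rows):
        for c in range(n_cols):
            here = r * n_cols + c
            if c + 1 < n_cols and same_cell((r, c), (r, c + 1)):
                union(here, here + 1)
            if r + 1 < n_rows and same_cell((r, c), (r + 1, c)):
                union(here, here + n_cols)

    cells = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cells.setdefault(find(r * n_cols + c), []).append((r, c))
    return sorted(cells.values())


# Hypothetical 2x3 grid whose first two top-row elements form one spanning cell.
merged_pairs = {((0, 0), (0, 1))}
cells = group_grid_cells(2, 3, lambda a, b: (a, b) in merged_pairs)
```

Here `cells` holds one list of grid coordinates per table cell, with the spanning cell covering both `(0, 0)` and `(0, 1)`.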