Maolong Tang, Ming-Ting Sun, Leonardo Seda, J. Swanson, Zhengyou Zhang
{"title":"Measuring Infant's Length with an Image","authors":"Maolong Tang, Ming-Ting Sun, Leonardo Seda, J. Swanson, Zhengyou Zhang","doi":"10.23919/APSIPA.2018.8659482","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659482","url":null,"abstract":"It is important to measure an infant's length regularly to estimate the growth velocity to make sure that the infant is growing normally. Traditionally, measuring an infant's length is performed with an infantometer. However, the infant struggles and cries in the measuring process, and it often needs three persons to position the infant's head, legs, and the boards of the infantometer during the process. Thus, it is not practical for a parent to perform this measurement at home regularly. In this paper, we propose a new approach which allows the measurement of an infant's length using a cellphone picture without the need to position the infant. Our algorithm automatically calculates the 3D positions of the body parts and the total length of the infant with the help of round stickers. The round stickers can be put on the infant's body easily in a few seconds, before the picture is taken. This new technology would make frequent measurements of the infant's length and the tracking of the growth velocity possible.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122557015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Takuya Takahashi, T. Hori, Christoph M. Wilk, S. Sagayama
{"title":"Semi-Supervised NMF in the chroma Domain Applied to Music Harmony Estimation","authors":"Takuya Takahashi, T. Hori, Christoph M. Wilk, S. Sagayama","doi":"10.23919/APSIPA.2018.8659645","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659645","url":null,"abstract":"In this paper, we discuss non-negative matrix factorization (NMF) applied to chroma feature sequences to reduce the chroma-specific noise in chord estimation from music signals using the hidden Markov model (HMM). Even in the case of single pitch sounds, the raw 12-dimensional chroma vectors obtained from the music signal by summing and normalizing the spectrum by octaves often contain irrelevant components such as non-octave overtones falling into different pitch classes and cause inaccuracies in estimation of harmonies. NMF applied to the chroma domain is expected to suppress such chroma components in the NMF activation matrix caused by overtones, and thus “purifies” the noisy chroma vectors. By reducing the dimensionality to 12 dimensions as opposed to NMF applied to the raw spectrum, we expect advantages with respect to statistical robustness as well as computational cost for pitch class estimation of single and multiple tones. We use the “purified” chroma vectors in combination with a harmony progression model based on an HMM where the NMF activation distributions are modeled as observations associated with hidden harmonies, whose transition probabilities have been obtained statistically. We attempt to improve harmony estimation accuracy by combining suppression of irrelevant components and the HMM-based harmony model. In the experimental evaluation, we demonstrate the reduction of irrelevant components in raw chroma vectors computed from recordings of musical instruments. In addition, using music audio data with harmony annotation from the RWC database, we compare the harmony estimation accuracies using our method and conventional chroma.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126170805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"APSIPA ASC 2018 Organizing Committee","authors":"","doi":"10.23919/apsipa.2018.8659710","DOIUrl":"https://doi.org/10.23919/apsipa.2018.8659710","url":null,"abstract":"","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125927499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time Background Subtraction via L1 Norm Tensor Decomposition","authors":"Taehyeong Kim, Yoonsik Choe","doi":"10.23919/APSIPA.2018.8659727","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659727","url":null,"abstract":"Currently, background subtraction is being actively studied in many image processing applications. Nuclear Norm Minimization (NNM) and Weighted Nuclear Norm Minimization (WNNM) are commonly used background subtraction methods based on Robust Principal Component Analysis (RPCA). However, these techniques approximate the RPCA rank function and take the form of an iterative optimization algorithm. Therefore, due to the approximation, the NNM solution cannot converge if the number of frames is small. In addition, the NNM and WNNM processing times are delayed because of their iterative optimization schemes. Thus, NNM and WNNM are not suitable for real-time background subtraction. In order to overcome these limitations, this paper presents a real-time background subtraction method using tensor decomposition in accordance with the recent tensor analysis research trend. In this study, we used the closed-form TUCKER2 decomposition solution to omit the iterative process while retaining the L1 norm of the RPCA rank function. This proposed method allows for convergence even when the number of frames is small. Compared to NNM and WNNM, the proposed method reduces the processing time by a factor of more than 80 and has a higher precision even when the number of frames is less than 10.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129783570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A deep learning based framework for converting sign language to emotional speech","authors":"Nan Song, Hongwu Yang, Pengpeng Zhi","doi":"10.23919/APSIPA.2018.8659571","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659571","url":null,"abstract":"This paper proposes a framework for converting sign language to emotional speech by deep learning. We first adopt a deep neural network (DNN) model to extract the features of sign language and facial expression. Then we train two support vector machines (SVM) to classify the sign language and facial expression for recognizing the text of sign language and emotional tags of facial expression. We also train a set of DNN-based emotional speech acoustic models by speaker adaptive training with a multi-speaker emotional speech corpus. Finally, we select the DNN-based emotional speech acoustic models with emotion tags to synthesize emotional speech from the text recognized from the sign language. Objective tests show that the recognition rate for static sign language is 90.7%. The recognition rate of facial expression achieves 94.6% on the extended Cohn-Kanade database (CK+) and 80.3% on the Japanese Female Facial Expression (JAFFE) database, respectively. Subjective evaluation demonstrates that the synthesized emotional speech achieves an emotional mean opinion score of 4.2. The pleasure-arousal-dominance (PAD) three-dimensional emotion model is employed to evaluate the PAD values for both facial expression and synthesized emotional speech. Results show that the PAD values of facial expression are close to the PAD values of synthesized emotional speech. This means that the synthesized emotional speech can express the emotions of facial expression.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129258435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Agglomerative Hierarchical Clustering of Basis Vector for Monaural Sound Source Separation Based on NMF","authors":"Kenta Murai, Taiho Takeuchi, Y. Tatekura","doi":"10.23919/APSIPA.2018.8659766","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659766","url":null,"abstract":"This paper proposes a method of monaural sound source separation by clustering based on the similarity of basis vectors decomposed by Non-negative Matrix Factorization (NMF). In the proposed method, the basis vectors are clustered on the assumption that the similarity between the basis vectors constituting the target sound source is higher than the similarity with the basis vectors of the other sound sources. Hierarchical clustering, which forms clusters in descending order of feature similarity, is introduced. Since it is unnecessary to explicitly determine the number of clusters in hierarchical clustering, the basis vectors can be grouped into an arbitrary number of clusters according to the threshold. Therefore, the proposed method can separate an arbitrary number of sound sources. The numerical evaluation showed that the Signal to Distortion Ratio (SDR), which is an evaluation index of sound source separation, can be improved by approximately 6 to 10 dB. Undesirable cases in which most of the basis vectors are classified into the same cluster are also discussed. In addition, sound source separation with three mixed sound sources was also evaluated, and it was confirmed that SDR can be improved by about 10 dB.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128289043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Study on HDR/WCG Service Model for UHD Service","authors":"Juhan Bae, Jeongyeon Lim, So-Ki Jung","doi":"10.23919/APSIPA.2018.8659594","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659594","url":null,"abstract":"In recent years, as interest in high-quality media services has grown and the underlying technology has developed, both methods for producing content that supports technologies such as High Dynamic Range (HDR) / Wide Color Gamut (WCG) and technologies for converting existing content to meet high-quality media standards have been widely studied and have attracted attention. In this paper, we propose an HDR/WCG content service model for commercialized IPTV service based on head-end processing media conversion. We also suggest commercial high-quality media services over the content-platform-network-device area.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124959799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Teng Li, Jianfeng Ma, Qingqi Pei, Yulong Shen, Cong Sun
{"title":"Log-based Anomalies Detection of MANETs Routing with Reasoning and Verification","authors":"Teng Li, Jianfeng Ma, Qingqi Pei, Yulong Shen, Cong Sun","doi":"10.23919/APSIPA.2018.8659549","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659549","url":null,"abstract":"Routing security plays an important role in Mobile Ad hoc Networks (MANETs). Despite many attempts to improve its security, the routing procedure of MANETs remains vulnerable to attacks. Existing approaches offer support for detecting attacks or debugging in different routing phases, but many of them have not considered the privacy of the nodes during anomaly detection, since they depend on the central control program or a third party to supervise the whole network. In this paper, we present an approach called LAD which uses the raw logs of routers to construct a control flow graph and find the existing communication rules in MANETs. With the reasoning rules, LAD can detect both active and passive attacks launched during the routing phase. LAD can also protect the privacy of the nodes in the verification phase with the specific Merkle hash tree. Without deploying any special nodes to assist the verification, LAD can detect multiple malicious nodes by itself. To show that our approach can be used to guarantee the security of the MANETs, we run our experiments in NS3 as well as in a practical router environment. LAD can improve the accuracy rate from 2.28% to 29.22%. The results show that LAD achieves a high detection rate and a low false-positive rate with limited time and memory usage.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116514267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[Copyright notice]","authors":"","doi":"10.23919/apsipa.2018.8659690","DOIUrl":"https://doi.org/10.23919/apsipa.2018.8659690","url":null,"abstract":"","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116325743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating Co-Prime Microphone Arrays for Speech Direction of Arrival Estimation","authors":"Jiahong Zhao, C. Ritz","doi":"10.23919/APSIPA.2018.8659626","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659626","url":null,"abstract":"This paper investigates the application of the steered response power - phase transform (SRP-PHAT) method to coprime microphone array (CPMA) recordings to estimate the direction of arrival (DOA) of speech sources. While existing CPMA approaches for acoustics applications are limited, especially under reverberant conditions, the proposed algorithm utilises SRP-PHAT to estimate the DOA of speech sources and then employs a histogram-based stochastic algorithm using steered response power (SRP) adjustment and kernel density evaluation (KDE) to improve the DOA estimation accuracy. Experiments are conducted for up to three simultaneous speech sources in the far field considering both anechoic and reverberant scenarios. Results suggest that the proposed approach achieves more accurate DOA estimates than a uniform linear array (ULA) with the same number of microphones under both anechoic and low reverberant conditions, and it significantly decreases the number of microphones of another equivalent ULA while maintaining similar performances. Moreover, the operating frequency of the microphone array is largely increased without changing the number of microphones, making it possible to accurately record higher-frequency components of source signals.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127104814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}