{"title":"Detection of user's body movement for binaural hearing aids to control of directivity","authors":"Y. Chisaki, Shogo Tanaka, T. Usagawa","doi":"10.1109/APSIPA.2013.6694300","DOIUrl":"https://doi.org/10.1109/APSIPA.2013.6694300","url":null,"abstract":"Sound source direction estimation and sound source separation are widely implemented in many products, and binaural hearing aids are one such application. In conversation using binaural hearing aids, continuous tracking of sound sources with acoustic signals is sometimes complicated because the sound sources move dynamically. To simplify the tracking of sound sources, it is considered helpful to use non-verbal information in communication. Since the user's body movement, including that of the head, corresponds to the speakers' positions, the communication zone where the sound sources are located can be estimated from the head direction. This paper focuses on head movement in conversation as non-verbal information in communication and discusses two zone detection methods. The rotational angle of the head movement is estimated from both the acceleration measured by an accelerometer and the angular velocity measured by an angular velocity sensor attached at the left ear position. The spatial communication zone is classified by two methods, the threshold method and the support vector machine (SVM). 
As a result, the threshold-based method estimated the target direction slightly better than the SVM-based method.","PeriodicalId":154359,"journal":{"name":"2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131319928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BFI-based speaker personality perception using acoustic-prosodic features","authors":"Chia-Jui Liu, Chung-Hsien Wu, Yu-Hsien Chiu","doi":"10.1109/APSIPA.2013.6694234","DOIUrl":"https://doi.org/10.1109/APSIPA.2013.6694234","url":null,"abstract":"This paper presents an approach to automatic prediction of the traits that listeners attribute to a speaker they have never heard before. In previous research, the Big Five Inventory (BFI), one of the most widely used questionnaires, has been adopted for personality assessment. Based on the BFI, in this study, an artificial neural network (ANN) is adopted to project the input speech segment to the BFI space based on acoustic-prosodic features. The personality trait is then predicted by estimating the BFI scores obtained from the ANN. For performance evaluation, two versions of the BFI (a complete questionnaire and a simplified version) were adopted. The experiments were performed over a corpus of 535 speech samples assessed in terms of personality traits by experienced subjects. The results show that the proposed method for predicting the trait is efficient and effective, and that the prediction accuracy can reach 70%.","PeriodicalId":154359,"journal":{"name":"2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124499843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cell selection using distributed Q-learning in heterogeneous networks","authors":"Toshihito Kudo, T. Ohtsuki","doi":"10.1109/APSIPA.2013.6694368","DOIUrl":"https://doi.org/10.1109/APSIPA.2013.6694368","url":null,"abstract":"Cell selection with cell range expansion (CRE), a technique that virtually expands the pico cell range by adding a bias value to the pico received power instead of increasing the transmit power of the pico base station (PBS), can improve coverage, cell-edge throughput, and overall network throughput. Many studies of CRE have used a common bias value for all user equipments (UEs), although the optimal bias values that minimize the number of UE outages vary from one UE to another. Because the optimal bias value that minimizes the number of UE outages depends on several factors, such as the dividing ratio of radio resources between macro base stations (MBSs) and PBSs, it can be found only by trial and error. In this paper, we propose a cell selection scheme based on the Q-learning algorithm, in which each UE independently learns from its past experience which cell to select to minimize the number of UE outages. Simulation results show that, compared to the practical common bias value setting, the proposed scheme reduces the number of UE outages and improves network throughput in most cases. 
Moreover, at the cost of some performance degradation, it also solves the storage problem of our previous work.","PeriodicalId":154359,"journal":{"name":"2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131252943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Green cooperative relaying in multi-source wireless networks with high throughput and fairness provisioning","authors":"Kuan-Yu Lin, K. Liu","doi":"10.1109/APSIPA.2013.6694255","DOIUrl":"https://doi.org/10.1109/APSIPA.2013.6694255","url":null,"abstract":"Motivated by the urgent need for green communications, this paper investigates energy-efficient cooperative relaying methods for multi-source multi-relay wireless networks. Existing cooperative relaying schemes primarily focus on single-source cooperative networks and aim to maximize diversity gain exploitation, yet ignore the extra energy consumed by relay nodes and the fairness between source nodes. Instead, our objective is to minimize relay power consumption and maintain network-wide fairness without throughput penalty. The considered problem includes two parts, source scheduling and relay assignment, which are addressed separately. We derive the feasibility condition for the green source-relay assignment problem and show that it is NP-hard. We propose a heuristic algorithm that delivers good performance with low complexity. Simulation results are presented to evaluate the efficacy of the proposed scheme in terms of average throughput, throughput fairness, average relay power consumption, and average outage probability, as compared to two related schemes, under both independent and identically distributed (i.i.d.) and independent and non-identically distributed (i.n.d.) 
channel configurations.","PeriodicalId":154359,"journal":{"name":"2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130507720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On adaptivity of online model selection method based on multikernel adaptive filtering","authors":"M. Yukawa, R. Ishii","doi":"10.1109/APSIPA.2013.6694329","DOIUrl":"https://doi.org/10.1109/APSIPA.2013.6694329","url":null,"abstract":"We investigate the adaptivity of the online model selection method recently proposed within the multikernel adaptive filtering framework. Specifically, we consider a situation in which the nonlinear system under study changes during adaptation and the appropriate kernel changes accordingly. Our time-varying cost functions involve three regularizers: the ℓ1 norm and two block ℓ1 norms which promote sparsity both in the kernel and data groups. The block ℓ1 regularizers are approximated by their Moreau envelopes, and the adaptive proximal forward-backward splitting (APFBS) method is applied to the approximated cost function. Numerical examples show that the proposed algorithm can adaptively estimate a reasonable model.","PeriodicalId":154359,"journal":{"name":"2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131406627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Affective-cognitive dialogue act detection in an error-aware spoken dialogue system","authors":"Wei-Bin Liang, Chung-Hsien Wu, Meng-Hsiu Sheng","doi":"10.1109/APSIPA.2013.6694233","DOIUrl":"https://doi.org/10.1109/APSIPA.2013.6694233","url":null,"abstract":"This paper presents an approach to affective-cognitive dialogue act detection in a spoken dialogue. To achieve this goal, the input utterance is decoded separately into an affective state by an emotion recognizer and a word sequence by an imperfect speech recognizer. In addition, four types of evidence are employed to grade the score of each recognized word. The recognized word sequence is used to derive the candidate sentences to alleviate the problem of unexpected language usage for the cognitive state predicted by the vector space-based dialogue act detection. The Boltzmann selection based method is then employed to predict the next possible act in the spoken dialogue system according to the affective-cognitive states. A model of affective anticipatory reward that is assumed to arise from the emotional seeking system is adopted for enhancing the efficacy of dialogue act detection. 
Finally, the evaluation data are collected and the experimental results confirm the improved performance of the proposed approach compared to the baseline system on the task completion rate.","PeriodicalId":154359,"journal":{"name":"2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115869053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint texture-depth pixel inpainting of disocclusion holes in virtual view synthesis","authors":"S. Reel, Gene Cheung, P. Wong, L. Dooley","doi":"10.1109/APSIPA.2013.6694249","DOIUrl":"https://doi.org/10.1109/APSIPA.2013.6694249","url":null,"abstract":"Transmitting texture and depth maps from one or more reference views enables a user to freely choose virtual viewpoints from which to synthesize images for observation via depth-image-based rendering (DIBR). In each DIBR-synthesized image, however, there remain disocclusion holes with missing pixels corresponding to spatial regions occluded from view in the reference images. To complete these holes, unlike previous schemes that rely heavily (and unrealistically) on the availability of a high-quality depth map in the virtual view for inpainting of the corresponding texture map, in this paper a new Joint Texture-Depth Inpainting (JTDI) algorithm is proposed that simultaneously fills in missing texture and depth pixels. Specifically, we first use available partial depth information to compute priority terms to identify the next target pixel patch in a disocclusion hole for inpainting. Then, after identifying the best-matched texture patch in the known pixel region via template matching for texture inpainting, the variance of the corresponding depth patch is copied to the target depth patch for depth inpainting. 
Experimental results show that JTDI outperforms two previous inpainting schemes that either do not use available depth information during inpainting or depend on the availability of a good depth map at the virtual view for good inpainting performance.","PeriodicalId":154359,"journal":{"name":"2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133995464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech recognition under noisy environments using multiple microphones based on asynchronous and intermittent measurements","authors":"Kohei Machida, A. Ito","doi":"10.1109/APSIPA.2013.6694362","DOIUrl":"https://doi.org/10.1109/APSIPA.2013.6694362","url":null,"abstract":"We propose a robust speech recognition method under noisy environments using multiple microphones based on asynchronous and intermittent observation. In asynchronous and intermittent observation, the noise spectrum is estimated by the environmental noise observed in fragments from multiple microphones, and spectral subtraction is performed by this estimated noise spectrum. In this paper, we consider the case of estimating the noise spectrum from the noise observed by another microphone just before speech input. However, the noise spectrum needs to be compensated because of the difference in the location of the microphone in this case. Then, we examined compensating the noise spectrum by using the estimated LSFL on the log spectrum. By compensating the noise spectrum, the recognition rate improved compared with the case without compensation.","PeriodicalId":154359,"journal":{"name":"2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115310003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Framework of ubiquitous healthcare system based on cloud computing for elderly living","authors":"Yang-Yen Ou, Po-Yi Shih, Yu-Hao Chin, Ta-Wen Kuan, Jhing-Fa Wang, S. Shih","doi":"10.1109/APSIPA.2013.6694298","DOIUrl":"https://doi.org/10.1109/APSIPA.2013.6694298","url":null,"abstract":"This work discusses an integrated framework for health care, home safety, convenience, and entertainment for elderly living. Since an increasing number of elderly people will live alone and require smarter, more automated home services, the proposed framework attempts to develop a smart living helper to improve their later life. Two systems are proposed in the Ubiquitous Healthcare System (UHS) framework: the Web-based User Remote Management Service (WURMS) and the Multimodal Interactive Computation Services (MICS). The proposed systems coordinate several existing audio-visual and communication techniques, including speech/sound recognition, speaker identification, face identification, sound source estimation, text-to-speech (TTS), and event recognition. 
To provide elder-friendly interfaces, the proposed services include (1) Home care services, (2) Emergency assistance, (3) Family interaction, (4) Remote medical services, (5) Security monitoring, and (6) Information services, which aim to make elders' lives more convenient.","PeriodicalId":154359,"journal":{"name":"2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"272 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115667240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human segmentation algorithm for real-time video-call applications","authors":"Seon Heo, H. Koo, Hong Il Kim, N. Cho","doi":"10.1109/APSIPA.2013.6694320","DOIUrl":"https://doi.org/10.1109/APSIPA.2013.6694320","url":null,"abstract":"This paper presents a human region segmentation algorithm for real-time video-call applications. Unlike conventional methods, the segmentation process is automatically initialized and the motion of cameras is not restricted. To be precise, our method is initialized by face detection results and human/background regions are modeled with spatial color Gaussian mixture models (SCGMMs). Based on the SCGMMs, we build a cost function considering spatial and color distributions of pixels, region smoothness, and temporal coherence. Here, the temporal coherence term allows us to have stable segmentation results. The cost function is minimized by the well-known graphcut algorithm and we update our SCGMM models with the segmentation results. Experimental results have shown that our method yields stable segmentation results with a small amount of computation load.","PeriodicalId":154359,"journal":{"name":"2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125909058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}