{"title":"PGT: Proposal-guided object tracking","authors":"Han-Ul Kim, Chang-Su Kim","doi":"10.1109/APSIPA.2017.8282318","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282318","url":null,"abstract":"We propose a robust visual tracking system, which refines initial estimates of a base tracker by employing object proposal techniques. First, we decompose the base tracker into three building blocks: representation method, appearance model, and model update strategy. We then design each building block by adopting and improving ideas from recent successful trackers. Second, we propose the proposal-guided tracking (PGT) algorithm. Given proposals generated by an edge-based object proposal technique, we select only the proposals that can improve the result of the base tracker using several cues. Then, we discriminate target proposals from non-target ones, based on the nearest neighbor classification using the target and background models. Finally, we replace the result of the base tracker with the best target proposal. Experimental results demonstrate that the proposed PGT algorithm provides excellent results on a visual tracking benchmark.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123744247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A drag-and-drop type human computer interaction technique based on electrooculogram","authors":"S. Ogai, Toshihisa Tanaka","doi":"10.1109/APSIPA.2017.8282126","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282126","url":null,"abstract":"A fundamental limitation of human-computer interaction using electrooculogram (EOG) is a low accuracy of eye tracking performance and the head movement that violates the calibration of the on-monitor gaze coordinates. In this paper, we develop a drag-and-drop type interface with the EOG that can avoid a direct estimation of gaze location and can make users free from the restriction of head movement. To drag a cursor on the screen, the proposed system models the relationship between the amount of eye movement and the EOG amplitude with linear regression. Five subjects participated in the experiment to compare the proposed drag-and-drop type and the conventional direct gaze type interfaces. Performance measures such as efficiency and satisfaction showed the advantage of the proposed method with significant differences (p < 0.05).","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116113181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sound source localization using binaural difference for hose-shaped rescue robot","authors":"Narumi Mae, Yoshiki Mitsui, S. Makino, Daichi Kitamura, Nobutaka Ono, Takeshi Yamada, H. Saruwatari","doi":"10.1109/APSIPA.2017.8282292","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282292","url":null,"abstract":"Rescue robots have been developed for search and rescue operations in times of large-scale disasters. Such a robot is used to search for survivors in disaster sites by capturing their voices with its microphone array. However, since the robot has many vibration motors, ego noise is mixed with voices, and it is difficult to differentiate the ego noise from a call for help from a disaster survivor. In our previous works, an ego noise reduction technique that combines a method of blind source separation called independent low-rank matrix analysis and postprocessing for noise cancellation was proposed. In the practical use of this robot, to determine the precise location of survivors, the direction of the observed voice should be estimated after the ego noise reduction process. To achieve this objective, in this study, a new hose-shaped rescue robot with microphone arrays was developed. Moreover, we adapt a postfilter called MOSIE to our previous noise reduction method to listen to stereo sound because this robot can record stereo sound. By performing in a simulated disaster site, we confirm that the operator can perceive the direction of a survivor's location by applying a speech enhancement technique combining independent low-rank matrix analysis, noise cancellation, and postfiltering to the observed multichannel noisy signals.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121112812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time digitized neural-spike storage scheme in multiple channels for biomedical applications","authors":"Anand Kumar Mukhopadhyay, I. Chakrabarti, M. Sharad","doi":"10.1109/APSIPA.2017.8282256","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282256","url":null,"abstract":"The recording of real-time Neural-spikes (N-spikes) into an on-chip memory module is essential for processing the stored information in neurological applications such as neural spike sorting. Spike sorting is a process used in bio-medical signal processing where each incoming real-time spike is mapped to the neuron from which it originates. In this paper, power and area efficient architectural level storage schemes of digitized N-spikes recorded through multiple channels into a Single Port Random Access Memory (SPRAM) module have been compared. The power dissipation of the proposed storage scheme is on the order of a few μW. The architectural level analysis of the schemes has been performed in 0.18μm CMOS process technology using the Synopsys design compiler tool.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126781324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An investigation to transplant emotional expressions in DNN-based TTS synthesis","authors":"Katsuki Inoue, Sunao Hara, M. Abe, Nobukatsu Hojo, Yusuke Ijima","doi":"10.1109/APSIPA.2017.8282231","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282231","url":null,"abstract":"In this paper, we investigate deep neural network (DNN) architectures to transplant emotional expressions to improve the expressiveness of DNN-based text-to-speech (TTS) synthesis. DNN is expected to have potential power in mapping between linguistic information and acoustic features. From multispeaker and/or multi-language perspectives, several types of DNN architecture have been proposed and have shown good performances. We tried to expand the idea to transplant emotion, constructing shared emotion-dependent mappings. The following three types of DNN architecture are examined: (1) the parallel model (PM) with an output layer consisting of both speaker-dependent layers and emotion-dependent layers, (2) the serial model (SM) with an output layer consisting of emotion-dependent layers preceded by speaker-dependent hidden layers, (3) the auxiliary input model (AIM) with an input layer consisting of emotion and speaker IDs as well as linguistic feature vectors. The DNNs were trained using neutral speech uttered by 24 speakers, and sad speech and joyful speech uttered by 3 speakers from those 24 speakers. In terms of unseen emotional synthesis, subjective evaluation tests showed that the PM performs much better than the SM and slightly better than the AIM. In addition, this test showed that the SM is the best of the three models when training data includes emotional speech uttered by the target speaker.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125928043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind speaker counting in highly reverberant environments by clustering coherence features","authors":"Shahab Pasha, Jacob Donley, C. Ritz","doi":"10.1109/APSIPA.2017.8282303","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282303","url":null,"abstract":"This paper proposes the use of the frequency-domain Magnitude Squared Coherence (MSC) between two ad-hoc recordings of speech as a reliable speaker discrimination feature for source counting applications in highly reverberant environments. The proposed source counting method does not require knowledge of the microphone spacing and does not assume any relative distance between the sources and the microphones. Source counting is based on clustering the frequency domain MSC of the speech signals derived from short time segments. Experiments show that the frequency domain MSC is speaker-dependent and the method was successfully used to obtain highly accurate source counting results for up to six active speakers for varying levels of reverberation and microphone spacing.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124137910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint estimation of signal and mutual coupling parameters based on spatially spread polarization sensitive array","authors":"Huiyong Li, Zihui Luo, Julan Xie, Jun Li","doi":"10.1109/APSIPA.2017.8282156","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282156","url":null,"abstract":"A reduced-dimensional MUSIC (RD-MUSIC) algorithm is proposed to reduce the computation of blind joint direction-of-arrival (DOA), polarization and the mutual coupling parameters estimation algorithm based on spatially spread polarization sensitive uniform linear array (ULA). This algorithm works in two steps. In the first step, DOA parameter and polarization parameters are separated from the mutual coupling through matrix transformation, which are then estimated by RD-MUSIC algorithm. In the second step, the mutual coupling coefficients are estimated via eigen-decomposition with modulus constraint. Simulation results show the effectiveness of the proposed method for joint signal parameters and mutual coupling coefficients estimation.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127734047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of a multi-modal personal authentication interface","authors":"Sung-Phil Kim, Jae-Hwan Kang, Y. Jo, Ian Oakley","doi":"10.1109/APSIPA.2017.8282125","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282125","url":null,"abstract":"Recent advances have brought biometric user interfaces such as fingerprint and iris to the users' daily lives. More advanced biometric techniques are on the verge of development and commercialization, with increasing levels of security. This paper presents recent work on the development of a multi-factor personal authentication system. The proposed system is based on unique cognitive responses of a user to predetermined stimuli. Biometric signals such as brain activity are used to measure cognitive responses. The approach to implement such a system and test authentication results are presented. Discussion includes the feasibility of the system as well as potential scenarios of using multi-factor authentication interfaces.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127951713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online sound structure analysis based on generative model of acoustic feature sequences","authors":"Keisuke Imoto, Nobutaka Ono, M. Niitsuma, Y. Yamashita","doi":"10.1109/APSIPA.2017.8282236","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282236","url":null,"abstract":"We propose a method for the online sound structure analysis based on a Bayesian generative model of acoustic feature sequences, with which the hierarchical generative process of the sound clip, acoustic topic, acoustic word, and acoustic feature is assumed. In this model, it is assumed that sound clips are organized based on the combination of latent acoustic topics, and each acoustic topic is represented by a Gaussian mixture model (GMM) over an acoustic feature space, where the components of the GMM correspond to acoustic words. Since the conventional batch algorithm for learning this model requires a huge amount of calculation, it is difficult to analyze the massive amount of sound data. Moreover, the batch algorithm does not allow us to analyze the sequentially obtained data. Our variational Bayes-based online algorithm for this generative model can analyze the structure of sounds sound clip by sound clip. The experimental results show that the proposed online algorithm can reduce the calculation cost by about 90% and estimate the posterior distributions as efficiently as the conventional batch algorithm.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132423297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding multiple-input multiple-output active noise control from a perspective of sampling and reconstruction","authors":"Chuang Shi, Huiyong Li, Dongyuan Shi, Bhan Lam, W. Gan","doi":"10.1109/APSIPA.2017.8282013","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282013","url":null,"abstract":"This paper formulates the multiple-input multiple-output active noise control as a spatial sampling and reconstruction problem. With the proposed formulation, the inputs from the reference microphones and the outputs of the antinoise sources are regarded as spatial samples. We show that the proposed formulation is general and can unify the existing control strategies. Three control strategies, for instance, are derived from the proposed formulation and linked to different cost functions in the practical implementation. Finally, simulation results are presented to verify the effectiveness of our analysis.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114895755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}