{"title":"Enhancing Boundary for Video Object Segmentation","authors":"Qi Zhang, Xiaoqiang Lu, Yuan Yuan","doi":"10.1145/3271553.3271581","DOIUrl":"https://doi.org/10.1145/3271553.3271581","url":null,"abstract":"Video object segmentation aims to separate objects from background in successive video sequence accurately. It is a challenging task as the huge variance in object regions and similarity between object and background. Among previous methods, inner region of an object can be easily separated from background while the region around object boundary is often classified improperly. To address this problem, a novel video object segmentation method is proposed to enhance the object boundary by integrating video supervoxel into Convolutional Neural Network (CNN) model. Supervoxel is exploited in our method for its ability of preserving spatial details. The proposed method can be divided into four steps: 1) convolutional feature of video is extracted with CNN model; 2) supervoxel feature is constructed through averaging the convolutional features within each supervoxel to preserve spatial details of video; 3) the supervoxel feature and original convolutional feature are fused to construct video representation; 4) a softmax classifier is trained based on video representation to classify each pixel in video. The proposed method is evaluated both on DAVIS and Youtube-Objects datasets. Experimental results show that by considering supervoxel with spatial details, the proposed method can achieve impressive performance for video object segmentation through enhancing object boundary.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125980799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speech Sound Classification and Estimation of Optimal Order of LPC Using Neural Network","authors":"M. A. Sankar, M. Aiswariya, Dominic Anna Rose, B. Anushree, D. Shree, P. Lakshmipriya, P. S. Sathidevi","doi":"10.1145/3271553.3271611","DOIUrl":"https://doi.org/10.1145/3271553.3271611","url":null,"abstract":"Speech codec which is an integral part of most of the communication standards consists of a Voice activity detector (VAD) module followed by an encoder that uses Linear Predictive Coding (LPC). These two modules have a lot of potential for improvements that can yield low bit-rates without compromising quality. VAD is used for detecting voice activity in the input signal, which is an important step in achieving high efficiency speech coding. LPC analysis of input speech at an optimal order can assure maximum SNR and thereby perceptual quality while reducing the transmission bit-rate. This paper proposes a novel method to classify speech into Voiced/ Unvoiced/ Silence/ Music/ Background noise (V/UV/S/M/BN) frames and to find optimal order of LPC for each frame using neural network. The speech sound classifier module gives classification of frames into five categories with very high accuracy. Choosing the order predicted by neural network as the optimal LPC order for voiced frames while keeping a low order for unvoiced frames maintains the reconstruction quality and brings down the bit-rate.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122376374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Route Reconstruction Method with Spare AP for Wireless Mesh Networks in Disaster Situation","authors":"E. Dorj, K. Kinoshita","doi":"10.1145/3271553.3271559","DOIUrl":"https://doi.org/10.1145/3271553.3271559","url":null,"abstract":"Computer networks are kind of essential infrastructure in modern society and should work even in a disaster situation so that fault-tolerant networks are actively being studied. Basically, disaster information systems, however, are blamed for two main issues such as the lack of their utilization in peacetime and the difficulty for a non-expert to manage them in case of disaster situation. Therefore, we give a special emphasis to development of a roadside edge server for both normal-time and disaster-time through Wi-Fi based wireless mesh network. In large-scale disaster situation, our goal is to figure out a way to reconstruct the mesh network by adding the minimum number of spare APs which satisfies the reachability for all the roadside edge servers to the backbone network. Furthermore, we consider that the only public workers without any experience on wireless communication technologies must decide adequate locations for spare APs and install them. The simulation results prove the effectiveness of the proposed method.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128859722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hyperconnectivity by Simultaneous EEG Recordings during Turn-taking","authors":"Tianyu Yang, Yishu Yang, Changle Zhou","doi":"10.1145/3271553.3271589","DOIUrl":"https://doi.org/10.1145/3271553.3271589","url":null,"abstract":"Turn-taking is a common scene in our daily life, however, the neural mechanism behind it is not fully understood yet. Researchers have proposed several theories to explain this phenomenon, and one of these theories is the oscillator model. In this model, the brains of the speaker and the listener are described as two \"oscillators\" and become mutually entrained during turn-taking. EEG hyperscanning is a method for studying two or more individuals simultaneously with the objective of elucidating how co-variations in their neural activity are influenced by their behavioral and social interactions. Turn-taking, as a frequent social interaction, could be investigated with EEG hyperscanning technique. In this paper, we designed an experiment allowing us to simultaneously record the EEG signals of the subjects during turn-taking in conversations, and depicted the method to measure the \"hyperconnectivity\" (functional connectivity between the two brains) by means of Partial Directed Coherence. Our study showed that: (1) there are significant hyperconnectivity links between the speaker and the listener; (2) The hyperconnectivity links mostly direct from the speaker to the listener; (3) Hyperconnectivity links in Beta band are much denser than those in Alpha band; (4) The T8 electrode plays a key role in the hyperconnectivity network.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128038489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Utility Tool for Personalised Medicine","authors":"Chetana Gavankar, Aditya Phatak, N. Thakkar, Vaidehi Patel, Bhoomi Pragda, Rutuja Lathkar","doi":"10.1145/3271553.3271562","DOIUrl":"https://doi.org/10.1145/3271553.3271562","url":null,"abstract":"Biomedical research is drowning in data, yet starving for knowledge. As the volume of scientific literature is growing unprecedentedly, revolutionary measures are needed for data management. Accessibility, analysis and mining knowledge from this textual data has become a very important task. One such source is NCBI that houses a series of databases (PubMed) relevant to biotechnology and bio-medicine. It is an important resource for bioinformatics tools and services. In this paper, a system is proposed that encases all the biomedical articles of PubMed as needed by bioinformaticians. Using machine learning and natural language processing, the tool aims at assisting clinicians and biomedical researchers to understand and graphically represent the relevance of gene in a given disease context. It will also support entity-specific bio-curation searches to get a list of most effective drugs for a particular disease. The system is evaluated by using standard information retrieval measures namely, Precision, Recall and F-score to measure the relevance of search results.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131195433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wearable Technologies for Enhanced Soldier Situational Awareness","authors":"C. Korpela, A. Walker","doi":"10.1145/3271553.3271620","DOIUrl":"https://doi.org/10.1145/3271553.3271620","url":null,"abstract":"We present a design and functional prototype of a wearable technology for command and control of a remotely-operated ground vehicle used for intelligence, surveillance, and reconnaissance missions. A novel interface using hand motions, gestures, and a hands-free display allows the operator to control the robot using standard military hand and arm signals. We leverage existing lightweight wearable sensing and feedback mechanisms to allow soldiers the ability to maintain situational awareness while providing instructions to their robotic squad members. This paper presents recent test results of the system and its sensors using the proposed feedback and control mechanisms.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131416533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perceptually Lossless Image Compression with Error Recovery","authors":"C. Kwan, Eric Shang, T. Tran","doi":"10.1145/3271553.3271602","DOIUrl":"https://doi.org/10.1145/3271553.3271602","url":null,"abstract":"In many bandwidth constrained applications, lossless compression may be unnecessary, as only two to three times of compression can be achieved. An alternative way to save bandwidth is to adopt perceptually lossless compression, which can attain eight times or more compression without loss of important information. In this research, our first objective is to compare and select the best compression algorithm in the literature to achieve 8:1 compression ratio with perceptually lossless compression for still images. Our second objective is to demonstrate error concealment algorithms that can handle corrupted pixels due to transmission errors in communication channels. We have clearly achieved the above objectives using realistic images.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131902055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Analysis of Emotion Detection from Facial Expressions and Voice Using Local Binary Patterns and Markov Models: Computer Vision and Facial Recognition","authors":"Kennedy Chengeta","doi":"10.1145/3271553.3271574","DOIUrl":"https://doi.org/10.1145/3271553.3271574","url":null,"abstract":"Emotion detection has been achieved widely in facial and voice recognition separately with considerable success. The 6 emotional categories coming out of the classification include anger, fear, disgust, happiness and surprise. These can be infered from one's facial expressions both in the form of micro and macro expressions. In facial expressions the emotions are derived by feature extracting the facial expressions in different facial poses and classifying the expression feature vectors derived. Similarly automatic classification of a person's speech's affective state has also been used in signal processing to give insights into the nature of emotions. Speech being a critical tool for communication has been used to derive the emotional state of a human being. Different approaches have been successfully used to derive emotional states either in the form of facial expression recognition or speech emotional recognition being used. Less work has looked at fusing the two approaches to see if this improves emotional recognition accuracy. The study analyses the strengths of both and also limitations of either. The study reveals that emotional derivation based on facial expression recognition and acoustic information complement each other and a fusion of the two leads to better performance and results compared to the audio or acoustic recognition alone.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114484417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Watermark Extraction under Print-Cam Process Using Wave Atoms Based Blind Digital Watermarking","authors":"Fawad Ahmad, Lee-Ming Cheng","doi":"10.1145/3271553.3271619","DOIUrl":"https://doi.org/10.1145/3271553.3271619","url":null,"abstract":"Digital image watermarking is a data hiding technology, mainly utilized to protect intellectual ownership rights, can be utilized in exciting mobile applications for instantaneous watermark extraction. In this paper, we report the feasibility of wave atom transform (WAT) based blind digital watermarking for watermark detection under print-cam process. We investigate the robustness of WAT domain watermarking, based on sub-blocks mean energies comparison strategy, to read a watermark embedded in a printed image using a mobile phone camera. Watermark robust to printcam process can allow instant copyright verification using a portable device and offer other interesting applications like accessing online resources, linking to personal homepage or instigating a service. We conduct experimental analysis to evaluate WAT based digital watermarking scheme's performance to extract the embedded watermark from a printed domain image using a mobile phone camera. The experimental results show decent resistance of the scheme against print-cam process distortions.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122427192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Redundant Dictionary Construction via Genetic Algorithm","authors":"Haipeng Li, C. Zheng, Jucheng Zhang","doi":"10.1145/3271553.3271604","DOIUrl":"https://doi.org/10.1145/3271553.3271604","url":null,"abstract":"Sparse representation of signals based on redundant dictionary is widely used in array signal processing. In this paper, a redundant dictionary construction method via genetic algorithm (GA) is proposed for array signal processing. The problem is formulated as a dictionary selection problem where the dictionary entries are produced by discretizing the angle space. We apply the orthogonality of the entries to evaluate the dictionary according to the Restricted Isometry Property (RIP). GA is used to discretize the angle space which can make the dictionary more orthogonal. Simulation results show that the proposed method can obtain a better division of angle, improving the orthogonality of dictionary effectively, and is suitable for arbitrary observation space compared with commonly used equal angle division and equal sine division.","PeriodicalId":414782,"journal":{"name":"Proceedings of the 2nd International Conference on Vision, Image and Signal Processing","volume":"51 17","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131609652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}