{"title":"Selection of relevant information to improve Image Classification using Bag of Visual Words","authors":"Eduardo Fidalgo Fernández","doi":"10.5565/REV/ELCVIA.1102","DOIUrl":"https://doi.org/10.5565/REV/ELCVIA.1102","url":null,"abstract":"One of the main challenges in computer vision is image classification. Nowadays the number of images increases exponentially every day; therefore, it is important to classify them in a reliable way. The conventional image classification pipeline usually consists of extracting local image features, encoding them as a feature vector and classifying them using a previously created model. With regard to feature codification, the Bag of Words model and its extensions, such as pyramid matching and weighted schemes, have achieved quite good results and have become state-of-the-art methods. The process described above is not perfect, and computers, as well as humans, may make mistakes in any of the steps, causing a performance drop in classification. Some of the primary sources of error in large-scale image classification are the presence of multiple objects in the image, small or very thin objects, incorrect annotations, and fine-grained recognition tasks, among others. Based on those problems and the steps of a typical image classification pipeline, the motivation of this PhD thesis was to provide guidelines to improve the quality of the extracted features and thus obtain better classification results. 
The contributions of the PhD thesis demonstrated how good feature selection can contribute to improving fine-grained classification, and that a large training data set is not even needed to learn the key features of each class and to predict with good results.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"12 1","pages":"5-8"},"PeriodicalIF":0.0,"publicationDate":"2018-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85277636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
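The Bag of Visual Words pipeline the abstract describes (extract local descriptors, quantize them against a learned vocabulary, encode each image as a word histogram) can be sketched in a few lines. This is a minimal illustrative toy, not the thesis's implementation; the vocabulary is built with plain Lloyd k-means and descriptors are simple tuples.

```python
import random

def dist2(a, b):
    # squared Euclidean distance between two equal-length vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(pts):
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))

def kmeans(points, k, iters=20, seed=0):
    # plain Lloyd iterations; enough to sketch visual-vocabulary building
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist2(centers[i], p))].append(p)
        centers = [centroid(cl) if cl else centers[i] for i, cl in enumerate(clusters)]
    return centers

def bovw_histogram(descriptors, vocab):
    # assign each local descriptor to its nearest visual word, then L1-normalize
    counts = [0] * len(vocab)
    for d in descriptors:
        counts[min(range(len(vocab)), key=lambda i: dist2(vocab[i], d))] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]
```

The resulting histogram is what a classifier (e.g. an SVM) would consume in the conventional pipeline.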
{"title":"Uncertainty Theories Based Iris Recognition System","authors":"Bellaaj Majd","doi":"10.5565/REV/ELCVIA.1131","DOIUrl":"https://doi.org/10.5565/REV/ELCVIA.1131","url":null,"abstract":"The performance and robustness of iris-based recognition systems still suffer from imperfections in the biometric information. This paper attempts to address these imperfections and deals with an important problem for real systems. We propose a new method for iris recognition based on uncertainty theories to treat imperfect iris features. Several factors cause different types of degradation in iris data, such as the poor quality of the acquired pictures; the partial occlusion of the iris region due to light spots, lenses, eyeglasses, hair or eyelids; and adverse illumination and/or contrast. All of these factors are open problems in the field of iris recognition; they affect the performance of iris segmentation, feature extraction or the decision-making process, and appear as imperfections in the extracted iris features. The aim of our experiments is to model the variability and ambiguity in iris data with uncertainty theories. This paper illustrates the importance of these theories for modeling and/or treating the encountered imperfections. Several comparative experiments are conducted on two subsets of the CASIA-V4 iris image database, namely Interval and Synthetic. 
Compared to a typical iris recognition system that does not rely on uncertainty theories, experimental results show that our proposed model improves iris recognition in terms of Equal Error Rate (EER), Area Under the receiver operating characteristic Curve (AUC) and Accuracy Recognition Rate (ARR) statistics.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"69 1","pages":"29-32"},"PeriodicalIF":0.0,"publicationDate":"2018-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75998594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
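The EER statistic used to evaluate the system above is the operating point where the false acceptance rate (FAR) and false rejection rate (FRR) coincide. A generic threshold-sweep computation (not the paper's code; score lists and the "higher score = genuine" convention are assumptions) looks like this:

```python
def eer(genuine_scores, impostor_scores):
    # Sweep thresholds over the observed scores; higher score = more likely genuine.
    # The EER is taken at the threshold where FAR and FRR are closest.
    best_gap, best_eer = None, None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

Perfectly separated score distributions give an EER of 0; overlapping ones give the error rate at the crossover.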
{"title":"Feature extraction algorithms from MRI to evaluate quality parameters on meat products by using data mining","authors":"Daniel Caballero Jorna","doi":"10.5565/REV/ELCVIA.1100","DOIUrl":"https://doi.org/10.5565/REV/ELCVIA.1100","url":null,"abstract":"This thesis proposes a new methodology to determine the quality characteristics of meat products (Iberian loin and ham) in a non-destructive way. For that, new algorithms have been developed to analyze Magnetic Resonance Imaging (MRI), and data mining techniques have been applied to the data obtained from the images. The general procedure consists of obtaining MRI of meat products and applying different computer vision algorithms (mainly texture and fractal approaches), which allow the extraction of sets of computational features. Figure 1 shows the design of the proposed procedure. To achieve this, different research lines have been pursued, based on: high-field and low-field MRI scanners; different acquisition sequences: Spin Echo (SE), Gradient Echo (GE) and Turbo 3D (T3D); different texture approaches: Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM) and Neighboring Gray Level Dependence Matrix (NGLDM); and fractal algorithms: Classical Fractal Algorithm (CFA), Fractal Texture Algorithm (FTA) and One Point Fractal Texture Algorithm (OPFTA). FTA [1] and OPFTA [2] have been developed in this thesis. They allow proper analysis of MRI images, with OPFTA standing out for its simplicity and lower computational cost. At the same time, the meat products, Iberian hams and loins, were also analyzed by means of physico-chemical and sensory techniques. Databases were constructed with all these data, and different data mining techniques were applied to them: deductive (Multiple Linear Regression (MLR)) [3], classification (Decision Trees (DT) and Rules-based Systems (RBS)) [4], and prediction techniques [5-7]. 
Figure 2 shows the MRI images of fresh and dry-cured Iberian loins (Figures 2A and 2B) and fresh and dry-cured hams (Figures 2C and 2D). The accuracy of the analysis of the quality parameters of Iberian ham and loin is affected by the MRI acquisition sequence, the algorithm used to analyze the images and the data mining technique applied. Among the data mining techniques, MLR and DT are appropriate, respectively, to deduce physico-chemical parameters of hams and to classify hams as a function of salt content. Regarding the predictive techniques, MLR allows obtaining equations to determine the physico-chemical characteristics and sensory attributes of Iberian loins and hams with a high degree of reliability, and thus analyzing the quality of these meat products in a non-destructive, efficient, effective and accurate way.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"64 1","pages":"1-4"},"PeriodicalIF":0.0,"publicationDate":"2018-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73723872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
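Of the texture approaches listed in the abstract, the GLCM is the most standard: it tallies how often pairs of gray levels co-occur at a fixed pixel offset, and scalar features such as contrast and energy are derived from the normalized matrix. A minimal sketch (toy images as nested lists; the (dx, dy)=(1, 0) offset and the two features are illustrative choices, not the thesis's exact configuration):

```python
def glcm(img, levels, dx=1, dy=0):
    # normalized gray-level co-occurrence matrix for the (dx, dy) pixel offset
    p = [[0.0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h - dy):
        for x in range(w - dx):
            p[img[y][x]][img[y + dy][x + dx]] += 1
    total = sum(map(sum, p)) or 1.0
    return [[v / total for v in row] for row in p]

def glcm_contrast(p):
    # weights co-occurrences by the squared gray-level difference
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))

def glcm_energy(p):
    # sum of squared matrix entries; 1.0 for a perfectly uniform region
    return sum(v * v for row in p for v in row)
```

A flat region gives zero contrast and maximal energy; a checkerboard gives high contrast.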
{"title":"Facial attributes recognition using computer vision to detect drowsiness and distraction in drivers","authors":"Alberto Fernández Villán","doi":"10.5565/rev/elcvia.1134","DOIUrl":"https://doi.org/10.5565/rev/elcvia.1134","url":null,"abstract":"Driving is an activity that requires a high degree of concentration on the part of the person who performs it, since the slightest negligence is sufficient to provoke an accident with the consequent material and/or human losses. According to the most recent study published by the World Health Organization (WHO) in 2013, it was estimated that 1.25 million people died as a result of traffic accidents, whereas between 20 and 50 million suffered non-fatal injuries, many of them leading to chronic conditions. Many of these accidents are caused by what is known as inattention. This term encompasses different conditions such as distraction and drowsiness, which are precisely the ones that cause the most fatalities. Many publications and research efforts have tried to set figures indicating the consequences of inattention (and its subtypes), but there is no exact number of the accidents caused by inattention, since these studies have been carried out in different places, over different time frames and, therefore, under different conditions. Overall, it has been estimated that inattention causes between 25% and 75% of accidents and near-accidents. A study on drowsiness while driving in ten European countries found that fatigue increases reaction time by 86% and is the fourth leading cause of death on Spanish roads. Distraction is also a major contributor to fatal accidents in Spain. According to the Directorate General of Traffic (DGT), distraction is the first violation found in fatal accidents, appearing in 13.15% of cases. 
Overall, considering both distraction and drowsiness, the latest statistics on inattentive driving among Spanish drivers are alarming: inattention appears as the leading cause of fatalities (36%), well above excessive speed (21%) or alcohol consumption (11%). This PhD thesis is motivated by the direct consequences of the abovementioned figures, and its purpose is to provide mechanisms that help reduce the effects of driver inattention using computer vision techniques. The extraction of facial attributes can be used to detect inattention robustly. Specifically, the research establishes a frame of reference to characterize distraction in drivers in order to provide solid foundations for future research [1]. Based on this research [1], an architecture based on the analysis of visual characteristics has been proposed, constructed and validated using computer vision and machine learning techniques for the detection of both distraction and drowsiness [2], integrating several innovative elements in order to operate in a completely autonomous way for the robust detection of the main visual indicators characterizing both the driver’s distraction and drowsiness: (1) a review of the role of computer vision technology applied to the development of monitoring systems to detect distraction [3]; (2) a face processing algorithm based on Local Binary Patterns (LBP) and Support Vector Machine (SVM) to detect facial attributes [4]; (3) a detection unit for the presence/absence of the driver using","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"76 1","pages":"25-28"},"PeriodicalIF":0.0,"publicationDate":"2018-03-05","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83841111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
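The LBP operator mentioned in item (2) above is a standard texture descriptor: each pixel's 8 neighbors are thresholded against the center value and packed into one byte, and histograms of these codes feed the SVM. A minimal per-pixel sketch (the clockwise bit-ordering is a common convention, assumed here, and not necessarily the thesis's exact choice):

```python
def lbp_code(patch):
    # patch: 3x3 grid of intensities; the 8 neighbors are thresholded
    # against the CENTER pixel and packed clockwise into one byte
    c = patch[1][1]
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i, n in enumerate(neighbors):
        if n >= c:          # the >= convention makes flat regions give code 255
            code |= 1 << i
    return code
```

In a full system the codes are histogrammed per face region and the concatenated histograms become the SVM feature vector.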
{"title":"Video Processing for Remote Respiration Monitoring","authors":"D. Alinovi","doi":"10.5565/rev/elcvia.1124","DOIUrl":"https://doi.org/10.5565/rev/elcvia.1124","url":null,"abstract":"Monitoring of vital signs is a key tool in medical diagnostics. Among fundamental vital parameters, the Respiratory Rate (RR) plays an important role as an indicator of possible pathological events. For this reason, respiration needs to be carefully monitored in order to detect signs of possible changes in health conditions. In this work, novel techniques for the visualization and analysis of respiration by remote and non-invasive video monitoring, based on the study of breathing-related movements, are proposed. The lack of large video databases associated with clinical data, essential for performance evaluation and optimization of video processing-based algorithms, is also addressed; statistical models of respiration and apnea events are proposed, together with proper simulators, useful to test the remote monitoring algorithms.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"52 1","pages":"9-12"},"PeriodicalIF":0.0,"publicationDate":"2018-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81611809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
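Once a breathing-related motion signal has been extracted from video, a common way to estimate the RR is to find the dominant frequency within a plausible breathing band. This sketch uses a brute-force DFT over that band (the band limits, sampling rate, and the spectral-peak approach are illustrative assumptions, not the work's actual algorithm):

```python
import math

def respiratory_rate(signal, fs, fmin=0.1, fmax=1.0):
    # brute-force DFT over the plausible breathing band; returns breaths per minute
    n = len(signal)
    mu = sum(signal) / n
    x = [s - mu for s in signal]          # remove DC before the spectral search
    k_lo = max(1, int(fmin * n / fs))
    k_hi = min(n // 2, int(fmax * n / fs))
    best_k, best_mag = k_lo, -1.0
    for k in range(k_lo, k_hi + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = re * re + im * im
        if mag > best_mag:
            best_mag, best_k = mag, k
    return best_k * fs / n * 60.0
```

A 0.25 Hz sinusoid sampled at 10 Hz is correctly reported as 15 breaths/min.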
{"title":"Detail Enhanced Multi-Exposure Image Fusion Based On Edge Preserving Filters","authors":"Harbinder Singh","doi":"10.5565/REV/ELCVIA.1126","DOIUrl":"https://doi.org/10.5565/REV/ELCVIA.1126","url":null,"abstract":"Recent computational photography techniques play a significant role in overcoming the limitations of standard digital cameras in handling the wide dynamic range of real-world scenes, which contain brightly and poorly illuminated areas. In many such techniques [1,2,3], it is often desirable to fuse details from images captured at different exposure settings while avoiding visual artifacts. One such technique is High Dynamic Range (HDR) imaging, which provides a solution to recover radiance maps from photographs taken with conventional imaging equipment. The process of HDR image composition needs knowledge of the exposure times and the Camera Response Function (CRF), which is required to linearize the image data before combining Low Dynamic Range (LDR) exposures into an HDR image. One of the long-standing challenges in HDR imaging technology is the limited Dynamic Range (DR) of conventional display devices and printing technology, due to which these devices are unable to reproduce the full DR. Although DR can be reduced by tone-mapping, this comes at an unavoidable trade-off with increased computational cost. Therefore, it is desirable to maximize the information content of the scene synthesized from a set of multi-exposure images without computing an HDR radiance map and tone-mapping. This research attempts to develop a novel detail-enhanced multi-exposure image fusion approach based on texture features, which exploits the edge-preserving and intra-region smoothing properties of nonlinear diffusion filters based on Partial Differential Equations (PDE). With the captured multi-exposure image series, we first decompose images into Base Layers (BLs) and Detail Layers (DLs) to extract coarse structure and fine details, respectively. 
The magnitude of the gradient of the image intensity is utilized to encourage smoothness in homogeneous regions in preference to inhomogeneous regions. In the next step, texture features of the BLs are used to generate a decision mask (i.e., local range) that guides the fusion of BLs in a multi-resolution fashion. Finally, a well-exposed fused image is obtained by combining the fused BL and the DLs at each scale across all the input exposures. The combination of edge-preserving filters with the Laplacian pyramid is shown to lead to texture detail enhancement in the fused image. Furthermore, a non-linear adaptive filter, which has a better response near strong edges, is employed for BL and DL decomposition. The texture details are then added to the fused BL to reconstruct a detail-enhanced LDR version of the image. This leads to increased robustness of the texture details while at the same time avoiding the gradient reversal artifacts near strong edges that may appear in the fused image after DL enhancement. Finally, we propose a novel technique for exposure fusion in which a Weighted Least Squares (WLS) optimization framework is utilized for weight map refinement of BLs and DLs, which leads to a new, simple weighted average fusion framework. Computationally simple t","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"86 1","pages":"13-16"},"PeriodicalIF":0.0,"publicationDate":"2018-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83445023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
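The base/detail decomposition and per-layer weighted blending described above can be illustrated on a 1-D toy signal. This is only a sketch of the general scheme: a box blur stands in for the edge-preserving (nonlinear diffusion) filter, and a Gaussian well-exposedness weight around mid-gray stands in for the texture-based decision mask.

```python
import math

def box_blur(sig, r=1):
    # crude stand-in for an edge-preserving smoother, just to obtain a base layer
    return [sum(sig[max(0, i - r):i + r + 1]) / len(sig[max(0, i - r):i + r + 1])
            for i in range(len(sig))]

def fuse_exposures(exposures):
    # decompose each exposure into base + detail, weight samples by
    # well-exposedness (closeness to mid-gray 0.5), then blend layers separately
    bases = [box_blur(e) for e in exposures]
    details = [[e[i] - b[i] for i in range(len(e))] for e, b in zip(exposures, bases)]
    fused = []
    for i in range(len(exposures[0])):
        w = [math.exp(-((e[i] - 0.5) ** 2) / (2 * 0.2 ** 2)) for e in exposures]
        tot = sum(w)
        b = sum(wi * bs[i] for wi, bs in zip(w, bases)) / tot
        d = sum(wi * ds[i] for wi, ds in zip(w, details)) / tot
        fused.append(b + d)
    return fused
```

Identical exposures fuse back to themselves; when exposures differ, the sample closest to mid-gray dominates.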
{"title":"MMKK++ algorithm for clustering heterogeneous images into an unknown number of clusters","authors":"Dávid Papp, G. Szűcs","doi":"10.5565/REV/ELCVIA.1054","DOIUrl":"https://doi.org/10.5565/REV/ELCVIA.1054","url":null,"abstract":"In this paper we present an automatic clustering procedure whose main aim is to predict the number of clusters of unknown, heterogeneous images. We used the Fisher vector for the mathematical representation of the images, and these vectors were considered as input data points for the clustering algorithm. We implemented a novel variant of K-means, the kernel K-means++, and furthermore the min-max kernel K-means++ (MMKK++), as the clustering method. The proposed approach examines some candidate cluster numbers and determines the strength of the clustering to estimate how well the data fit into K clusters; the law of large numbers was used in order to choose the optimal cluster size. We conducted experiments on four image sets to demonstrate the efficiency of our solution. The first two image sets are subsets of different popular collections; the third is their union; the fourth is the complete Caltech101 image set. The results showed that our approach was able to give a better estimation of the number of clusters than the competitor methods. 
Furthermore, we defined two new metrics for evaluating the prediction of the appropriate cluster number, which measure goodness in a more sophisticated way than binary evaluation.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"9 1","pages":"30-45"},"PeriodicalIF":0.0,"publicationDate":"2018-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87912577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
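The general idea of scanning candidate cluster numbers and measuring clustering strength can be sketched with ordinary (non-kernel) k-means and a simple elbow criterion: accept the smallest K after which adding a cluster no longer reduces the within-cluster error sharply. This is a generic stand-in, not the MMKK++ algorithm or the paper's actual strength measure.

```python
import random

def kmeans_sse(points, k, iters=25, seed=0):
    # 1-D Lloyd's algorithm; returns the within-cluster sum of squared errors
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: (p - centers[i]) ** 2)].append(p)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return sum(min((p - c) ** 2 for c in centers) for p in points)

def estimate_num_clusters(points, k_max=6, drop=0.5):
    # elbow heuristic: stop at the smallest k whose successor improves SSE
    # by less than a factor `drop`
    sse = [kmeans_sse(points, k) for k in range(1, k_max + 1)]
    for k in range(1, k_max):
        if sse[k] > drop * sse[k - 1]:
            return k
    return k_max
```

On data with two well-separated groups the heuristic recovers K = 2.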
{"title":"Semantic Video Concept Detection using Novel Mixed-Hybrid-Fusion Approach for Multi-Label Data","authors":"N. Janwe, K. Bhoyar","doi":"10.5565/REV/ELCVIA.927","DOIUrl":"https://doi.org/10.5565/REV/ELCVIA.927","url":null,"abstract":"The performance of a semantic concept detection method depends on the selection of the low-level visual features used to represent key-frames of a shot and on the selection of the feature-fusion method used. This paper proposes a set of low-level visual features of considerably smaller size and also proposes novel ‘hybrid-fusion’ and ‘mixed-hybrid-fusion’ approaches, which are formulated by combining the early- and late-fusion strategies proposed in the literature. In the initially proposed hybrid-fusion approach, the features from the same feature group are combined using early fusion before classifier training, and the concept probability scores from multiple classifiers are merged using late fusion to get the final detection scores. A feature group is defined as the features from the same feature family, such as color moments. The hybrid-fusion approach is then refined into the proposed “mixed-hybrid-fusion” approach to further improve the detection rate. This paper presents a novel video concept detection system for multi-label data using the proposed mixed-hybrid-fusion approach. Support Vector Machines (SVM) are used to build classifiers that produce concept probabilities for a test frame. The proposed approaches are evaluated on the multi-label TRECVID2007 development dataset. 
Experimental results show that the proposed mixed-hybrid-fusion approach performs better than the hybrid-fusion approach and outperforms all conventional early-fusion and late-fusion approaches by large margins with respect to feature set dimensionality and Mean Average Precision (MAP) values.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"10 1","pages":"14-29"},"PeriodicalIF":0.0,"publicationDate":"2017-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88557048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
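The hybrid-fusion scheme described in the abstract, early fusion within a feature group and late fusion across groups, can be outlined with a few helper functions. The classifier is a placeholder callable (here a stand-in that any real SVM would replace), and the unweighted score average is an assumption; the paper may weight classifiers differently.

```python
def early_fusion(feature_vectors):
    # concatenate feature vectors into a single representation
    return [v for vec in feature_vectors for v in vec]

def late_fusion(scores):
    # average the per-classifier concept probability scores
    return sum(scores) / len(scores)

def hybrid_fusion(feature_groups, classifiers):
    # early-fuse features WITHIN each group, score one classifier per group,
    # then late-fuse the resulting concept probabilities
    per_group = [clf(early_fusion(group))
                 for group, clf in zip(feature_groups, classifiers)]
    return late_fusion(per_group)
```

With a trivial mean-of-features "classifier", two feature groups produce one fused concept score.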
{"title":"Random Image Matching CAPTCHA System","authors":"H. Hajjdiab","doi":"10.5565/REV/ELCVIA.1036","DOIUrl":"https://doi.org/10.5565/REV/ELCVIA.1036","url":null,"abstract":"Security is an important issue that has caught the attention of researchers in the areas of networks, web development, human-computer interaction and software engineering. One main challenge for online systems is to identify whether the users are humans or software robots (bots). While it is natural to provide service to human users, providing service to software robots (bots) comes with many security risks and challenges. Software robots are often used by spammers to create fake online accounts, affect search engine rankings, take part in online polls, send out spam or simply waste the resources of the server. In this paper we introduce a visual CAPTCHA technique that is based on images randomly generated by the computer; the user is then asked to match a feature point between two images (i.e., solve the correspondence problem as defined by researchers in the computer vision area). The relationship between the two images is given by a randomly generated homography transformation. The main advantage of our approach compared to other visual CAPTCHA techniques is that we eliminate the need for a database of images while retaining ease of use.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"18 1","pages":"1-13"},"PeriodicalIF":0.0,"publicationDate":"2017-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86006693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
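The homography relation at the heart of this CAPTCHA maps a point in the first image to its correspondence in the second via a 3x3 projective transform in homogeneous coordinates. A minimal sketch (the pixel tolerance and the verification function are illustrative assumptions; only the point-mapping itself is the standard formula):

```python
import math

def apply_homography(H, pt):
    # map a point through a 3x3 projective transform (homogeneous coordinates)
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def response_is_valid(H, source_pt, clicked_pt, tol=3.0):
    # accept the CAPTCHA answer if the click lands near the mapped feature point
    ex, ey = apply_homography(H, source_pt)
    return math.hypot(ex - clicked_pt[0], ey - clicked_pt[1]) <= tol
```

A pure-translation homography shifts points as expected, and clicks within the tolerance radius validate.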
{"title":"Recognition of Facial Expressions using Local Mean Binary Pattern","authors":"Mahesh M. Goyani, N. Patel","doi":"10.5565/REV/ELCVIA.1058","DOIUrl":"https://doi.org/10.5565/REV/ELCVIA.1058","url":null,"abstract":"In this paper, we propose a novel appearance-based local feature extraction technique called Local Mean Binary Pattern (LMBP), which efficiently encodes the local texture and global shape of the face. The LMBP code is produced by weighting the thresholded neighbor intensity values with respect to the mean of the 3 x 3 patch. LMBP produces a highly discriminative code compared to other state-of-the-art methods. Since the micro pattern is derived using the mean of the patch, it is robust against illumination and noise variations. An image is divided into M x N regions, and the feature descriptor is derived by concatenating the LMBP distribution of each region. We also propose a novel template matching strategy called Histogram Normalized Absolute Difference (HNAD) for comparing LMBP histograms. Rigorous experiments prove the effectiveness and robustness of the LMBP operator. Experiments also prove the superiority of the HNAD measure over well-known template matching methods such as the L2 norm and the Chi-Square measure. We also investigated LMBP for facial expression recognition at low resolutions. 
The performance of the proposed approach is tested on the well-known CK, JAFFE, and TFEID datasets.","PeriodicalId":38711,"journal":{"name":"Electronic Letters on Computer Vision and Image Analysis","volume":"10 1","pages":"54-67"},"PeriodicalIF":0.0,"publicationDate":"2017-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86067954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
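Per the abstract, LMBP differs from plain LBP in the threshold: neighbors are compared against the mean of the 3 x 3 patch rather than the center pixel, which is what confers the stated robustness to illumination and noise. A per-pixel sketch of that idea (the clockwise bit-ordering and the >= convention are assumptions; the paper's exact definition may differ):

```python
def lmbp_code(patch):
    # unlike plain LBP, the 8 neighbors are thresholded against the MEAN of the
    # whole 3x3 patch rather than the center pixel, then packed into one byte
    mean = sum(sum(row) for row in patch) / 9.0
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i, n in enumerate(neighbors):
        if n >= mean:
            code |= 1 << i
    return code
```

Region-wise histograms of these codes, concatenated over the M x N grid, would form the face descriptor described above.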