{"title":"Learning support vectors for face verification and recognition","authors":"K. Jonsson, J. Kittler, Yongping Li, Jiri Matas","doi":"10.1109/AFGR.2000.840636","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840636","url":null,"abstract":"The paper studies support vector machines (SVM) in the context of face verification and recognition. Our study supports the hypothesis that the SVM approach is able to extract the relevant discriminatory information from the training data and we present results showing superior performance in comparison with benchmark methods. However, when the representation space already captures and emphasises the discriminatory information (e.g., Fisher's linear discriminant), SVM loose their superiority. The results also indicate that the SVM are robust against changes in illumination provided these are adequately represented in the training data. The proposed system is evaluated on a large database of 295 people obtaining highly competitive results: an equal error rate of 1% for verification and a rank-one error rate of 2% for recognition (or 98% correct rank-one recognition).","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"355 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131691024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic handwriting gestures recognition using hidden Markov models","authors":"Jérôme Martin, Jean-Baptiste Durand","doi":"10.1109/AFGR.2000.840666","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840666","url":null,"abstract":"Hidden Markov models have been successfully employed in speech recognition and, more recently, in sign language interpretation. They seem adequate for visual recognition of gestures. In this paper, two problems often eluded are considered. We propose to use the Bayesian information criterion in order to determine the optimal number of model states. We describe the contribution of continuous models in opposition to symbolic ones. Experiments on handwriting gestures show recognition rate between 88% and 100%.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113966046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative performance of different skin chrominance models and chrominance spaces for the automatic detection of human faces in color images","authors":"J. Terrillon, H. Fukamachi, S. Akamatsu, M. N. Shirazi","doi":"10.1109/AFGR.2000.840612","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840612","url":null,"abstract":"This paper presents an analysis of the performance of two different skin chrominance models and of nine different chrominance spaces for the color segmentation and subsequent detection of human faces in two-dimensional static images. For each space, we use the single Gaussian model based on the Mahalanobis metric and a Gaussian mixture density model to segment faces from scene backgrounds. In the case of the mixture density model, the skin chrominance distribution is estimated by use of the expectation-maximisation (EM) algorithm. Feature extraction is performed on the segmented images by use of invariant Fourier-Mellin moments. A multilayer perceptron neural network (NN), with the invariant moments as the input vector, is then applied to distinguish faces from distractors. With the single Gaussian model, normalized color spaces are shown to produce the best segmentation results, and subsequently the highest rate of face detection. The results are comparable to those obtained with the more sophisticated mixture density model. However, the mixture density model improves the segmentation and face detection results significantly for most of the un-normalized color spaces. Ultimately, we show that, for each chrominance space, the detection efficiency depends on the capacity of each model to estimate the skin chrominance distribution and, most importantly, on the discriminability between skin and \"non-skin\" distributions.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"22 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122434978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gesture modeling and recognition using finite state machines","authors":"P. Hong, Thomas S. Huang, M. Turk","doi":"10.1109/AFGR.2000.840667","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840667","url":null,"abstract":"We propose a state-based approach to gesture learning and recognition. Using spatial clustering and temporal alignment, each gesture is defined to be an ordered sequence of states in spatial-temporal space. The 2D image positions of the centers of the head and both hands of the user are used as features; these are located by a color-based tracking method. From training data of a given gesture, we first learn the spatial information and then group the data into segments that are automatically aligned temporally. The temporal information is further integrated to build a finite state machine (FSM) recognizer. Each gesture has a FSM corresponding to it. The computational efficiency of the FSM recognizers allows us to achieve real-time on-line performance. We apply this technique to build an experimental system that plays a game of \"Simon Says\" with the user.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"86 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127441802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A continuous Chinese sign language recognition system","authors":"Jiyong Ma, Wen Gao, Jiangqin Wu, Chunli Wang","doi":"10.1109/AFGR.2000.840670","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840670","url":null,"abstract":"We describe a system for recognizing both the isolated and continuous Chinese sign language (CSL) using two cybergloves and two 3SAPCE-position trackers as gesture input devices. To get robust gesture features, each joint-angle collected by cybergloves is normalized. The relative position and orientation of the left hand to those of the right hand are proposed as the signer position-independent features. To speed up the recognition process, fast match and frame prediction techniques are proposed. To tackle the epenthesis movement problem, context-dependent models are obtained by the dynamic programming (DP) technique. HMM are utilized to model basic word units. Then we describe training techniques of the bigram language model and the search algorithm used in our baseline system. The baseline system converts sentence level gestures into synthesis speech and gestures of a 3D virtual human synchronously. Experiments show that these techniques are efficient both in recognition speed and recognition performance.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"211 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126096874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face detection using multi-modal information","authors":"Sang-Hoon Kim, Hyoung-Gon Kim","doi":"10.1109/AFGR.2000.840606","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840606","url":null,"abstract":"This paper proposes an object-oriented face detection method using multi-modal fusion of range, color and motion information. Objects are segmented from a complex background using a stereo disparity histogram that represents the range information of the objects. A matching pixel count (MPC) disparity measure is introduced to enhance the matching accuracy. To detect the facial regions among segmented objects, a skin-color transform technique is used with the general skin color distribution (GSCD) modeled by a 2D Gaussian function in a color synthetic normalization (CSN) color space. The motion detection technique of AWUPC (adaptive weighted unmatched pixel count) is defined on the skin-color transformed image where the adaptive threshold value for the motion detection is determined according to the probability of skin color. AWUPC transforms the input color image into a gray-level image that represents the probability of both the skin color and motion information. The experimental results show that the proposed algorithm can detect a moving human object in various environments such as skin color noise and complex background. It can be useful in MPEG-4 SNHC.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123401600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human action tracking guided by key-frames","authors":"Masanobu Yamamoto, Y. Ohta, T. Yamagiwa, Katsutoshi Yagishita, H. Yamanaka, Naoto Ohkubo","doi":"10.1109/AFGR.2000.840659","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840659","url":null,"abstract":"The model-based approaches for tracking of human bodies in image sequences can be categorised into two types; fitting model to body frame by frame, and accumulating estimated pose displacements in successive frames after model fitting at the initial frame. The latter has an inherent drawback as accumulation of tracking errors while the one has a great advantage as small computational efforts compared with the former. This paper proposes a new method which can correct the tracking errors by propagation from fitting model to body at a few key-frames. The propagation makes it possible to establish tracking of bodies under occlusion. Capturing the actor's motions in real old movies is presented.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114914029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of sensor calibration in a biometric person recognition framework based on sensor fusion","authors":"Bernhard Fröba, C. Rothe, Christian Küblbeck","doi":"10.1109/AFGR.2000.840682","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840682","url":null,"abstract":"Biometric person authentication is a secure and user-friendly way of identifying persons in a variety of everyday applications. In order to achieve high recognition rates, we propose an audio-visual person recognition system based on voice, lip motion and still image. The combination of these three data sources (called sensor fusion) may be performed in several ways. We present a method for sensor normalization based on statistical sensor properties. We call this procedure sensor calibration. The final decision fusion simplifies to a multiplication or addition of the normalized outputs of each sensor. This approach is evaluated on a large database of 170 people with a total of 6315 recordings.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132744120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple cues used in model-based human motion capture","authors":"T. Moeslund, E. Granum","doi":"10.1109/AFGR.2000.840660","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840660","url":null,"abstract":"Human motion capture has lately been the object of much attention due to commercial interests. A \"touch-free\" computer vision solution to the problem is desirable to avoid the intrusiveness of standard capture devices. The object to be monitored is known a priori which suggests the inclusion of a human model in the capture process. We use a model-based approach known as the analysis-by-synthesis approach. This approach is powerful but has a problem with its potential huge search space. Using multiple cues we reduce the search space by introducing constraints through the 3D locations of salient points and a silhouette of the subject. Both data types are relatively easy to derive and only require limited computational effort so the approach remains suitable for real-time applications. The approach is tested on 3D movements of a human arm and the results show that we successfully can estimate the pose of the arm using the reduced search space.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125502987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From few to many: generative models for recognition under variable pose and illumination","authors":"A. Georghiades, P. Belhumeur, D. Kriegman","doi":"10.1109/AFGR.2000.840647","DOIUrl":"https://doi.org/10.1109/AFGR.2000.840647","url":null,"abstract":"Image variability due to changes in pose and illumination can seriously impair object recognition. This paper presents appearance-based methods which, unlike previous appearance-based approaches, require only a small set of training images to generate a rich representation that models this variability. Specifically, from as few as three images of an object in fixed pose seen under slightly varying but unknown lighting, a surface and an albedo map are reconstructed. These are then used to generate synthetic images with large variations in pose and illumination and thus build a representation useful for object recognition. Our methods have been tested within the domain of face recognition on a subset of the Yale Face Database B containing 4050 images of 10 faces seen under variable pose and illumination. This database was specifically gathered for testing these generative methods. Their performance is shown to exceed that of popular existing methods.","PeriodicalId":360065,"journal":{"name":"Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129258369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}