{"title":"Subject Guided Eye Image Synthesis with Application to Gaze Redirection.","authors":"Harsimran Kaur, Roberto Manduchi","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We propose a method for synthesizing eye images from segmentation masks with a desired style. The style encompasses attributes such as skin color, texture, iris color, and personal identity. Our approach generates an eye image that is consistent with a given segmentation mask and has the attributes of the input style image. We apply our method to data augmentation as well as to gaze redirection. Previous techniques for synthesizing real eye images from synthetic eye images for data augmentation lacked control over the generated attributes. We demonstrate the effectiveness of the proposed method in synthesizing realistic eye images with given characteristics corresponding to the synthetic labels for data augmentation, which is further useful for tasks such as gaze estimation, eye image segmentation, and pupil detection. We also show how our approach can be applied to gaze redirection using only synthetic gaze labels, improving on previous state-of-the-art results. The main contributions of our paper are i) a novel approach for style-based eye image generation from segmentation masks; ii) the use of this approach for gaze redirection without the need for gaze-annotated real eye images.</p>","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":" ","pages":"11-20"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8040934/pdf/nihms-1648872.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25606536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hand-Priming in Object Localization for Assistive Egocentric Vision.","authors":"Kyungjun Lee, Abhinav Shrivastava, Hernisa Kacorri","doi":"10.1109/wacv45572.2020.9093353","DOIUrl":"10.1109/wacv45572.2020.9093353","url":null,"abstract":"<p><p>Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for people with visual impairments, with object recognition being one of the daily challenges for this population. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users often tend to include their hand either interacting with the object that they wish to recognize or simply placing it in proximity for better camera aiming. We propose localization models that leverage the presence of the hand as the contextual information for priming the center area of the object of interest. In our approach, hand segmentation is fed to either the entire localization network or its last convolutional layers. Using egocentric datasets from sighted and blind individuals, we show that hand-priming achieves higher precision than other approaches, such as fine-tuning, multi-class, and multi-task learning, which also encode hand-object interactions in localization.</p>","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"2020 ","pages":"3411-3421"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7423407/pdf/nihms-1609047.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38269367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Template-Based Non-Rigid Motion Tracking Using Local Coordinate Regularization.","authors":"Wei Li, Shang Zhao, Xiao Xiao, James K Hahn","doi":"10.1109/wacv45572.2020.9093533","DOIUrl":"10.1109/wacv45572.2020.9093533","url":null,"abstract":"<p><p>In this paper, we propose our template-based non-rigid registration algorithm to address the misalignments in the frame-to-frame motion tracking with single or multiple commodity depth cameras. We analyze the deformation in the local coordinates of neighboring nodes and use this differential representation to formulate the regularization term for the deformation field in our non-rigid registration. The local coordinate regularizations vary for each pair of neighboring nodes based on the tracking status of the surface regions. We propose our tracking strategies for different surface regions to minimize misalignments and reduce error accumulation. This method can thus preserve local geometric features and prevent undesirable distortions. Moreover, we introduce a geodesic-based correspondence estimation algorithm to align surfaces with large displacements. Finally, we demonstrate the effectiveness of our proposed method with detailed experiments.</p>","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"2020 ","pages":"390-399"},"PeriodicalIF":0.0,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/wacv45572.2020.9093533","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38031947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Generative Models of Tissue Organization with Supervised GANs.","authors":"Ligong Han, Robert F Murphy, Deva Ramanan","doi":"10.1109/WACV.2018.00080","DOIUrl":"https://doi.org/10.1109/WACV.2018.00080","url":null,"abstract":"<p><p>A key step in understanding the spatial organization of cells and tissues is the ability to construct generative models that accurately reflect that organization. In this paper, we focus on building generative models of electron microscope (EM) images in which the positions of cell membranes and mitochondria have been densely annotated, and propose a two-stage procedure that produces realistic images using Generative Adversarial Networks (or GANs) in a supervised way. In the first stage, we synthesize a label \"image\" given a noise \"image\" as input, which then provides supervision for EM image synthesis in the second stage. The full model naturally generates label-image pairs. We show that accurate synthetic EM images are produced using assessment via (1) shape features and global statistics, (2) segmentation accuracies, and (3) user studies. We also demonstrate further improvements by enforcing a reconstruction loss on intermediate synthetic labels and thus unifying the two stages into one single end-to-end framework.</p>","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"2018 ","pages":"682-690"},"PeriodicalIF":0.0,"publicationDate":"2018-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/WACV.2018.00080","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36458507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast, accurate, small-scale 3D scene capture using a low-cost depth sensor.","authors":"Nicole Carey, Radhika Nagpal, Justin Werfel","doi":"10.1109/WACV.2017.146","DOIUrl":"https://doi.org/10.1109/WACV.2017.146","url":null,"abstract":"<p><p>Commercially available depth sensing devices are primarily designed for domains that are either macroscopic, or static. We develop a solution for fast microscale 3D reconstruction, using off-the-shelf components. By the addition of lenses, precise calibration of camera internals and positioning, and development of bespoke software, we turn an infrared depth sensor designed for human-scale motion and object detection into a device with mm-level accuracy capable of recording at up to 30Hz.</p>","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"2017 ","pages":"1268-1276"},"PeriodicalIF":0.0,"publicationDate":"2017-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/WACV.2017.146","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35279256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Center-Focusing Multi-task CNN with Injected Features for Classification of Glioma Nuclear Images.","authors":"Veda Murthy, Le Hou, Dimitris Samaras, Tahsin M Kurc, Joel H Saltz","doi":"10.1109/WACV.2017.98","DOIUrl":"10.1109/WACV.2017.98","url":null,"abstract":"<p><p>Classifying the various shapes and attributes of a glioma cell nucleus is crucial for diagnosis and understanding of the disease. We investigate the automated classification of the nuclear shapes and visual attributes of glioma cells, using Convolutional Neural Networks (CNNs) on pathology images of automatically segmented nuclei. We propose three methods that improve the performance of a previously-developed semi-supervised CNN. First, we propose a method that allows the CNN to focus on the most important part of an image: the image's center containing the nucleus. Second, we inject (concatenate) pre-extracted VGG features into an intermediate layer of our Semi-Supervised CNN so that during training, the CNN can learn a set of additional features. Third, we separate the losses of the two groups of target classes (nuclear shapes and attributes) into a single-label loss and a multi-label loss in order to incorporate prior knowledge of inter-label exclusiveness. On a dataset of 2078 images, the combination of the proposed methods reduces the error rate of attribute and shape classification by 21.54% and 15.07% respectively compared to the existing state-of-the-art method on the same dataset.</p>","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"2017 ","pages":"834-841"},"PeriodicalIF":0.0,"publicationDate":"2017-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5988234/pdf/nihms969223.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"36205043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognition of 3D package shapes for single camera metrology","authors":"Ryan Lloyd, Scott McCloskey","doi":"10.1109/WACV.2014.6836113","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836113","url":null,"abstract":"Many applications of 3D object measurement have become commercially viable due to the recent availability of low-cost range cameras such as the Microsoft Kinect. We address the application of measuring an object's dimensions for the purpose of billing in shipping transactions, where high accuracy is required for certification. In particular, we address cases where an object's pose reduces the accuracy with which we can estimate dimensions from a single camera. Because the class of object shapes is limited in the shipping domain, we perform a closed-world recognition in order to determine a shape model which can account for missing parts, and/or to induce the user to reposition the object for higher accuracy. Our experiments demonstrate that the addition of this recognition step significantly improves system accuracy.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"41 1","pages":"99-106"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88394086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Play type recognition in real-world football video","authors":"Sheng Chen, Zhongyuan Feng, Qingkai Lu, Behrooz Mahasseni, Trevor Fiez, Alan Fern, S. Todorovic","doi":"10.1109/WACV.2014.6836040","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836040","url":null,"abstract":"This paper presents a vision system for recognizing the sequence of plays in amateur videos of American football games (e.g. offense, defense, kickoff, punt, etc.). The system is aimed at reducing user effort in annotating football videos, which are posted on a web service used by over 13,000 high school, college, and professional football teams. Recognizing football plays is particularly challenging in the context of such a web service, due to the huge variations across videos, in terms of camera viewpoint, motion, distance from the field, as well as amateur camerawork quality, and lighting conditions, among other factors. Given a sequence of videos, where each shows a particular play of a football game, we first run noisy play-level detectors on every video. Then, we integrate responses of the play-level detectors with global game-level reasoning which accounts for statistical knowledge about football games. Our empirical results on more than 1450 videos from 10 diverse football games show that our approach is quite effective, and close to being usable in a real-world setting.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"1 1","pages":"652-659"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83172964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-based anthropometry: Predicting measurements from 3D human scans in multiple poses","authors":"Aggeliki Tsoli, M. Loper, Michael J. Black","doi":"10.1109/WACV.2014.6836115","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836115","url":null,"abstract":"Extracting anthropometric or tailoring measurements from 3D human body scans is important for applications such as virtual try-on, custom clothing, and online sizing. Existing commercial solutions identify anatomical landmarks on high-resolution 3D scans and then compute distances or circumferences on the scan. Landmark detection is sensitive to acquisition noise (e.g. holes) and these methods require subjects to adopt a specific pose. In contrast, we propose a solution we call model-based anthropometry. We fit a deformable 3D body model to scan data in one or more poses; this model-based fitting is robust to scan noise. This brings the scan into registration with a database of registered body scans. Then, we extract features from the registered model (rather than from the scan); these include limb lengths, circumferences, and statistical features of global shape. Finally, we learn a mapping from these features to measurements using regularized linear regression. We perform an extensive evaluation using the CAESAR dataset and demonstrate that the accuracy of our method outperforms state-of-the-art methods.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"2 1","pages":"83-90"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/WACV.2014.6836115","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72522246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Repeated constrained sparse coding with partial dictionaries for hyperspectral unmixing","authors":"Naveed Akhtar, F. Shafait, A. Mian","doi":"10.1109/WACV.2014.6836001","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836001","url":null,"abstract":"Hyperspectral images obtained from remote sensing platforms have limited spatial resolution. Thus, each spectrum measured at a pixel is usually a mixture of many pure spectral signatures (endmembers) corresponding to different materials on the ground. Hyperspectral unmixing aims at separating these mixed spectra into their constituent endmembers. We formulate hyperspectral unmixing as a constrained sparse coding (CSC) problem where unmixing is performed with the help of a library of pure spectral signatures under positivity and summation constraints. We propose two different methods that perform CSC repeatedly over the hyperspectral data. The first method, Repeated-CSC (RCSC), systematically neglects a few spectral bands of the data each time it performs the sparse coding. The second method, Repeated Spectral Derivative (RSD), takes the spectral derivative of the data before the sparse coding stage. The spectral derivative is taken such that it is not operated on a few selected bands. Experiments on simulated and real hyperspectral data and comparison with the existing state of the art show that the proposed methods achieve significantly higher accuracy. Our results demonstrate the overall robustness of RCSC to noise and better performance of RSD at high signal-to-noise ratio.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"44 1","pages":"953-960"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79335744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}