{"title":"Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation.","authors":"Weichen Zhang, Dong Xu, Wanli Ouyang, Wen Li","doi":"10.1109/TPAMI.2019.2962476","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2962476","url":null,"abstract":"<p><p>This paper proposes a new unsupervised domain adaptation approach called Collaborative and Adversarial Network (CAN), which uses the domain-collaborative and domain-adversarial learning strategies for training the neural network. The domain-collaborative learning strategy aims to learn domain specific feature representation to preserve the discriminability for the target domain, while the domain adversarial learning strategy aims to learn domain invariant feature representation to reduce the domain distribution mismatch between the source and target domains. We show that these two learning strategies can be uniformly formulated as domain classifier learning with positive or negative weights on the losses. We then design a collaborative and adversarial training scheme, which automatically learns domain specific representations from lower blocks in CNNs through collaborative learning and domain invariant representations from higher blocks through adversarial learning. Moreover, to further enhance the discriminability in the target domain, we propose Self-Paced CAN (SPCAN), which progressively selects pseudo-labeled target samples for re-training the classifiers. We employ a self-paced learning strategy such that we can select pseudo-labeled target samples in an easy-to-hard fashion. Additionally, we build upon the popular two-stream approach to extend our domain adaptation approach for more challenging video action recognition task, which additionally considers the cooperation between the RGB stream and the optical flow stream. We propose the Two-stream SPCAN (TS-SPCAN) method to select and reweight the pseudo labeled target samples of one stream (RGB/Flow) based on the information from the other stream (Flow/RGB) in a cooperative way. As a result, our TS-SPCAN model is able to exchange the information between the two streams. Comprehensive experiments on different benchmark datasets, Office-31, ImageCLEF-DA and VISDA-2017 for the object recognition task, and UCF101-10 and HMDB51-10 for the video action recognition task, show our newly proposed approaches achieve the state-of-the-art performance, which clearly demonstrates the effectiveness of our proposed approaches for unsupervised domain adaptation.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"2047-2061"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2962476","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37494794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing Graph Clusterings: Set Partition Measures vs. Graph-Aware Measures.","authors":"Valerie Poulin, Francois Theberge","doi":"10.1109/TPAMI.2020.3009862","DOIUrl":"https://doi.org/10.1109/TPAMI.2020.3009862","url":null,"abstract":"<p><p>In this paper, we propose a family of graph partition similarity measures that take the topology of the graph into account. These graph-aware measures are alternatives to using set partition similarity measures that are not specifically designed for graphs. The two types of measures, graph-aware and set partition measures, are shown to have opposite behaviors with respect to resolution issues and provide complementary information necessary to compare graph partitions.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"2127-2132"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2020.3009862","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38228636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-Resolved Far Infrared Light Transport Decomposition for Thermal Photometric Stereo.","authors":"Kenichiro Tanaka, Nobuhiro Ikeya, Tsuyoshi Takatani, Hiroyuki Kubo, Takuya Funatomi, Vijay Ravi, Achuta Kadambi, Yasuhiro Mukaigawa","doi":"10.1109/TPAMI.2019.2959304","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2959304","url":null,"abstract":"<p><p>We present a novel time-resolved light transport decomposition method using thermal imaging. Because the speed of heat propagation is much slower than the speed of light propagation, the transient transport of far infrared light can be observed at a video frame rate. A key observation is that the thermal image looks similar to the visible light image in an appropriately controlled environment. This implies that conventional computer vision techniques can be straightforwardly applied to the thermal image. We show that the diffuse component in the thermal image can be separated, and therefore, the surface normals of objects can be estimated by the Lambertian photometric stereo. The effectiveness of our method is evaluated by conducting real-world experiments, and its applicability to black body, transparent, and translucent objects is shown.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"2075-2085"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2959304","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37485196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Direction Concentration Learning: Enhancing Congruency in Machine Learning.","authors":"Yan Luo, Yongkang Wong, Mohan Kankanhalli, Qi Zhao","doi":"10.1109/TPAMI.2019.2963387","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2963387","url":null,"abstract":"<p><p>One of the well-known challenges in computer vision tasks is the visual diversity of images, which could result in an agreement or disagreement between the learned knowledge and the visual content exhibited by the current observation. In this work, we first define such an agreement in a concepts learning process as congruency. Formally, given a particular task and sufficiently large dataset, the congruency issue occurs in the learning process whereby the task-specific semantics in the training data are highly varying. We propose a Direction Concentration Learning (DCL) method to improve congruency in the learning process, where enhancing congruency influences the convergence path to be less circuitous. The experimental results show that the proposed DCL method generalizes to state-of-the-art models and optimizers, as well as improves the performances of saliency prediction task, continual learning task, and classification task. Moreover, it helps mitigate the catastrophic forgetting problem in the continual learning task. The code is publicly available at https://github.com/luoyan407/congruency.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1928-1946"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2963387","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37512255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Metric Attack and Defense for Person Re-Identification.","authors":"Song Bai, Yingwei Li, Yuyin Zhou, Qizhu Li, Philip H S Torr","doi":"10.1109/TPAMI.2020.3031625","DOIUrl":"https://doi.org/10.1109/TPAMI.2020.3031625","url":null,"abstract":"Person re-identification (re-ID) has attracted much attention recently due to its great importance in video surveillance. In general, distance metrics used to identify two person images are expected to be robust under various appearance changes. However, our work observes the extreme vulnerability of existing distance metrics to adversarial examples, generated by simply adding human-imperceptible perturbations to person images. Hence, the security danger is dramatically increased when deploying commercial re-ID systems in video surveillance. Although adversarial examples have been extensively applied for classification analysis, it is rarely studied in metric analysis like person re-identification. The most likely reason is the natural gap between the training and testing of re-ID networks, that is, the predictions of a re-ID network cannot be directly used during testing without an effective metric. In this work, we bridge the gap by proposing Adversarial Metric Attack, a parallel methodology to adversarial classification attacks. Comprehensive experiments clearly reveal the adversarial effects in re-ID systems. Meanwhile, we also present an early attempt of training a metric-preserving network, thereby defending the metric against adversarial attacks. At last, by benchmarking various adversarial settings, we expect that our work can facilitate the development of adversarial attack and defense in metric-based applications.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"2119-2126"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2020.3031625","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38495096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Lightweight Neural Network for Monocular View Generation With Occlusion Handling.","authors":"Simon Evain, Christine Guillemot","doi":"10.1109/TPAMI.2019.2960689","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2960689","url":null,"abstract":"<p><p>In this article, we present a very lightweight neural network architecture, trained on stereo data pairs, which performs view synthesis from one single image. With the growing success of multi-view formats, this problem is indeed increasingly relevant. The network returns a prediction built from disparity estimation, which fills in wrongly predicted regions using a occlusion handling technique. To do so, during training, the network learns to estimate the left-right consistency structural constraint on the pair of stereo input images, to be able to replicate it at test time from one single image. The method is built upon the idea of blending two predictions: a prediction based on disparity estimation and a prediction based on direct minimization in occluded regions. The network is also able to identify these occluded areas at training and at test time by checking the pixelwise left-right consistency of the produced disparity maps. At test time, the approach can thus generate a left-side and a right-side view from one input image, as well as a depth map and a pixelwise confidence measure in the prediction. The work outperforms visually and metric-wise state-of-the-art approaches on the challenging KITTI dataset, all while reducing by a very significant order of magnitude (5 or 10 times) the required number of parameters (6.5 M).</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1832-1844"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2960689","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37484226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning a Fixed-Length Fingerprint Representation.","authors":"Joshua J Engelsma, Kai Cao, Anil K Jain","doi":"10.1109/TPAMI.2019.2961349","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2961349","url":null,"abstract":"<p><p>We present DeepPrint, a deep network, which learns to extract fixed-length fingerprint representations of only 200 bytes. DeepPrint incorporates fingerprint domain knowledge, including alignment and minutiae detection, into the deep network architecture to maximize the discriminative power of its representation. The compact, DeepPrint representation has several advantages over the prevailing variable length minutiae representation which (i) requires computationally expensive graph matching techniques, (ii) is difficult to secure using strong encryption schemes (e.g., homomorphic encryption), and (iii) has low discriminative power in poor quality fingerprints where minutiae extraction is unreliable. We benchmark DeepPrint against two top performing COTS SDKs (Verifinger and Innovatrics) from the NIST and FVC evaluations. Coupled with a re-ranking scheme, the DeepPrint rank-1 search accuracy on the NIST SD4 dataset against a gallery of 1.1 million fingerprints is comparable to the top COTS matcher, but it is significantly faster (DeepPrint: 98.80% in 0.3 seconds vs. COTS A: 98.85% in 27 seconds). To the best of our knowledge, the DeepPrint representation is the most compact and discriminative fixed-length fingerprint representation reported in the academic literature.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1981-1997"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2961349","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37486548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single Day Outdoor Photometric Stereo.","authors":"Yannick Hold-Geoffroy, Paulo Gotardo, Jean-Francois Lalonde","doi":"10.1109/TPAMI.2019.2962693","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2962693","url":null,"abstract":"<p><p>Photometric Stereo (PS) under outdoor illumination remains a challenging, ill-posed problem due to insufficient variability in illumination. Months-long capture sessions are typically used in this setup, with little success on shorter, single-day time intervals. In this paper, we investigate the solution of outdoor PS over a single day, under different weather conditions. First, we investigate the relationship between weather and surface reconstructability in order to understand when natural lighting allows existing PS algorithms to work. Our analysis reveals that partially cloudy days improve the conditioning of the outdoor PS problem while sunny days do not allow the unambiguous recovery of surface normals from photometric cues alone. We demonstrate that calibrated PS algorithms can thus be employed to reconstruct Lambertian surfaces accurately under partially cloudy days. Second, we solve the ambiguity arising in clear days by combining photometric cues with prior knowledge on material properties, local surface geometry and the natural variations in outdoor lighting through a CNN-based, weakly-calibrated PS technique. Given a sequence of outdoor images captured during a single sunny day, our method robustly estimates the scene surface normals with unprecedented quality for the considered scenario. Our approach does not require precise geolocation and significantly outperforms several state-of-the-art methods on images with real lighting, showing that our CNN can combine efficiently learned priors and photometric cues available during a single sunny day.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"2062-2074"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2962693","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37510063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Detection of Pain from Facial Expressions: A Survey.","authors":"Teena Hassan, Dominik Seus, Johannes Wollenberg, Katharina Weitz, Miriam Kunz, Stefan Lautenbacher, Jens-Uwe Garbas, Ute Schmid","doi":"10.1109/TPAMI.2019.2958341","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2958341","url":null,"abstract":"<p><p>Pain sensation is essential for survival, since it draws attention to physical threat to the body. Pain assessment is usually done through self-reports. However, self-assessment of pain is not available in the case of noncommunicative patients, and therefore, observer reports should be relied upon. Observer reports of pain could be prone to errors due to subjective biases of observers. Moreover, continuous monitoring by humans is impractical. Therefore, automatic pain detection technology could be deployed to assist human caregivers and complement their service, thereby improving the quality of pain management, especially for noncommunicative patients. Facial expressions are a reliable indicator of pain, and are used in all observer-based pain assessment tools. Following the advancements in automatic facial expression analysis, computer vision researchers have tried to use this technology for developing approaches for automatically detecting pain from facial expressions. This paper surveys the literature published in this field over the past decade, categorizes it, and identifies future research directions. The survey covers the pain datasets used in the reviewed literature, the learning tasks targeted by the approaches, the features extracted from images and image sequences to represent pain-related information, and finally, the machine learning methods used.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1815-1831"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2958341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37447718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Memory- and Accuracy-Aware Gaussian Parameter-Based Stereo Matching Using Confidence Measure.","authors":"Yeongmin Lee, Chong-Min Kyung","doi":"10.1109/TPAMI.2019.2959613","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2959613","url":null,"abstract":"<p><p>Accurate stereo matching requires a large amount of memory at a high bandwidth, which restricts its use in resource-limited systems such as mobile devices. This problem is compounded by the recent trend of applications requiring significantly high pixel resolution and disparity levels. To alleviate this, we present a memory-efficient and robust stereo matching algorithm. For cost aggregation, we employ the semiglobal parametric approach, which significantly reduces the memory bandwidth by representing the costs of all disparities as a Gaussian mixture model. All costs on multiple paths in an image are aggregated by updating the Gaussian parameters. The aggregation is performed during the scanning in the forward and backward directions. To reduce the amount of memory for the intermediate results during the forward scan, we suggest to store only the Gaussian parameters which contribute significantly to the final disparity selection. We also propose a method to enhance the overall procedure through a learning-based confidence measure. The random forest framework is used to train various features which are extracted from the cost and intensity profile. The experimental results on KITTI dataset show that the proposed method reduces the memory requirement to less than 3 percent of that of semiglobal matching (SGM) while providing a robust depth map compared to those of state-of-the-art SGM-based algorithms.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1845-1858"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2959613","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37484221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}