{"title":"A survey on FPGA-based high-resolution TDCs","authors":"Mojtaba Parsa Kordasiabi, I. Vornicu, R. Carmona-Galán, Á. Rodríguez-Vázquez","doi":"10.1145/3349801.3357129","DOIUrl":"https://doi.org/10.1145/3349801.3357129","url":null,"abstract":"Time-to-digital converters based on the Nutt method are especially suitable for FPGA implementation. They can provide high resolution, long range and good linearity with low resource usage. The core of this architecture consists of a coarse counter for long range, a fine time interpolator for high resolution, and real-time calibration for high linearity. This paper reviews different time interpolation and real-time calibration techniques. Moreover, a comparison of state-of-the-art FPGA-based TDCs is presented as well.","PeriodicalId":299138,"journal":{"name":"Proceedings of the 13th International Conference on Distributed Smart Cameras","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124947367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards an Embedded Stereo Matching Algorithm Based on Multiple Correlation Windows","authors":"Marco-Antonio Palacios-Ramos, Héctor-Daniel Vázquez-Delgado, Abiel Aguilar-González, Madaín Pérez Patricio","doi":"10.1145/3349801.3357128","DOIUrl":"https://doi.org/10.1145/3349801.3357128","url":null,"abstract":"Stereo matching consists of extracting 3D information from digital images, such as those obtained by a CCD camera. It is an important issue in several real-world applications, such as positioning systems for mobile robots, augmented reality systems, etc. In previous works, one of the most popular trends for addressing the stereo matching challenge is to compare scene information from two viewpoints (left-right) under epipolar geometry via correlation metrics. Regarding the correlation metrics, most previous works compute the similarity between pixels in the left image and pixels in the right image using a correlation index computed on neighborhoods of these pixels, called correlation windows. Unfortunately, in order to preserve edges, small correlation windows need to be used, while, for homogeneous areas, large correlation windows are required. To address this problem, we build on the hypothesis that small correlation windows combined with large correlation windows should deliver accurate results in homogeneous areas while at the same time preserving edges. To validate this hypothesis, in this paper a similarity criterion based on the grayscale homogeneity of the correlation window being processed is presented. Preliminary results are encouraging, validate our hypothesis, and demonstrate the viability, performance and scope of the proposed approach.","PeriodicalId":299138,"journal":{"name":"Proceedings of the 13th International Conference on Distributed Smart Cameras","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125922118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guided Convolutional Network","authors":"Chunlei Liu, Wenrui Ding, Yuan Hu, Hanlin Chen, Baochang Zhang, Shuo Liu","doi":"10.1145/3349801.3349813","DOIUrl":"https://doi.org/10.1145/3349801.3349813","url":null,"abstract":"Low-level handcrafted features (e.g., edge and saliency) dominate the design of traditional algorithms and endow them with the capability of dealing effectively with simple classification problems. However, such useful properties have not been well explored in popular deep convolutional neural networks (DCNNs). In this paper, we propose a new deep model, termed Guided Convolutional Networks (GCNs), that uses low-level handcrafted features to guide the training of DCNNs and can be applied to subsequent vision tasks. Furthermore, a signature structure with saliency information is also investigated as a basic building block to help slim the network. With the modulated binary convolution, the memory of our small network is theoretically reduced by 132. Experiments also demonstrate that GCNs achieve precision comparable to state-of-the-art networks such as Wide-ResNet (WRN) while dramatically reducing network size.","PeriodicalId":299138,"journal":{"name":"Proceedings of the 13th International Conference on Distributed Smart Cameras","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116876427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting View Synthesis for Super-multiview Video Compression","authors":"Pavel Nikitin, Marco Cagnazzo, Joël Jung, A. Fiandrotti","doi":"10.1145/3349801.3349820","DOIUrl":"https://doi.org/10.1145/3349801.3349820","url":null,"abstract":"Super-multiview video consists of a 2D arrangement of cameras acquiring the same scene, and it is a well-suited format for immersive and free-navigation video services. However, the large number of acquired viewpoints calls for extremely effective compression tools. View synthesis makes it possible to reconstruct a viewpoint using texture and depth information from nearby cameras. In this work we explore the potential of recent advances in view synthesis algorithms to enhance the compression performance of super-multiview video. To this end, we consider five methods that replace one viewpoint with a synthesized view, possibly enhanced with some side information. Our experiments suggest that, if the geometry information (i.e. the depth map) is reliable, these methods have the potential to improve rate-distortion performance with respect to traditional approaches, at least for some specific content and configurations. Moreover, our results shed some light on how to further improve compression performance by integrating new view-synthesis prediction tools within a 3D video encoder.","PeriodicalId":299138,"journal":{"name":"Proceedings of the 13th International Conference on Distributed Smart Cameras","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129440177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Deception Detection in RGB videos using Facial Action Units","authors":"D. Avola, L. Cinque, G. Foresti, D. Pannone","doi":"10.1145/3349801.3349806","DOIUrl":"https://doi.org/10.1145/3349801.3349806","url":null,"abstract":"The outcome of situations such as police interrogations or court trials is strongly influenced by the behaviour of the interviewed subject. In particular, deceptive behaviour may completely overturn such sensitive situations. Moreover, if specific devices such as a polygraph or magnetic resonance are used, the subject is aware of being monitored and thus may change his behaviour accordingly. To overcome this problem, in this paper a method for detecting deception in RGB videos is presented. The method automatically extracts facial Action Units (AUs) from video frames containing the interviewed subject, and classifies them through an SVM as truthful or deceptive. Experiments on real court trial data and comparisons with the current state of the art show the effectiveness of the proposed method.","PeriodicalId":299138,"journal":{"name":"Proceedings of the 13th International Conference on Distributed Smart Cameras","volume":"45 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123656816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised camera pose estimation through human mesh recovery","authors":"Nicola Garau, N. Conci","doi":"10.1145/3349801.3357138","DOIUrl":"https://doi.org/10.1145/3349801.3357138","url":null,"abstract":"Camera resectioning is essential in computer vision and 3D reconstruction to estimate the position of matching pinhole cameras in 3D worlds. While the internal camera parameters are usually known or can be easily computed offline, in camera networks extrinsic parameters need to be computed each time a camera changes position, thus not allowing for smooth and dynamic network reconfiguration. In this work we propose a fully markerless, unsupervised, and automatic tool for the estimation of the extrinsic parameters of a camera network, based on 3D human mesh recovery from RGB videos. We show how it is possible to retrieve, from monocular images and with just a weak prior knowledge of the intrinsic parameters, the real-world position of the cameras in the network, together with the floor plane. Our solution also works with a single RGB camera and allows the user to dynamically add, re-position, or remove cameras from the network.","PeriodicalId":299138,"journal":{"name":"Proceedings of the 13th International Conference on Distributed Smart Cameras","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125461666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Face anti-spoofing with multi-color double-stream CNN","authors":"Daqiang Mu, Teng Li","doi":"10.1145/3349801.3349817","DOIUrl":"https://doi.org/10.1145/3349801.3349817","url":null,"abstract":"Previous methods for face anti-spoofing rarely pay attention to the difference in multi-channel chrominance between genuine and fake faces, or they only use hand-crafted features, which cannot effectively fuse multi-channel chrominance information. This paper uses CNN (convolutional neural network) features instead of hand-crafted features for face anti-spoofing. In order to fuse more discriminative chrominance information, this paper proposes a novel face anti-spoofing method based on a double-stream CNN. Through joint modeling of features from the global face image and local patches, as well as integrating the features of two different color spaces, i.e. YCbCr and HSV, we explore a discriminative representation for face anti-spoofing. Extensive experiments on benchmarks including CASIA-FASD and Replay_Attack show that our method can achieve state-of-the-art performance. Specifically, an EER (Equal Error Rate) of 1.79% on CASIA-FASD and 0.29% on the Replay_Attack database is achieved.","PeriodicalId":299138,"journal":{"name":"Proceedings of the 13th International Conference on Distributed Smart Cameras","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121965272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Embedded Heterogeneous FPGA-GPU Smart Camera Architectures for CNN Inference","authors":"Walther Carballo-Hernández, F. Berry, M. Pelcat, M. Arias-Estrada","doi":"10.1145/3349801.3357136","DOIUrl":"https://doi.org/10.1145/3349801.3357136","url":null,"abstract":"The success of Deep Learning (DL) algorithms in computer vision tasks has created an ongoing demand for dedicated hardware architectures that can keep up with their computation and memory complexities. This task is particularly challenging when embedded smart camera platforms have constrained resources such as power consumption, Processing Elements (PEs) and communication. This article describes a heterogeneous system embedding an FPGA and a GPU for executing CNN inference for computer vision applications. The system addresses some challenges of embedded CNNs, such as task and data partitioning and workload balancing. The selected heterogeneous platform embeds an Nvidia® Jetson TX2 for the CPU-GPU side and an Intel Altera® Cyclone10GX for the FPGA side, interconnected by PCIe Gen2, with a MIPI-CSI camera for prototyping. This test environment will be used as support for future work on a methodology for optimized model partitioning.","PeriodicalId":299138,"journal":{"name":"Proceedings of the 13th International Conference on Distributed Smart Cameras","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131085249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of CNNs Pooling Layer Implementation on FPGAs Accelerator Design","authors":"A. Muñío-Gracia, J. Fernández-Berni, R. Carmona-Galán, Á. Rodríguez-Vázquez","doi":"10.1145/3349801.3357130","DOIUrl":"https://doi.org/10.1145/3349801.3357130","url":null,"abstract":"Convolutional Neural Networks have demonstrated their competence in extracting information from data, especially in the field of computer vision. Their computational complexity calls for hardware acceleration. The challenge in the design of hardware accelerators for CNNs is providing sustained throughput with low power consumption, for which FPGAs have captured the community's attention. In CNNs, pooling layers are introduced to reduce the spatial dimensions of the model. This work explores the influence of modifying pooling layers in some state-of-the-art CNNs, namely AlexNet and SqueezeNet. The objective is to optimize hardware resource utilization without a negative impact on inference accuracy.","PeriodicalId":299138,"journal":{"name":"Proceedings of the 13th International Conference on Distributed Smart Cameras","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130707893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Camera network optimization: maximize coverage in a 3D virtual environment","authors":"N. Bisagno, Cristian Iacovlev","doi":"10.1145/3349801.3357137","DOIUrl":"https://doi.org/10.1145/3349801.3357137","url":null,"abstract":"In this demo, we present a method for optimal video camera positioning. The final objective is to maximize visual coverage in complex indoor environments. Starting from a predefined camera model and environmental setup, we employ a particle swarm optimizer (PSO) to determine the best configuration of the camera network to satisfy our target coverage. The target coverage objectives can vary and depend on realistic factors such as lighting and obstacles.","PeriodicalId":299138,"journal":{"name":"Proceedings of the 13th International Conference on Distributed Smart Cameras","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132994692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}