Paula Arguello;Jhon Lopez;Karen Sanchez;Carlos Hinojosa;Fernando Rojas-Morales;Henry Arguello
{"title":"Learning to Describe Scenes via Privacy-Aware Designed Optical Lens","authors":"Paula Arguello;Jhon Lopez;Karen Sanchez;Carlos Hinojosa;Fernando Rojas-Morales;Henry Arguello","doi":"10.1109/TCI.2024.3426975","DOIUrl":null,"url":null,"abstract":"Scene captioning consists of accurately describing the visual information using text, leveraging the capabilities of computer vision and natural language processing. However, current image captioning methods are trained on high-resolution images that may contain private information about individuals within the scene, such as facial attributes or sensitive data. This raises concerns about whether machines require high-resolution images and how we can protect the private information of the users. In this work, we aim to protect privacy in the scene captioning task by addressing the issue directly from the optics before image acquisition. Specifically, motivated by the emerging trend of integrating optics design with algorithms, we introduce a learned refractive lens into the camera to ensure privacy. Our optimized lens obscures sensitive visual attributes, such as faces, ethnicity, gender, and more, in the acquired image while extracting relevant features, enabling descriptions even from highly distorted images. By optimizing the refractive lens and a deep network architecture for image captioning end-to-end, we achieve description generation directly from our distorted images. We validate our approach with extensive simulations and hardware experiments. Our results show that we achieve a better trade-off between privacy and utility when compared to conventional non-privacy-preserving methods on the COCO dataset. For instance, our approach successfully conceals private information within the scene while achieving a BLEU-4 score of 27.0 on the COCO test set.","PeriodicalId":56022,"journal":{"name":"IEEE Transactions on Computational Imaging","volume":"10 ","pages":"1069-1079"},"PeriodicalIF":4.2000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computational Imaging","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10613002/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Abstract
Scene captioning consists of accurately describing the visual information using text, leveraging the capabilities of computer vision and natural language processing. However, current image captioning methods are trained on high-resolution images that may contain private information about individuals within the scene, such as facial attributes or sensitive data. This raises concerns about whether machines require high-resolution images and how we can protect the private information of the users. In this work, we aim to protect privacy in the scene captioning task by addressing the issue directly from the optics before image acquisition. Specifically, motivated by the emerging trend of integrating optics design with algorithms, we introduce a learned refractive lens into the camera to ensure privacy. Our optimized lens obscures sensitive visual attributes, such as faces, ethnicity, gender, and more, in the acquired image while extracting relevant features, enabling descriptions even from highly distorted images. By optimizing the refractive lens and a deep network architecture for image captioning end-to-end, we achieve description generation directly from our distorted images. We validate our approach with extensive simulations and hardware experiments. Our results show that we achieve a better trade-off between privacy and utility when compared to conventional non-privacy-preserving methods on the COCO dataset. For instance, our approach successfully conceals private information within the scene while achieving a BLEU-4 score of 27.0 on the COCO test set.
期刊介绍:
The IEEE Transactions on Computational Imaging will publish articles where computation plays an integral role in the image formation process. Papers will cover all areas of computational imaging ranging from fundamental theoretical methods to the latest innovative computational imaging system designs. Topics of interest will include advanced algorithms and mathematical techniques, model-based data inversion, methods for image and signal recovery from sparse and incomplete data, techniques for non-traditional sensing of image data, methods for dynamic information acquisition and extraction from imaging sensors, software and hardware for efficient computation in imaging systems, and highly novel imaging system design.