Learning to Describe Scenes via Privacy-Aware Designed Optical Lens

IF 4.8 | Zone 2, Computer Science | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Computational Imaging Pub Date : 2024-07-29 DOI:10.1109/TCI.2024.3426975

Paula Arguello;Jhon Lopez;Karen Sanchez;Carlos Hinojosa;Fernando Rojas-Morales;Henry Arguello

IEEE Transactions on Computational Imaging, vol. 10, pp. 1069-1079 (Journal Article). Full text: https://ieeexplore.ieee.org/document/10613002/

Citations: 0

Abstract

Scene captioning consists of accurately describing the visual information using text, leveraging the capabilities of computer vision and natural language processing. However, current image captioning methods are trained on high-resolution images that may contain private information about individuals within the scene, such as facial attributes or sensitive data. This raises concerns about whether machines require high-resolution images and how we can protect the private information of the users. In this work, we aim to protect privacy in the scene captioning task by addressing the issue directly from the optics before image acquisition. Specifically, motivated by the emerging trend of integrating optics design with algorithms, we introduce a learned refractive lens into the camera to ensure privacy. Our optimized lens obscures sensitive visual attributes, such as faces, ethnicity, gender, and more, in the acquired image while extracting relevant features, enabling descriptions even from highly distorted images. By optimizing the refractive lens and a deep network architecture for image captioning end-to-end, we achieve description generation directly from our distorted images. We validate our approach with extensive simulations and hardware experiments. Our results show that we achieve a better trade-off between privacy and utility when compared to conventional non-privacy-preserving methods on the COCO dataset. For instance, our approach successfully conceals private information within the scene while achieving a BLEU-4 score of 27.0 on the COCO test set.
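The BLEU-4 figure quoted in the abstract measures n-gram overlap (n = 1 to 4) between a generated caption and reference captions. The sketch below is a minimal, unsmoothed sentence-level version on whitespace tokens, given only to illustrate what the metric computes. It is not the authors' evaluation code: COCO captioning results are normally computed at corpus level with a dedicated toolkit and tokenizer, and the function names `ngrams` and `bleu4` and the example captions are assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 without smoothing, on whitespace tokens."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    # Geometric mean of the four clipped precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)


# Hypothetical captions, not taken from the paper or from COCO.
reference = "a man rides a horse on the beach"
score = bleu4("a man rides a horse on a beach", [reference])
print(round(100 * score, 1))
```

Scores of this kind are conventionally reported multiplied by 100, which is presumably the scale of the 27.0 in the abstract.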

Source journal

IEEE Transactions on Computational Imaging (Mathematics: Computational Mathematics)

CiteScore: 8.20

Self-citation rate: 7.40%

Journal description: The IEEE Transactions on Computational Imaging will publish articles where computation plays an integral role in the image formation process. Papers will cover all areas of computational imaging ranging from fundamental theoretical methods to the latest innovative computational imaging system designs. Topics of interest will include advanced algorithms and mathematical techniques, model-based data inversion, methods for image and signal recovery from sparse and incomplete data, techniques for non-traditional sensing of image data, methods for dynamic information acquisition and extraction from imaging sensors, software and hardware for efficient computation in imaging systems, and highly novel imaging system design.