{"title":"SketchTriplet:自监督场景化草图-文本-图像三重生成","authors":"Zhenbei Wu;Qiang Wang;Jie Yang","doi":"10.1109/JIOT.2024.3523382","DOIUrl":null,"url":null,"abstract":"Touchscreen Internet of Things (IoT) devices, such as smartphones and tablets, have been seamlessly integrated into our daily lives. Drawing sketches on the touch screen is an extremely convenient mode of interaction, and when combined with generative artificial intelligence (AI), it makes the customized data generation and digital twin based on IoT devices more straightforward. However, the scarcity of free-hand sketch data makes the construction of data-driven generative AI models a thorny issue. Despite the emergence of some large-scale sketch datasets, these datasets primarily consist of sketches at the single-object level. There continues to be a lack of large-scale paired datasets for scene sketches. In this article, we propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch, enabling the transformation of single-object sketches into scene sketches. To accomplish this, we introduce a method for vector sketch captioning and sketch semantic expansion. Additionally, we design a sketch generation network that incorporates a fusion of multimodal perceptual constraints, suitable for application in zero-shot image-to-sketch downstream task, demonstrating state-of-the-art performance through experimental validation. Finally, leveraging our proposed sketch-to-sketch generation method, we contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent “text-sketch–image” triplets. 
Our research can provide critical technical support for AI content creation and customized data generation based on IoT devices.","PeriodicalId":54347,"journal":{"name":"IEEE Internet of Things Journal","volume":"12 9","pages":"13021-13032"},"PeriodicalIF":8.9000,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SketchTriplet: Self-Supervised Scenarized Sketch–Text–Image Triplet Generation\",\"authors\":\"Zhenbei Wu;Qiang Wang;Jie Yang\",\"doi\":\"10.1109/JIOT.2024.3523382\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Touchscreen Internet of Things (IoT) devices, such as smartphones and tablets, have been seamlessly integrated into our daily lives. Drawing sketches on the touch screen is an extremely convenient mode of interaction, and when combined with generative artificial intelligence (AI), it makes the customized data generation and digital twin based on IoT devices more straightforward. However, the scarcity of free-hand sketch data makes the construction of data-driven generative AI models a thorny issue. Despite the emergence of some large-scale sketch datasets, these datasets primarily consist of sketches at the single-object level. There continues to be a lack of large-scale paired datasets for scene sketches. In this article, we propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch, enabling the transformation of single-object sketches into scene sketches. To accomplish this, we introduce a method for vector sketch captioning and sketch semantic expansion. Additionally, we design a sketch generation network that incorporates a fusion of multimodal perceptual constraints, suitable for application in zero-shot image-to-sketch downstream task, demonstrating state-of-the-art performance through experimental validation. 
Finally, leveraging our proposed sketch-to-sketch generation method, we contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent “text-sketch–image” triplets. Our research can provide critical technical support for AI content creation and customized data generation based on IoT devices.\",\"PeriodicalId\":54347,\"journal\":{\"name\":\"IEEE Internet of Things Journal\",\"volume\":\"12 9\",\"pages\":\"13021-13032\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-01-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Internet of Things Journal\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10836756/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Internet of Things Journal","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10836756/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Journal Introduction:
The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols (such as network coding), and IoT services and applications. Topics encompass IoT's impact on sensor technologies, big data management, and future Internet design for applications such as smart cities and smart homes. Fields of interest include: IoT architecture, such as things-centric, data-centric, and service-oriented IoT architecture; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in different standards development organizations (SDOs), such as IEEE, IETF, ITU, 3GPP, and ETSI.