{"title":"通过扩散模型为图像添加字幕:一项调查","authors":"","doi":"10.1016/j.engappai.2024.109288","DOIUrl":null,"url":null,"abstract":"<div><p>Diffusion models are increasingly favored over traditional approaches like generative adversarial networks (GANs) and auto-regressive transformers due to their remarkable generative capabilities. They demonstrate outstanding performance not solely limited to image generation and manipulation but also in text-related tasks. Despite this, existing surveys tend to concentrate on the utilization of diffusion models solely for image generation, ignoring their potential in image captioning. To address this oversight, our paper provides an exhaustive examination of image-to-text diffusion models within the landscape of artificial intelligence (AI) and generative computing, filling a critical void in the literature. Starting with an overview of basic diffusion model principles, we explore into the enhancements brought by conditioning or guidance and the implemented AI. We then present a taxonomy and review of cutting-edge methods in diffusion-based image captioning. Additionally, we explore applications beyond image-to-text generation, such as image-guided creative generation, text editing, and the application of AI. We also cover existing evaluation metrics, software and libraries, as well as challenges and future directions in the field.</p></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":7.5000,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Image captioning by diffusion models: A survey\",\"authors\":\"\",\"doi\":\"10.1016/j.engappai.2024.109288\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Diffusion models are increasingly favored over traditional approaches like generative adversarial networks (GANs) and auto-regressive transformers due to their remarkable generative capabilities. They demonstrate outstanding performance not solely limited to image generation and manipulation but also in text-related tasks. Despite this, existing surveys tend to concentrate on the utilization of diffusion models solely for image generation, ignoring their potential in image captioning. To address this oversight, our paper provides an exhaustive examination of image-to-text diffusion models within the landscape of artificial intelligence (AI) and generative computing, filling a critical void in the literature. Starting with an overview of basic diffusion model principles, we explore into the enhancements brought by conditioning or guidance and the implemented AI. We then present a taxonomy and review of cutting-edge methods in diffusion-based image captioning. Additionally, we explore applications beyond image-to-text generation, such as image-guided creative generation, text editing, and the application of AI. We also cover existing evaluation metrics, software and libraries, as well as challenges and future directions in the field.</p></div>\",\"PeriodicalId\":50523,\"journal\":{\"name\":\"Engineering Applications of Artificial Intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2024-09-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Engineering Applications of Artificial Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0952197624014465\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0952197624014465","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Diffusion models are increasingly favored over traditional approaches like generative adversarial networks (GANs) and auto-regressive transformers due to their remarkable generative capabilities. They demonstrate outstanding performance not solely limited to image generation and manipulation but also in text-related tasks. Despite this, existing surveys tend to concentrate on the utilization of diffusion models solely for image generation, ignoring their potential in image captioning. To address this oversight, our paper provides an exhaustive examination of image-to-text diffusion models within the landscape of artificial intelligence (AI) and generative computing, filling a critical void in the literature. Starting with an overview of basic diffusion model principles, we explore into the enhancements brought by conditioning or guidance and the implemented AI. We then present a taxonomy and review of cutting-edge methods in diffusion-based image captioning. Additionally, we explore applications beyond image-to-text generation, such as image-guided creative generation, text editing, and the application of AI. We also cover existing evaluation metrics, software and libraries, as well as challenges and future directions in the field.
期刊介绍:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.