Qian Ma, Yuan Guo, Jinlei Zhang, Haochen Zhang, Shikai Guo, Bo Ning, Yu Gu, Yu Ge
{"title":"Generative imputation of incomplete images: Leveraging multimodal information for missing pixel","authors":"Qian Ma, Yuan Guo, Jinlei Zhang, Haochen Zhang, Shikai Guo, Bo Ning, Yu Gu, Yu Ge","doi":"10.1016/j.ins.2025.122159","DOIUrl":null,"url":null,"abstract":"<div><div>Missing pixels are a common issue in real-world images, arising from various factors such as hardware malfunctions, sensor errors, and other unforeseen circumstances. This prevalence of missing pixels has made incomplete image imputation a critical area of research, garnering attention both domestically and internationally. However, as the volume of data continues to grow, traditional imputation methods that rely exclusively on information from the target images are becoming less effective, particularly in scenarios where the proportion of missing pixels is high. To address this challenge, we propose a novel imputation model named MMIGAN (Multi-modal Imputation Generative Adversarial Network), which imputes incomplete images by leveraging not only the information from the images themselves but also additional information from corresponding texts. Specifically, MMIGAN is a GAN-based model where the generator G comprises a cross-modality feature learning subnet to extract multimodal features and an MV imputation subnet to output the imputed images. Meanwhile, the discriminator D attempts to distinguish between real (observed) and fake (imputed) pixels to enhance imputation accuracy. We conducted extensive experiments on the Flickr8k, Flickr30k, and COCO datasets, demonstrating that MMIGAN surpasses state-of-the-art methods in image inpainting tasks. Under varying missing rates, the peak performance improvements across these datasets reached 52.5%, 61.0%, and 54.6% respectively, while maintaining robust minimum improvements of 38.2%, 39.5%, and 35.2%. 
These results provide conclusive evidence for both the superiority of MMIGAN and the effectiveness of multimodal information fusion in addressing image inpainting challenges. The code is available at <span><span>https://github.com/guoynow/MMIGAN.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":51063,"journal":{"name":"Information Sciences","volume":"712 ","pages":"Article 122159"},"PeriodicalIF":8.1000,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0020025525002919","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Missing pixels are a common issue in real-world images, arising from factors such as hardware malfunctions, sensor errors, and other unforeseen circumstances. This prevalence of missing pixels has made incomplete image imputation a critical area of research, garnering broad attention. However, as the volume of data continues to grow, traditional imputation methods that rely exclusively on information from the target images are becoming less effective, particularly in scenarios where the proportion of missing pixels is high. To address this challenge, we propose a novel imputation model named MMIGAN (Multi-modal Imputation Generative Adversarial Network), which imputes incomplete images by leveraging not only the information from the images themselves but also additional information from corresponding texts. Specifically, MMIGAN is a GAN-based model in which the generator G comprises a cross-modality feature learning subnet to extract multimodal features and an MV (missing-value) imputation subnet to output the imputed images. Meanwhile, the discriminator D attempts to distinguish between real (observed) and fake (imputed) pixels to enhance imputation accuracy. We conducted extensive experiments on the Flickr8k, Flickr30k, and COCO datasets, demonstrating that MMIGAN surpasses state-of-the-art methods in image inpainting tasks. Under varying missing rates, the peak performance improvements across these datasets reached 52.5%, 61.0%, and 54.6% respectively, while maintaining robust minimum improvements of 38.2%, 39.5%, and 35.2%. These results provide conclusive evidence for both the superiority of MMIGAN and the effectiveness of multimodal information fusion in addressing image inpainting challenges. The code is available at https://github.com/guoynow/MMIGAN.git.
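The abstract's distinction between real (observed) and fake (imputed) pixels rests on a mask-based composition step that GAN imputation models typically share: observed pixels pass through unchanged, and only the missing positions are filled by the generator. The sketch below illustrates that step with NumPy; the function name and toy values are illustrative assumptions, not taken from the MMIGAN paper.

```python
import numpy as np

# An incomplete image is represented by its observed pixels plus a binary
# mask: 1 where a pixel was observed, 0 where it is missing.

def compose_imputed(observed: np.ndarray, generated: np.ndarray,
                    mask: np.ndarray) -> np.ndarray:
    """Keep observed pixels; fill missing positions with generator output."""
    return mask * observed + (1.0 - mask) * generated

# Toy 2x2 single-channel "image" with one missing pixel (mask == 0).
observed = np.array([[0.2, 0.8], [0.5, 0.0]])
mask = np.array([[1.0, 1.0], [1.0, 0.0]])       # bottom-right pixel missing
generated = np.array([[0.3, 0.7], [0.4, 0.6]])  # generator's reconstruction

imputed = compose_imputed(observed, generated, mask)
# The discriminator is then trained to tell observed entries of `imputed`
# from the generated ones, pushing the generator toward realistic fills.
```

Under this composition, only masked-out positions depend on the generator, so the discriminator's real/fake decision is made per pixel rather than per image.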
Journal introduction:
Information Sciences: Informatics and Computer Science, Intelligent Systems Applications is an international journal that publishes original and creative research findings in the field of information sciences, along with a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.