{"title":"Hierarchical Multimodal Knowledge Matching for Training-Free Open-Vocabulary Object Detection.","authors":"Qisen Ma,Yan Huang,Zikun Liu,Hyunhee Park,Liang Wang","doi":"10.1109/tip.2025.3618408","DOIUrl":"https://doi.org/10.1109/tip.2025.3618408","url":null,"abstract":"Open-Vocabulary Object Detection (OVOD) aims to leverage the generalization capabilities of pre-trained vision-language models for detecting objects beyond the trained categories. Existing methods mostly focus on supervised learning strategies based on available training data, which might be suboptimal for data-limited novel categories. To tackle this challenge, this paper presents a Hierarchical Multimodal Knowledge Matching method (HMKM) to better represent novel categories and match them with region features. Specifically, HMKM includes a set of object prototype knowledge that is obtained using limited category-specific images, acting as off-the-shelf category representations. In addition, HMKM also includes a set of attribute prototype knowledge to represent key attributes of categories at a fine-grained level, with the goal to distinguish one category from its visually similar ones. During inference, two sets of object and attribute prototype knowledge are adaptively combined to match categories with region features. The proposed HMKM is training-free and can be easily integrated as a plug-and-play module into existing OVOD models. Extensive experiments demonstrate that our HMKM significantly improves the performance when detecting novel categories across various backbones and datasets.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"117 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145288508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delving into the Training Dynamics for Image Classification","authors":"Mengyang Li, Xiaoling Zhou, Ou Wu","doi":"10.1109/tip.2025.3618395","DOIUrl":"https://doi.org/10.1109/tip.2025.3618395","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"28 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145282978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Semantic Compression for Consistent Image Semantic Restoration","authors":"Shengxi Li, Zifu Zhang, Mai Xu, Lai Jiang, Yufan Liu, Ce Zhu","doi":"10.1109/tip.2025.3618379","DOIUrl":"https://doi.org/10.1109/tip.2025.3618379","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"40 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145282983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning","authors":"Renqiu Xia, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Botian Shi, Junchi Yan, Bo Zhang","doi":"10.1109/tip.2025.3607618","DOIUrl":"https://doi.org/10.1109/tip.2025.3607618","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"1 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145282982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Robustness of Point Cloud Analysis Through Perturbation Simulation and Distortion-Guided Feature Augmentation","authors":"Jingming He, Chongyi Li, Shiqi Wang, Sam Kwong","doi":"10.1109/tip.2025.3618411","DOIUrl":"https://doi.org/10.1109/tip.2025.3618411","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"19 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145282981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking the Low-Light Video Enhancement: Benchmark Datasets and Methods.","authors":"Jiaxuan Wang,Huiyuan Fu,Wenkai Zheng,Xicong Wang,Xin Wang,Heng Zhang,Huadong Ma","doi":"10.1109/tip.2025.3616639","DOIUrl":"https://doi.org/10.1109/tip.2025.3616639","url":null,"abstract":"Low-light video enhancement is a critical task in computer vision with a wide range of applications. However, there is a lack of high-quality benchmark datasets in this field. To address this issue, we collect a high-quality low-light video dataset using a well-designed camera system. The videos in our dataset feature apparent camera motion and strict spatial alignment. In order to achieve general low-light video enhancement, we propose a Retinex-based method called Light Adjustable Network (LAN). LAN iteratively adjusts the brightness and adapts to different lighting conditions in various real-world scenarios, producing visually appealing results. We further develop a new dataset capture method and low-light video enhancement method to address the limitation of our previous dataset in capturing dynamic scenes and previous method. The new camera setup and capture method enable the recording of real continuous videos and generate the new dataset. Our new low-light video enhancement method, LAN++, leverages a new inter-frame relationship, difference images. It utilizes the texture information contained in the difference images of dynamic scenes to supplement the high-frequency details of the original features, which produce sharper and more realistic output images. The extensive experiments demonstrate the superiority of our low-light video dataset and enhancement method. Our dataset and code will be publicly available.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"7 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145246667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}