IEEE Transactions on Multimedia: Latest Articles

SQL-Net: Semantic Query Learning for Point-Supervised Temporal Action Localization
IF 8.4, CAS Tier 1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-24 DOI: 10.1109/TMM.2024.3521799
Yu Wang;Shengjie Zhao;Shiwei Chen
Abstract: Point-supervised Temporal Action Localization (PS-TAL) detects temporal intervals of actions in untrimmed videos with a label-efficient paradigm. However, most existing methods fail to learn action completeness without instance-level annotations, resulting in fragmentary region predictions. In fact, the semantic information of snippets is crucial for detecting complete actions, meaning that snippets with similar representations should be considered as the same action category. To address this issue, we propose a novel representation refinement framework with a semantic query mechanism to enhance the discriminability of snippet-level features. Concretely, we set a group of learnable queries, each representing a specific action category, and dynamically update them based on the video context. With the assistance of these queries, we expect to search for the optimal action sequence that agrees with their semantics. Besides, we leverage some reliable proposals as pseudo labels and design a refinement and completeness module to refine temporal boundaries further, so that the completeness of action instances is captured. Finally, we demonstrate the superiority of the proposed method over existing state-of-the-art approaches on THUMOS14 and ActivityNet13 benchmarks. Notably, thanks to completeness learning, our algorithm achieves significant improvements under more stringent evaluation metrics.
Vol. 27, pp. 84-94
Cited by: 0
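The semantic query mechanism described above can be sketched as a set of per-class learnable query vectors matched against snippet-level features by similarity, so that snippets whose best-matching query agrees are grouped into one action. This is a minimal illustration under that assumption; the names and the cosine-similarity choice are ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def semantic_query_scores(snippet_feats, class_queries):
    """Cosine similarity between snippet features (T, D) and
    learnable class queries (C, D) -> per-snippet class scores (T, C)."""
    s = snippet_feats / np.linalg.norm(snippet_feats, axis=1, keepdims=True)
    q = class_queries / np.linalg.norm(class_queries, axis=1, keepdims=True)
    return s @ q.T

T, D, C = 8, 16, 4          # snippets, feature dim, action classes
feats = rng.normal(size=(T, D))
queries = rng.normal(size=(C, D))     # in the paper these are learned
scores = semantic_query_scores(feats, queries)

# snippets assigned to the same query are treated as one action category
labels = scores.argmax(axis=1)
assert scores.shape == (T, C)
```

In the paper the queries are additionally updated from the video context; here they are fixed random placeholders.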
LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering
IF 8.4, CAS Tier 1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-24 DOI: 10.1109/TMM.2024.3521709
Dizhan Xue;Shengsheng Qian;Quan Fang;Changsheng Xu
Abstract: Explanatory Visual Question Answering (EVQA) is a recently proposed multimodal reasoning task consisting of answering the visual question and generating multimodal explanations for the reasoning processes. Unlike the traditional Visual Question Answering (VQA) task that only aims at predicting answers for visual questions, EVQA also aims to generate user-friendly explanations to improve the explainability and credibility of reasoning models. To date, existing methods for VQA and EVQA ignore the prompt in the question and enforce the model to predict the probabilities of all answers. Moreover, existing EVQA methods ignore the complex relationships among question words, visual regions, and explanation tokens. Therefore, in this work, we propose a Logic Integrated Neural Inference Network (LININ) to restrict the range of candidate answers based on first-order logic (FOL) and capture cross-modal relationships to generate rational explanations. Firstly, we design a FOL-based question analysis program to fetch a small number of candidate answers. Secondly, we utilize a multimodal transformer encoder to extract visual and question features, and conduct the prediction on candidate answers. Finally, we design a multimodal explanation transformer to construct cross-modal relationships and generate rational explanations. Comprehensive experiments on benchmark datasets demonstrate the superiority of LININ compared with the state-of-the-art methods for EVQA.
Vol. 27, pp. 16-27
Cited by: 0
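The FOL-based question analysis amounts to deriving a predicate from the question prompt that every candidate answer must satisfy, shrinking the answer vocabulary before prediction. A toy rule-based sketch of that idea follows; the rules and answer types are illustrative assumptions, not the paper's actual program:

```python
# Toy FOL-style filter: the question prompt implies a type predicate
# that restricts the candidate-answer set.
ANSWER_TYPES = {
    "color": {"red", "green", "blue", "yellow"},
    "count": {str(n) for n in range(10)},
    "yesno": {"yes", "no"},
}

def candidate_answers(question, full_vocab):
    """Return the subset of full_vocab consistent with the question prompt."""
    q = question.lower()
    if q.startswith(("is ", "are ", "does ")):
        return ANSWER_TYPES["yesno"] & full_vocab
    if "how many" in q:
        return ANSWER_TYPES["count"] & full_vocab
    if "what color" in q:
        return ANSWER_TYPES["color"] & full_vocab
    return full_vocab  # no restriction derivable from the prompt

vocab = {"red", "blue", "2", "yes", "no", "dog"}
assert candidate_answers("What color is the car?", vocab) == {"red", "blue"}
assert candidate_answers("Is the dog running?", vocab) == {"yes", "no"}
```

The model then only scores the surviving candidates instead of the whole answer space.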
Progressive Region-to-Boundary Exploration Network for Camouflaged Object Detection
IF 8.4, CAS Tier 1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-24 DOI: 10.1109/TMM.2024.3521761
Guanghui Yue;Shangjie Wu;Tianwei Zhou;Gang Li;Jie Du;Yu Luo;Qiuping Jiang
Abstract: Camouflaged object detection (COD) aims to segment targeted objects that have similar colors, textures, or shapes to their background environment. Due to the limited ability to distinguish highly similar patterns, existing COD methods usually produce inaccurate predictions, especially around the boundary areas, when coping with complex scenes. This paper proposes a Progressive Region-to-Boundary Exploration Network (PRBE-Net) to accurately detect camouflaged objects. PRBE-Net follows an encoder-decoder framework and includes three key modules. Firstly, both high-level and low-level features of the encoder are integrated by a region and boundary exploration module to explore their complementary information for extracting the object's coarse region and fine boundary cues simultaneously. Secondly, taking the region cues as the guidance information, a Region Enhancement (RE) module is used to adaptively localize and enhance the region information at each layer of the encoder. Subsequently, considering that camouflaged objects usually have blurry boundaries, a Boundary Refinement (BR) decoder is used after the RE module to better detect the boundary areas with the assistance of boundary cues. Through top-down deep supervision, PRBE-Net can progressively refine the prediction. Extensive experiments on four datasets indicate that our PRBE-Net achieves superior results over 21 state-of-the-art COD methods. Additionally, it also shows good results on polyp segmentation, a COD-related task in the medical field.
Vol. 27, pp. 236-248
Cited by: 0
Memory-Enhanced Confidence Calibration for Class-Incremental Unsupervised Domain Adaptation
IF 8.4, CAS Tier 1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-24 DOI: 10.1109/TMM.2024.3521834
Jiaping Yu;Muli Yang;Aming Wu;Cheng Deng
Abstract: In this paper, we focus on Class-Incremental Unsupervised Domain Adaptation (CI-UDA), where the labeled source domain already includes all classes, and the classes in the unlabeled target domain emerge sequentially over time. This task involves addressing two main challenges. The first is the domain gap between the labeled source data and the unlabeled target data, which leads to weak generalization performance. The second is the inconsistency between the source and target category spaces at each time step, which causes catastrophic forgetting during the testing stage. Previous methods focus solely on the alignment of similar samples from different domains, which overlooks the underlying causes of the domain gap and class distribution difference. To tackle the issue, we rethink this task from a causal perspective for the first time. We first build a structural causal graph to describe the CI-UDA problem. Based on the causal graph, we present Memory-Enhanced Confidence Calibration (MECC), which aims to improve confidence in the predicted results. In particular, we argue that the domain discrepancy caused by the different styles is prone to make the model produce less confident predictions and thus weakens the generalization and continual learning abilities. To this end, we first explore using the Gram matrix to generate source-style target data, which is combined with the original data to jointly train the model and thereby reduce the domain-shift impact. Second, we utilize the model of the previous time step to select corresponding samples that are used to build a memory bank, which is instrumental in alleviating catastrophic forgetting. Extensive experimental results on multiple datasets demonstrate the superiority of our method.
Vol. 27, pp. 610-621
Cited by: 0
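The Gram matrix used to generate source-style target data is the standard channel-correlation statistic from neural style transfer: it captures style while discarding spatial layout. A minimal NumPy sketch (the mean-squared style loss is an assumption about how the statistic would be matched, not the paper's exact objective):

```python
import numpy as np

def gram_matrix(feat):
    """Gram matrix of a (C, H, W) feature map: (C, C) channel
    correlations, which encode style independent of spatial layout."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def style_loss(src_feat, tgt_feat):
    """Distance between source-style and target-style statistics;
    minimizing it pushes target features toward the source style."""
    g_s, g_t = gram_matrix(src_feat), gram_matrix(tgt_feat)
    return float(np.mean((g_s - g_t) ** 2))

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 4, 4))
assert style_loss(a, a) == 0.0                       # identical style
assert style_loss(a, rng.normal(size=(8, 4, 4))) > 0.0
```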
Masked Attribute Description Embedding for Cloth-Changing Person Re-Identification
IF 8.4, CAS Tier 1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-24 DOI: 10.1109/TMM.2024.3521730
Chunlei Peng;Boyu Wang;Decheng Liu;Nannan Wang;Ruimin Hu;Xinbo Gao
Abstract: Cloth-changing person re-identification (CC-ReID) aims to match persons who change clothes over long periods. The key challenge in CC-ReID is to extract cloth-unrelated features, such as face, hairstyle, body shape, and gait. Current research mainly focuses on modeling body shape using multi-modal biological features (such as silhouettes and sketches). However, it does not fully leverage the personal description information hidden in the original RGB image. Considering that certain attribute descriptions remain unchanged after a change of clothes, we propose a Masked Attribute Description Embedding (MADE) method that unifies personal visual appearance and attribute description for CC-ReID. Specifically, handling variable cloth-sensitive information, such as color and type, is challenging for effective modeling. To address this, we mask the clothes type and color information (upper body type, upper body color, lower body type, and lower body color) in the personal attribute description extracted through an attribute detection model. The masked attribute description is then connected and embedded into Transformer blocks at various levels, fusing it with the low-level to high-level features of the image. This approach compels the model to discard cloth information. Experiments are conducted on several CC-ReID benchmarks, including PRCC, LTCC, Celeb-reID-light, and LaST. Results demonstrate that MADE effectively utilizes attribute description, enhancing cloth-changing person re-identification performance, and compares favorably with state-of-the-art methods.
Vol. 27, pp. 1475-1485
Cited by: 0
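The masking step above — blanking the four cloth-sensitive attributes while keeping identity-related ones — can be illustrated as follows. The attribute keys and mask token are hypothetical; the paper then embeds the masked description into Transformer blocks at multiple levels:

```python
# Cloth-sensitive keys named in the abstract: upper/lower body type and color.
CLOTH_KEYS = {"upper_type", "upper_color", "lower_type", "lower_color"}

def mask_cloth_attributes(attrs, mask_token="[MASK]"):
    """Replace cloth-sensitive attribute values with a mask token,
    keeping identity-related attributes (gender, hair, ...) intact."""
    return {k: (mask_token if k in CLOTH_KEYS else v) for k, v in attrs.items()}

attrs = {"gender": "female", "hair": "long",
         "upper_type": "jacket", "upper_color": "red",
         "lower_type": "jeans", "lower_color": "blue"}
masked = mask_cloth_attributes(attrs)
assert masked["hair"] == "long"            # identity cue preserved
assert masked["upper_color"] == "[MASK]"   # cloth cue discarded
```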
Between/Within View Information Completing for Tensorial Incomplete Multi-View Clustering
IF 8.4, CAS Tier 1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-24 DOI: 10.1109/TMM.2024.3521771
Mingze Yao;Huibing Wang;Yawei Chen;Xianping Fu
Abstract: Incomplete Multi-view Clustering (IMvC) receives increasing attention due to its effectiveness in solving data-missing problems. Given the information loss in incomplete situations, the core of IMvC is to effectively overcome the challenge of missing views, that is, to explore the underlying correlations from available data and recover the missing information. However, most existing IMvC methods overemphasize the recovery-first principle, integrating the existing data from different views while neglecting the influence of view consistency in the IMvC task together with valuable within-view information. In this paper, a novel Between/Within View Information Completing method for Tensorial Incomplete Multi-view Clustering (BWIC-TIMC) is proposed, in which between-view and within-view information are jointly exploited to effectively complete the missing views. Specifically, the proposed method designs a dual tensor constraint module, which focuses on simultaneously exploring the view-specific correlations of incomplete views and enforcing between-view consistency across different views. With the dual tensor constraint, between-view and within-view information can be effectively integrated to complete missing views for the IMvC task. Furthermore, in order to balance the different contributions of multiple views and alleviate the problem of feature degeneration, BWIC-TIMC implements an adaptive fusion graph learning strategy for consensus representation learning. Extensive comparative experiments with state-of-the-art baselines demonstrate the effectiveness of BWIC-TIMC.
Vol. 27, pp. 1538-1550
Cited by: 0
DCM-Net: A Diffusion Model-Based Detection Network Integrating the Characteristics of Copy-Move Forgery
IF 8.4, CAS Tier 1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-24 DOI: 10.1109/TMM.2024.3521685
Shaowei Weng;Jianhao Zhang;Tanguo Zhu;Lifang Yu;Chunyu Zhang
Abstract: Directly applying a generic object detection network to copy-move forgery detection (CMFD) inevitably leads to low detection accuracy. Therefore, DCM-Net, an object detection network dominated by a diffusion model that incorporates the characteristics of copy-move forgery, is proposed in this paper to substantially enhance CMFD performance. DCM-Net, as the first diffusion model-based CMFD network, has the following three improvements. Firstly, the high-similarity box padding strategy pads high-similarity boxes, rather than the random boxes used in the diffusion model, to ground truth boxes, better guiding subsequent dual-attention detection heads (DDHs) to focus more on high-similarity regions. Secondly, different from previous deep learning based CMFD networks that utilize self-correlation calculation to indiscriminately transform all classification features extracted from feature extraction into high-similarity features, an adaptive feature combination strategy is proposed to obtain the optimal feature transformation capable of achieving the best detection performance, enabling DDHs to more effectively distinguish source and target regions. Finally, to give the detection heads more accurate source/target localization and discrimination, DDHs equipped with efficient multi-scale attention and a contextual transformer are proposed to generate tampered features fusing precise spatial position information and rich contextual global information. The experimental results on three publicly available datasets, USC-ISI, CoMoFoD, and COVERAGE, demonstrate that DCM-Net outperforms several advanced algorithms in terms of similarity detection ability and source/target differentiation ability.
Vol. 27, pp. 503-514
Cited by: 0
MSDLF-K: A Multimodal Feature Learning Approach for Sentiment Analysis in Korean Incorporating Text and Speech
IF 8.4, CAS Tier 1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-24 DOI: 10.1109/TMM.2024.3521707
Tae-Young Kim;Jufeng Yang;Eunil Park
Abstract: Recently, sentiment analysis research has made significant improvements in addressing sentiment and subjectivity within textual content. The advent of multimodal deep learning techniques has further broadened this scope, enabling the integration of diverse modalities such as voice and image features alongside text. However, despite these advancements, the analysis of the Korean language remains challenging due to its inherently agglutinative nature and linguistic ambiguity, primarily examined at the sentence level. To effectively address this challenge, we propose a novel Multimodal Sentimental Deep Learning Framework for Korean (MSDLF-K), which can examine not only Korean text but also its associated speech. Our framework, MSDLF-K, integrates spectrograms and waveforms from Korean voice data with embedding vectors derived from script sentences, creating a unified multimodal representation. This approach facilitates the identification of both shared and unique features within the latent space, thereby offering valuable insights into their respective impacts on sentiment analysis performance. To validate the efficacy of MSDLF-K, we conducted a set of experiments using the emotion speech synthesis dataset. Our findings demonstrate that MSDLF-K achieves a remarkable accuracy of 79.0% in valence and 81.7% in arousal for emotion classification, metrics previously unexplored in the literature. Furthermore, empirical analysis reveals the significant influence of multimodal representations, encompassing both text and voice, on enhancing emotion analysis performance. In summary, our study not only presents a pioneering solution for sentiment analysis in the Korean language but also underscores the importance of incorporating multimodal approaches for more comprehensive and accurate sentiment analysis across diverse linguistic contexts.
Vol. 27, pp. 1266-1276
Cited by: 0
PRA-Det: Anchor-Free Oriented Object Detection With Polar Radius Representation
IF 8.4, CAS Tier 1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-24 DOI: 10.1109/TMM.2024.3521683
Min Dang;Gang Liu;Hao Li;Di Wang;Rong Pan;Quan Wang
Abstract: Oriented object detection typically adds a rotation angle to the regressed horizontal bounding box (HBB) to represent the oriented bounding box (OBB). However, existing oriented object detectors based on regressed angles face inconsistency between metric and loss, boundary discontinuity, or square-like problems. To solve these problems, we propose an anchor-free oriented object detector named PRA-Det, which assigns the center region of the object to regress OBBs represented by polar radius vectors. Specifically, the proposed PRA-Det introduces a diamond-shaped positive region with a category-wise attention factor to assign positive sample points for regressing polar radius vectors. PRA-Det regresses the polar radius vector of the edges from the assigned sample points as the regression target and suppresses predicted low-quality polar radius vectors through the category-wise attention factor. The OBBs defined under different protocols are uniformly encoded by the polar radius encoding module into regression targets represented by polar radius vectors. Therefore, the regression target represented by the polar radius vector has no angle parameters during training, thus solving the angle-sensitive boundary discontinuity and square-like problems. To optimize the predicted polar radius vector, we design a spatial geometry loss to improve the detection accuracy. Furthermore, in the inference stage, the center offset score of the polar radius vector is combined with the classification score as the confidence to alleviate the inconsistency between classification and regression. Extensive experiments on public benchmarks demonstrate that PRA-Det is highly competitive with state-of-the-art oriented object detectors and outperforms other comparison methods.
Vol. 27, pp. 145-157
Cited by: 0
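One plausible reading of the polar radius representation, in the spirit of PolarMask-style contour regression, measures distances from the sample point to the oriented-box boundary along fixed ray angles, so the regression target carries no explicit angle parameter. The sketch below is built on that assumed reading, not on the paper's exact encoding module:

```python
import math

def polar_radius_vector(w, h, theta, n_rays=8):
    """Distance from the box center to the oriented-box boundary along
    n_rays equally spaced ray angles. The vector encodes both shape and
    orientation without an explicit angle target (assumed reading of
    the polar radius representation)."""
    radii = []
    for k in range(n_rays):
        phi = 2 * math.pi * k / n_rays
        # ray direction expressed in the box's local (rotated) frame
        du = math.cos(phi - theta)
        dv = math.sin(phi - theta)
        # distance to the axis-aligned box [-w/2, w/2] x [-h/2, h/2]
        tx = (w / 2) / abs(du) if du else math.inf
        ty = (h / 2) / abs(dv) if dv else math.inf
        radii.append(min(tx, ty))
    return radii

# a 4x2 box with no rotation, probed at 0/90/180/270 degrees
r = polar_radius_vector(4, 2, 0.0, n_rays=4)
assert [round(x, 6) for x in r] == [2.0, 1.0, 2.0, 1.0]
```

Rotating the box permutes the radii rather than introducing an angle term, which is the property that sidesteps angle-sensitive boundary discontinuity.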
Quality-Guided Skin Tone Enhancement for Portrait Photography
IF 8.4, CAS Tier 1, Computer Science
IEEE Transactions on Multimedia Pub Date: 2024-12-24 DOI: 10.1109/TMM.2024.3521829
Shiqi Gao;Huiyu Duan;Xinyue Li;Kang Fu;Yicong Peng;Qihang Xu;Yuanyuan Chang;Jia Wang;Xiongkuo Min;Guangtao Zhai
Abstract: In recent years, learning-based color and tone enhancement methods for photos have become increasingly popular. However, most learning-based image enhancement methods just learn a mapping from one distribution to another based on one dataset, lacking the ability to adjust images continuously and controllably. It is important to enable learning-based enhancement models to adjust an image continuously, since in many cases we may want a slighter or stronger enhancement effect rather than one fixed adjusted result. In this paper, we propose a quality-guided image enhancement paradigm that enables image enhancement models to learn the distribution of images with various quality ratings. By learning this distribution, image enhancement models can associate image features with their corresponding perceptual qualities, which can be used to adjust images continuously according to different quality scores. To validate the effectiveness of our proposed method, a subjective quality assessment experiment is first conducted, focusing on skin tone adjustment in portrait photography. Guided by the subjective quality ratings obtained from this experiment, our method can adjust the skin tone according to different quality requirements. Furthermore, an experiment conducted on 10 natural raw images corroborates the effectiveness of our model in situations with fewer subjects and fewer shots, and also demonstrates its general applicability to natural images.
Vol. 27, pp. 171-185
Cited by: 0