Advanced fine-tuning procedures to enhance DNN robustness in visual coding for machines

Impact factor: 1.8; Zone 4, Computer Science

EURASIP Journal on Image and Video Processing. Published: 2024-09-18. DOI: 10.1186/s13640-024-00650-3

Alban Marie, Karol Desnos, Alexandre Mercat, Luce Morin, Jarno Vanne, Lu Zhang

Citations: 0

Abstract

Video Coding for Machines (VCM) is gaining momentum in applications such as autonomous driving, industrial manufacturing, and surveillance, where the robustness of machine learning algorithms against coding artifacts is a key success factor. This work complements the MPEG/JVET standardization efforts to improve the resilience of deep neural network (DNN)-based machine models against such artifacts by proposing three advanced fine-tuning procedures: (1) progressively increasing the distortion strength as training proceeds; (2) adding a regularization term to the original loss function to minimize the distance between predictions on compressed and original content; and (3) a joint training procedure that combines the first two approaches. The proposals were evaluated against a conventional fine-tuning anchor on two machine tasks and datasets: image classification on ImageNet and semantic segmentation on Cityscapes. The joint training procedure reduces training time in both cases while still achieving a 2.4% coding gain in image classification and 7.4% in semantic segmentation, and a slight increase in training time yields up to 9.4% better coding efficiency for segmentation. All of these coding gains come without any additional inference or encoding time. Because the procedures are standard-compliant, they could have a significant impact on visual coding for machine applications.
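The three procedures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the schedule shape, the distance metric, or the regularization weight, so everything specific here is an assumption: the linear schedule, the mean squared distance, the weight `lam`, and all function names are hypothetical, and `compress`, `predict`, and `task_loss_fn` are caller-supplied stand-ins for a codec, a DNN, and its task loss.

```python
# Illustrative sketch only: schedule shape, distance metric, and `lam`
# are assumptions, not the settings used in the paper.

def distortion_level(step, total_steps, num_levels):
    """Procedure (1): distortion level to use at training step `step`.

    Levels are assumed ordered from 0 (mildest) to num_levels - 1
    (strongest) and assumed to increase linearly over training.
    """
    if total_steps <= 1:
        return num_levels - 1
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return min(int(progress * num_levels), num_levels - 1)


def prediction_distance(pred_compressed, pred_original):
    """Mean squared distance between two prediction vectors (assumed metric)."""
    if len(pred_compressed) != len(pred_original):
        raise ValueError("prediction vectors must have the same length")
    total = sum((c - o) ** 2 for c, o in zip(pred_compressed, pred_original))
    return total / len(pred_original)


def regularized_loss(task_loss, pred_compressed, pred_original, lam=1.0):
    """Procedure (2): task loss plus a weighted prediction-distance term."""
    return task_loss + lam * prediction_distance(pred_compressed, pred_original)


def joint_loss(step, total_steps, num_levels, image, label,
               compress, predict, task_loss_fn, lam=1.0):
    """Procedure (3): scheduled distortion combined with the regularized loss."""
    level = distortion_level(step, total_steps, num_levels)
    pred_original = predict(image)                      # prediction on pristine input
    pred_compressed = predict(compress(image, level))   # prediction on coded input
    task_loss = task_loss_fn(pred_compressed, label)
    return regularized_loss(task_loss, pred_compressed, pred_original, lam)
```

In this sketch the extra cost is confined to training, where each sample is predicted twice; nothing in the deployed model or the encoder changes, which is consistent with the abstract's statement that the gains require no additional inference or encoding time.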

Source journal

EURASIP Journal on Image and Video Processing (Engineering: Electrical and Electronic Engineering)

CiteScore: 7.10

Self-citation rate: 0.00%

Articles published: (no value listed)

Review time: 6.8 months

About the journal: EURASIP Journal on Image and Video Processing is intended for researchers from both academia and industry who are active in the multidisciplinary field of image and video processing. The journal covers all theoretical and practical aspects of the domain, from basic research to the development of applications.