Alban Marie, Karol Desnos, Alexandre Mercat, Luce Morin, Jarno Vanne, Lu Zhang
{"title":"在机器视觉编码中增强 DNN 稳健性的高级微调程序","authors":"Alban Marie, Karol Desnos, Alexandre Mercat, Luce Morin, Jarno Vanne, Lu Zhang","doi":"10.1186/s13640-024-00650-3","DOIUrl":null,"url":null,"abstract":"<p>Video Coding for Machines (VCM) is gaining momentum in applications like autonomous driving, industry manufacturing, and surveillance, where the robustness of machine learning algorithms against coding artifacts is one of the key success factors. This work complements the MPEG/JVET standardization efforts in improving the resilience of deep neural network (DNN)-based machine models against such coding artifacts by proposing the following three advanced fine-tuning procedures for their training: (1) the progressive increase of the distortion strength as the training proceeds; (2) the incorporation of a regularization term in the original loss function to minimize the distance between predictions on compressed and original content; and (3) a joint training procedure that combines the proposed two approaches. These proposals were evaluated against a conventional fine-tuning anchor on two different machine tasks and datasets: image classification on ImageNet and semantic segmentation on Cityscapes. Our joint training procedure is shown to reduce the training time in both cases and still obtain a 2.4% coding gain in image classification and 7.4% in semantic segmentation, whereas a slight increase in training time can bring up to 9.4% better coding efficiency for the segmentation. All these coding gains are obtained without any additional inference or encoding time. As these advanced fine-tuning procedures are standard-compliant, they offer the potential to have a significant impact on visual coding for machine applications.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"208 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Advanced fine-tuning procedures to enhance DNN robustness in visual coding for machines\",\"authors\":\"Alban Marie, Karol Desnos, Alexandre Mercat, Luce Morin, Jarno Vanne, Lu Zhang\",\"doi\":\"10.1186/s13640-024-00650-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Video Coding for Machines (VCM) is gaining momentum in applications like autonomous driving, industry manufacturing, and surveillance, where the robustness of machine learning algorithms against coding artifacts is one of the key success factors. This work complements the MPEG/JVET standardization efforts in improving the resilience of deep neural network (DNN)-based machine models against such coding artifacts by proposing the following three advanced fine-tuning procedures for their training: (1) the progressive increase of the distortion strength as the training proceeds; (2) the incorporation of a regularization term in the original loss function to minimize the distance between predictions on compressed and original content; and (3) a joint training procedure that combines the proposed two approaches. These proposals were evaluated against a conventional fine-tuning anchor on two different machine tasks and datasets: image classification on ImageNet and semantic segmentation on Cityscapes. Our joint training procedure is shown to reduce the training time in both cases and still obtain a 2.4% coding gain in image classification and 7.4% in semantic segmentation, whereas a slight increase in training time can bring up to 9.4% better coding efficiency for the segmentation. All these coding gains are obtained without any additional inference or encoding time. As these advanced fine-tuning procedures are standard-compliant, they offer the potential to have a significant impact on visual coding for machine applications.</p>\",\"PeriodicalId\":49322,\"journal\":{\"name\":\"Eurasip Journal on Image and Video Processing\",\"volume\":\"208 1\",\"pages\":\"\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Eurasip Journal on Image and Video Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1186/s13640-024-00650-3\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eurasip Journal on Image and Video Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s13640-024-00650-3","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Advanced fine-tuning procedures to enhance DNN robustness in visual coding for machines
Video Coding for Machines (VCM) is gaining momentum in applications like autonomous driving, industry manufacturing, and surveillance, where the robustness of machine learning algorithms against coding artifacts is one of the key success factors. This work complements the MPEG/JVET standardization efforts in improving the resilience of deep neural network (DNN)-based machine models against such coding artifacts by proposing the following three advanced fine-tuning procedures for their training: (1) the progressive increase of the distortion strength as the training proceeds; (2) the incorporation of a regularization term in the original loss function to minimize the distance between predictions on compressed and original content; and (3) a joint training procedure that combines the proposed two approaches. These proposals were evaluated against a conventional fine-tuning anchor on two different machine tasks and datasets: image classification on ImageNet and semantic segmentation on Cityscapes. Our joint training procedure is shown to reduce the training time in both cases and still obtain a 2.4% coding gain in image classification and 7.4% in semantic segmentation, whereas a slight increase in training time can bring up to 9.4% better coding efficiency for the segmentation. All these coding gains are obtained without any additional inference or encoding time. As these advanced fine-tuning procedures are standard-compliant, they offer the potential to have a significant impact on visual coding for machine applications.
期刊介绍:
EURASIP Journal on Image and Video Processing is intended for researchers from both academia and industry, who are active in the multidisciplinary field of image and video processing. The scope of the journal covers all theoretical and practical aspects of the domain, from basic research to development of application.