Sievenet: An Efficient Model Utilizing H.265 Codec Structure for Video Object Detection

2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). Pub Date: 2023-06-04. DOI: 10.1109/ICASSPW59220.2023.10193722

O. Koyun, B. U. Töreyin


Citations: 0

Abstract

Object detection is a crucial task in video content analysis. The coding structures of the High Efficiency Video Coding (H.265, HEVC) standard are strongly correlated with the video content, which creates an opportunity to use these structures for computationally efficient video object detection. To exploit this, we present a video object detection method that partitions frames into macroblocks based on the H.265 structure. Blocks with spatially high-frequency content go through a dynamic-layer approach that subjects them to deeper analysis with more layers, while blocks with spatially low-frequency content pass through fewer layers to lower the computational load. Results on the ImageNet VID dataset indicate that our approach has the potential to save significant computational resources while maintaining accurate object detection performance.
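The abstract does not give the network architecture, so the following is only a minimal sketch of the general idea of routing blocks through more or fewer layers. It assumes that the quadtree partition depth of an HEVC coding unit (0 for an unsplit 64x64 coding tree unit, 3 for 8x8 units) can stand in for "spatially high-frequency content". The function names, the layer counts, and the linear depth-to-layers mapping are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: names, layer counts, and the depth-to-layers
# mapping are assumptions, not the Sievenet authors' method.

MAX_DEPTH = 3  # quadtree depth when a 64x64 CTU is split down to 8x8 CUs


def layers_for_block(cu_depth, min_layers=2, max_layers=8):
    """Map a block's partition depth to the number of layers to run.

    Deeper partitioning is treated as a proxy for spatially
    high-frequency content, so such blocks receive more layers.
    """
    if not 0 <= cu_depth <= MAX_DEPTH:
        raise ValueError(f"cu_depth must be in [0, {MAX_DEPTH}]")
    span = max_layers - min_layers
    return min_layers + round(span * cu_depth / MAX_DEPTH)


def plan_frame(depth_map, min_layers=2, max_layers=8):
    """Return per-block layer counts and the fraction of full compute used.

    "Full compute" means running max_layers on every block.
    """
    plan = [[layers_for_block(d, min_layers, max_layers) for d in row]
            for row in depth_map]
    used = sum(sum(row) for row in plan)
    full = max_layers * sum(len(row) for row in plan)
    return plan, used / full


if __name__ == "__main__":
    # Hypothetical 2x4 grid of blocks: flat background (depth 0)
    # next to a detailed object (depth 2-3).
    depth_map = [
        [0, 0, 2, 3],
        [0, 1, 3, 3],
    ]
    plan, fraction = plan_frame(depth_map)
    for row in plan:
        print(row)
    print(f"relative compute: {fraction:.3f}")
```

In this toy setting the flat background blocks run only the minimum number of layers, which is where the saving comes from. Any real saving depends on the actual model and on how the encoder partitions each frame.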
