EURASIP Journal on Image and Video Processing: Latest Articles

Just Dance: detection of human body reenactment fake videos
IF 2.4 | CAS Quartile 4 | Computer Science
EURASIP Journal on Image and Video Processing | Pub Date: 2024-08-14 | DOI: 10.1186/s13640-024-00635-2
Omran Alamayreh, Carmelo Fascella, Sara Mandelli, Benedetta Tondi, Paolo Bestagini, Mauro Barni
{"title":"Just Dance: detection of human body reenactment fake videos","authors":"Omran Alamayreh, Carmelo Fascella, Sara Mandelli, Benedetta Tondi, Paolo Bestagini, Mauro Barni","doi":"10.1186/s13640-024-00635-2","DOIUrl":"https://doi.org/10.1186/s13640-024-00635-2","url":null,"abstract":"<p>In the last few years, research on the detection of AI-generated videos has focused exclusively on detecting facial manipulations known as deepfakes. Much less attention has been paid to the detection of artificial non-facial fake videos. In this paper, we address a new forensic task, namely, the detection of fake videos of human body reenactment. To this purpose, we consider videos generated by the “Everybody Dance Now” framework. To accomplish our task, we have constructed and released a novel dataset of fake videos of this kind, referred to as FakeDance dataset. Additionally, we propose two forgery detectors to study the detectability of FakeDance kind of videos. The first one exploits spatial–temporal clues of a given video by means of hand-crafted descriptors, whereas the second detector is an end-to-end detector based on Convolutional Neural Networks (CNNs) trained on purpose. Both detectors have their peculiarities and strengths, working well in different operative scenarios. We believe that our proposed dataset together with the two detectors will contribute to the research on the detection of non-facial fake videos generated by means of AI.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"27 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142196990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
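To make the first detector's idea concrete, below is a minimal Python sketch of a hand-crafted spatio-temporal descriptor: inter-frame residual statistics that summarize motion consistency, which a conventional classifier could then use to separate real from reenacted clips. The feature choices are illustrative assumptions, not the published descriptor.

```python
import numpy as np

def spatiotemporal_features(frames):
    """frames: (T, H, W) grayscale video, float values in [0, 1]."""
    residuals = np.abs(np.diff(frames, axis=0))        # temporal differences
    per_frame = residuals.reshape(len(residuals), -1)
    return np.array([
        per_frame.mean(),               # overall temporal activity
        per_frame.std(),                # burstiness of motion energy
        np.percentile(per_frame, 99),   # peak residuals, where warping artifacts show
        per_frame.mean(axis=1).std(),   # frame-to-frame consistency of activity
    ])

# Features extracted from labeled real/fake clips could then train a standard
# classifier such as sklearn.svm.SVC.
```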
PointPCA: point cloud objective quality assessment using PCA-based descriptors
IF 2.4 | CAS Quartile 4 | Computer Science
EURASIP Journal on Image and Video Processing | Pub Date: 2024-08-09 | DOI: 10.1186/s13640-024-00626-3
Evangelos Alexiou, Xuemei Zhou, Irene Viola, Pablo Cesar
{"title":"PointPCA: point cloud objective quality assessment using PCA-based descriptors","authors":"Evangelos Alexiou, Xuemei Zhou, Irene Viola, Pablo Cesar","doi":"10.1186/s13640-024-00626-3","DOIUrl":"https://doi.org/10.1186/s13640-024-00626-3","url":null,"abstract":"<p>Point clouds denote a prominent solution for the representation of 3D photo-realistic content in immersive applications. Similarly to other imaging modalities, quality predictions for point cloud contents are vital for a wide range of applications, enabling trade-off optimizations between data quality and data size in every processing step from acquisition to rendering. In this work, we focus on use cases that consider human end-users consuming point cloud contents and, hence, we concentrate on visual quality metrics. In particular, we propose a set of perceptually relevant descriptors based on principal component analysis (PCA) decomposition, which is applied to both geometry and texture data for full-reference point cloud quality assessment. Statistical features are derived from these descriptors to characterize local shape and appearance properties for both a reference and a distorted point cloud. The extracted statistical features are subsequently compared to provide corresponding predictions of visual quality for the distorted point cloud. As part of our method, a learning-based approach is proposed to fuse these individual predictors to a unified perceptual score. We validate the accuracy of the individual predictors, as well as the unified quality scores obtained after regression against subjectively annotated datasets, showing that our metric outperforms state-of-the-art solutions. Insights regarding design decisions are provided through exploratory studies, evaluating the performance of our metric under different parameter configurations, attribute domains, color spaces, and regression models. A software implementation of the proposed metric is made available at the following link: https://github.com/cwi-dis/pointpca.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"79 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
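As a rough illustration of PCA-based geometry descriptors, the sketch below computes classical eigenvalue features (linearity, planarity, sphericity) from local point neighborhoods. The paper's actual descriptors, neighborhood definition, and learned fusion differ; the neighborhood size here is an assumption.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_pca_descriptors(points, k=32):
    """points: (N, 3) array; returns per-point (linearity, planarity, sphericity)."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    feats = np.empty((len(points), 3))
    for i, nbrs in enumerate(idx):
        nb = points[nbrs] - points[nbrs].mean(axis=0)
        lam = np.linalg.eigvalsh(nb.T @ nb / k)    # covariance eigenvalues, ascending
        l1, l2, l3 = lam[2], lam[1], lam[0]        # descending: l1 >= l2 >= l3
        l1 = max(l1, 1e-12)                        # guard against degenerate patches
        feats[i] = [(l1 - l2) / l1, (l2 - l3) / l1, l3 / l1]
    return feats

# Comparing statistics of such features between reference and distorted clouds
# yields full-reference predictors that a regressor can fuse into one score.
```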
Compressed point cloud classification with point-based edge sampling
IF 2.4 | CAS Quartile 4 | Computer Science
EURASIP Journal on Image and Video Processing | Pub Date: 2024-08-07 | DOI: 10.1186/s13640-024-00637-0
Zhe Luo, Wenjing Jia, Stuart Perry
{"title":"Compressed point cloud classification with point-based edge sampling","authors":"Zhe Luo, Wenjing Jia, Stuart Perry","doi":"10.1186/s13640-024-00637-0","DOIUrl":"https://doi.org/10.1186/s13640-024-00637-0","url":null,"abstract":"<p>3D point cloud data, as an immersive detailed data source, has been increasingly used in numerous applications. To deal with the computational and storage challenges of this data, it needs to be compressed before transmission, storage, and processing, especially in real-time systems. Instead of decoding the compressed data stream and subsequently conducting downstream tasks on the decompressed data, analyzing point clouds directly in their compressed domain has attracted great interest. In this paper, we dive into the realm of compressed point cloud classification (CPCC), aiming to achieve high point cloud classification accuracy in a bitrate-saving way by ensuring the bit stream contains a high degree of representative information of the point cloud. Edge information is one of the most important and representative attributes of the point cloud because it can display the outlines or main shapes. However, extracting edge points or information from point cloud models is challenging due to their irregularity and sparsity. To address this challenge, we adopt an advanced edge-sampling method that enhances existing state-of-the-art (SOTA) point cloud edge-sampling techniques based on attention mechanisms and consequently develop a novel CPCC method “CPCC-PES” that focuses on point cloud’s edge information. The result obtained on the benchmark ModelNet40 dataset shows that our model has superior rate-accuracy trade-off performance than SOTA works. Specifically, our method achieves over 90% Top-1 Accuracy with a mere 0.08 bits-per-point (bpp), marking a remarkable over 96% reduction in BD-bitrate compared with specialized codecs. This means that our method only consumes 20% of the bitrate of other SOTA works while maintaining comparable accuracy. Furthermore, we propose a new evaluation metric named BD-Top-1 Accuracy to evaluate the trade-off performance between bitrate and Top-1 Accuracy for future CPCC research.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"28 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
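The sketch below shows one simple, heuristic way to bias sampling toward edges: score each point by the offset between it and its neighborhood centroid (edges and corners have asymmetric neighborhoods) and keep the top scorers. The paper's attention-based sampler is more sophisticated; this heuristic and its parameters are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def edge_sample(points, k=16, keep_ratio=0.25):
    """points: (N, 3) array; returns the subset with the strongest edge scores."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    centroids = points[idx].mean(axis=1)                 # (N, 3) neighborhood centroids
    score = np.linalg.norm(points - centroids, axis=1)   # centroid offset as edge-ness
    keep = np.argsort(score)[-int(len(points) * keep_ratio):]
    return points[keep]
```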
Evaluation of the use of box size priors for 6D plane segment tracking from point clouds with applications in cargo packing
IF 2.4 | CAS Quartile 4 | Computer Science
EURASIP Journal on Image and Video Processing | Pub Date: 2024-08-06 | DOI: 10.1186/s13640-024-00636-1
Guillermo A. Camacho-Muñoz, Sandra Esperanza Nope Rodríguez, Humberto Loaiza-Correa, João Paulo Silva do Monte Lima, Rafael Alves Roberto
{"title":"Evaluation of the use of box size priors for 6D plane segment tracking from point clouds with applications in cargo packing","authors":"Guillermo A. Camacho-Muñoz, Sandra Esperanza Nope Rodríguez, Humberto Loaiza-Correa, João Paulo Silva do Monte Lima, Rafael Alves Roberto","doi":"10.1186/s13640-024-00636-1","DOIUrl":"https://doi.org/10.1186/s13640-024-00636-1","url":null,"abstract":"<p>This paper addresses the problem of 6D pose tracking of plane segments from point clouds acquired from a mobile camera. This is motivated by manual packing operations, where an opportunity exists to enhance performance, aiding operators with instructions based on augmented reality. The approach uses as input point clouds, by its advantages for extracting geometric information relevant to estimating the 6D pose of rigid objects. The proposed algorithm begins with a RANSAC fitting stage on the raw point cloud. It then implements strategies to compute the 2D size and 6D pose of plane segments from geometric analysis of the fitted point cloud. Redundant detections are combined using a new quality factor that predicts point cloud mapping density and allows the selection of the most accurate detection. The algorithm is designed for dynamic scenes, employing a novel particle concept in the point cloud space to track detections’ validity over time. A variant of the algorithm uses box size priors (available in most packing operations) to filter out irrelevant detections. The impact of this prior knowledge is evaluated through an experimental design that compares the performance of a plane segment tracking system, considering variations in the tracking algorithm and camera speed (onboard the packing operator). The tracking algorithm varies at two levels: algorithm (<span>(A_{wpk})</span>), which integrates prior knowledge of box sizes, and algorithm (<span>(A_{woutpk})</span>), which assumes ignorance of box properties. Camera speed is evaluated at low and high speeds. Results indicate increments in the precision and F1-score associated with using the <span>(A_{wpk})</span> algorithm and consistent performance across both velocities. These results confirm the enhancement of the performance of a tracking system in a real-life and complex scenario by including previous knowledge of the elements in the scene. The proposed algorithm is limited to tracking plane segments of boxes fully supported on surfaces parallel to the ground plane and not stacked. Future works are proposed to include strategies to resolve this limitation.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"19 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141936581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
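Two of the building blocks described above lend themselves to a short sketch: a RANSAC plane fit on the raw cloud, and a check of a measured plane segment against known box-face sizes. The iteration count, thresholds, and tolerances below are illustrative assumptions.

```python
import numpy as np

def ransac_plane(points, iters=200, tol=0.01, seed=0):
    """points: (N, 3); returns a boolean inlier mask for the dominant plane."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p[1] - p[0], p[2] - p[0])           # candidate plane normal
        if np.linalg.norm(n) < 1e-9:                     # degenerate sample, skip
            continue
        n /= np.linalg.norm(n)
        inliers = np.abs((points - p[0]) @ n) < tol      # point-to-plane distance test
        if inliers.sum() > best.sum():
            best = inliers
    return best

def matches_box_prior(extent_2d, priors, tol=0.05):
    """extent_2d: measured (w, h) of a plane segment; priors: known box face sizes."""
    return any(np.allclose(sorted(extent_2d), sorted(p), atol=tol) for p in priors)
```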
Remote expert viewing, laboratory tests or objective metrics: which one(s) to trust?
IF 2.4 | CAS Quartile 4 | Computer Science
EURASIP Journal on Image and Video Processing | Pub Date: 2024-06-17 | DOI: 10.1186/s13640-024-00630-7
Mathias Wien, Joel Jung
{"title":"Remote expert viewing, laboratory tests or objective metrics: which one(s) to trust?","authors":"Mathias Wien, Joel Jung","doi":"10.1186/s13640-024-00630-7","DOIUrl":"https://doi.org/10.1186/s13640-024-00630-7","url":null,"abstract":"<p>We present a study on the validity of quality assessment in the context of the development of visual media coding schemes. The work is motivated by the need for reliable means for decision-taking in standardization efforts of MPEG and JVET, i.e., the adoption or rejection of coding tools during the development process of the coding standard. The study includes results considering three means: objective quality metrics, remote expert viewing, which is a method designed in the context of MPEG standardization, and formal laboratory visual evaluation. The focus of this work is on the comparison of pairs of coded video sequences, e.g., a proposed change and an anchor scheme at a given rate point. An aggregation of performance measurements across multiple rate points, such as the Bjøntegaard Delta rate, is out of the scope of this paper. The paper details the test setup for the subjective assessment methods and the objective quality metrics under consideration. The results of the three approaches are reviewed, analyzed, and compared with respect to their suitability for the decision-taking task. The study indicates that, subject to the chosen test content and test protocols, the results of remote expert viewing using a forced-choice scale can be considered more discriminatory than the results of naïve viewers in the laboratory tests. The results further that, in general, the well-established quality metrics, such as PSNR, SSIM, or MS-SSIM, exhibit a high rate of correct decision-making when their results are compared with both types of viewing tests. Among the learning-based metrics, VMAF and AVQT appear to be most robust. For the development process of a coding standard, the selection of the most suitable means must be guided by the context, where a small number of carefully selected objective metrics, in combination with viewing tests for unclear cases, appears recommendable.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"135 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141550115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
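The core comparison in such a study can be phrased as a correct-decision rate: for each proposal/anchor pair, does the metric's verdict match the viewing-test outcome? A minimal sketch of this bookkeeping follows; the tie margin is an illustrative assumption.

```python
def decision(score_proposal, score_anchor, margin=0.0):
    """Verdict of a quality metric on one pair (higher score = better quality)."""
    if abs(score_proposal - score_anchor) <= margin:
        return "tie"
    return "better" if score_proposal > score_anchor else "worse"

def correct_decision_rate(metric_pairs, subjective_labels, margin=0.0):
    """metric_pairs: (proposal, anchor) metric scores per pair;
    subjective_labels: 'better'/'worse'/'tie' per pair from the viewing test."""
    hits = sum(decision(p, a, margin) == label
               for (p, a), label in zip(metric_pairs, subjective_labels))
    return hits / len(subjective_labels)
```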
Impact of LiDAR point cloud compression on 3D object detection evaluated on the KITTI dataset
IF 2.4 | CAS Quartile 4 | Computer Science
EURASIP Journal on Image and Video Processing | Pub Date: 2024-06-17 | DOI: 10.1186/s13640-024-00633-4
Nuno A. B. Martins, Luís A. da Silva Cruz, Fernando Lopes
{"title":"Impact of LiDAR point cloud compression on 3D object detection evaluated on the KITTI dataset","authors":"Nuno A. B. Martins, Luís A. da Silva Cruz, Fernando Lopes","doi":"10.1186/s13640-024-00633-4","DOIUrl":"https://doi.org/10.1186/s13640-024-00633-4","url":null,"abstract":"<p>The rapid growth on the amount of generated 3D data, particularly in the form of Light Detection And Ranging (LiDAR) point clouds (PCs), poses very significant challenges in terms of data storage, transmission, and processing. Point cloud (PC) representation of 3D visual information has shown to be a very flexible format with many applications ranging from multimedia immersive communication to machine vision tasks in the robotics and autonomous driving domains. In this paper, we investigate the performance of four reference 3D object detection techniques, when the input PCs are compressed with varying levels of degradation. Compression is performed using two MPEG standard coders based on 2D projections and octree decomposition, as well as two coding methods based on Deep Learning (DL). For the DL coding methods, we used a Joint Photographic Experts Group (JPEG) reference PC coder, that we adapted to accept LiDAR PCs in both Cartesian and cylindrical coordinate systems. The detection performance of the four reference 3D object detection methods was evaluated using both pre-trained models and models specifically trained using degraded PCs reconstructed from compressed representations. It is shown that LiDAR PCs can be compressed down to 6 bits per point with no significant degradation on the object detection precision. Furthermore, employing specifically trained detection models improves the detection capabilities even at compression rates as low as 2 bits per point. These results show that LiDAR PCs can be coded to enable efficient storage and transmission, without significant object detection performance loss.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"51 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141550114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
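The rate axis in such experiments is usually expressed in bits per point (bpp): compressed payload size in bits divided by the number of input points. The sketch below illustrates this with a toy codec (uniform coordinate quantization plus zlib), standing in for the MPEG and learning-based coders actually used in the paper; the quantization step is an assumption.

```python
import zlib
import numpy as np

def toy_compress(points, step=0.05):
    """Quantize (N, 3) float coordinates to a grid and entropy-code the bytes."""
    q = np.round(points / step).astype(np.int32)
    return zlib.compress(q.tobytes())

def bits_per_point(points, step=0.05):
    return 8 * len(toy_compress(points, step)) / len(points)

# Sweeping `step` trades bpp against geometric fidelity; the study measures how
# 3D detection precision on KITTI behaves along such a rate axis.
```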
Adaptive bridge model for compressed domain point cloud classification
IF 2.4 | CAS Quartile 4 | Computer Science
EURASIP Journal on Image and Video Processing | Pub Date: 2024-06-08 | DOI: 10.1186/s13640-024-00631-6
Abdelrahman Seleem, André F. R. Guarda, Nuno M. M. Rodrigues, Fernando Pereira
{"title":"Adaptive bridge model for compressed domain point cloud classification","authors":"Abdelrahman Seleem, André F. R. Guarda, Nuno M. M. Rodrigues, Fernando Pereira","doi":"10.1186/s13640-024-00631-6","DOIUrl":"https://doi.org/10.1186/s13640-024-00631-6","url":null,"abstract":"<p>The recent adoption of deep learning-based models for the processing and coding of multimedia signals has brought noticeable gains in performance, which have established deep learning-based solutions as the uncontested state-of-the-art both for computer vision tasks, targeting machine consumption, as well as, more recently, coding applications, targeting human visualization. Traditionally, applications requiring both coding and computer vision processing require first decoding the bitstream and then applying the computer vision methods to the decompressed multimedia signals. However, the adoption of deep learning-based solutions enables the use of compressed domain computer vision processing, with gains in performance and computational complexity over the decompressed domain approach. For point clouds (PCs), these gains have been demonstrated in the single available compressed domain computer vision processing solution, named Compressed Domain PC Classifier, which processes JPEG Pleno PC coding (PCC) compressed streams using a PC classifier largely compatible with the state-of-the-art spatial domain PointGrid classifier. However, the available Compressed Domain PC Classifier presents strong limitations by imposing a single, specific input size which is associated to specific JPEG Pleno PCC configurations; this limits the compression performance as these configurations are not ideal for all PCs due to their different characteristics, notably density. To overcome these limitations, this paper proposes the first Adaptive Compressed Domain PC Classifier solution which includes a novel adaptive bridge model that allows to process the JPEG Pleno PCC encoded bit streams using different coding configurations, now maximizing the compression efficiency. Experimental results show that the novel Adaptive Compressed Domain PC Classifier allows JPEG PCC to achieve better compression performance by not imposing a single, specific coding configuration for all PCs, regardless of its different characteristics. Moreover, the added adaptability power can achieve slightly better PC classification performance than the previous Compressed Domain PC Classifier and largely better PC classification performance (and lower number of weights) than the PointGrid PC classifier working in the decompressed domain.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"15 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141549981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
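The essence of a bridge model is to map compressed-domain latents of varying size (produced by different coding configurations) onto the fixed input a classifier expects. A minimal, non-learned stand-in is adaptive average pooling, sketched below under the assumption of a dense (C, D, H, W) latent whose spatial dimensions are at least the target size; the paper's bridge is a learned, more elaborate model.

```python
import numpy as np

def _edges(n_in, n_out):
    # integer bin edges partitioning n_in cells into n_out contiguous groups
    return np.linspace(0, n_in, n_out + 1).astype(int)

def adaptive_avg_pool3d(latent, out=(8, 8, 8)):
    """latent: (C, D, H, W) codec latent; returns a fixed-size (C, *out) tensor."""
    c = latent.shape[0]
    ez, ey, ex = (_edges(s, o) for s, o in zip(latent.shape[1:], out))
    pooled = np.empty((c, *out))
    for i in range(out[0]):
        for j in range(out[1]):
            for k in range(out[2]):
                pooled[:, i, j, k] = latent[:, ez[i]:ez[i + 1],
                                            ey[j]:ey[j + 1],
                                            ex[k]:ex[k + 1]].mean(axis=(1, 2, 3))
    return pooled
```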
Learning-based light field imaging: an overview
IF 2.4 | CAS Quartile 4 | Computer Science
EURASIP Journal on Image and Video Processing | Pub Date: 2024-05-30 | DOI: 10.1186/s13640-024-00628-1
Saeed Mahmoudpour, Carla Pagliari, Peter Schelkens
{"title":"Learning-based light field imaging: an overview","authors":"Saeed Mahmoudpour, Carla Pagliari, Peter Schelkens","doi":"10.1186/s13640-024-00628-1","DOIUrl":"https://doi.org/10.1186/s13640-024-00628-1","url":null,"abstract":"<p>Conventional photography can only provide a two-dimensional image of the scene, whereas emerging imaging modalities such as light field enable the representation of higher dimensional visual information by capturing light rays from different directions. Light fields provide immersive experiences, a sense of presence in the scene, and can enhance different vision tasks. Hence, research into light field processing methods has become increasingly popular. It does, however, come at the cost of higher data volume and computational complexity. With the growing deployment of machine-learning and deep architectures in image processing applications, a paradigm shift toward learning-based approaches has also been observed in the design of light field processing methods. Various learning-based approaches are developed to process the high volume of light field data efficiently for different vision tasks while improving performance. Taking into account the diversity of light field vision tasks and the deployed learning-based frameworks, it is necessary to survey the scattered learning-based works in the domain to gain insight into the current trends and challenges. This paper aims to review the existing learning-based solutions for light field imaging and to summarize the most promising frameworks. Moreover, evaluation methods and available light field datasets are highlighted. Lastly, the review concludes with a brief outlook for future research directions.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"41 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141189137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
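To make the higher-dimensional representation concrete: a light field can be stored as a 4D array of sub-aperture views, which supports synthetic refocusing by shifting each view in proportion to its angular position and averaging (shift-and-add). The array layout and disparity parameter below are assumptions, and np.roll is used as a toy shift.

```python
import numpy as np

def refocus(lf, disparity):
    """lf: (U, V, H, W) grayscale light field; disparity: pixel shift per view step."""
    U, V, H, W = lf.shape
    acc = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            dy = int(round((u - U // 2) * disparity))    # vertical angular offset
            dx = int(round((v - V // 2) * disparity))    # horizontal angular offset
            acc += np.roll(lf[u, v], shift=(dy, dx), axis=(0, 1))
    return acc / (U * V)   # objects at the matching depth come into focus
```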
Semi-automated computer vision-based tracking of multiple industrial entities: a framework and dataset creation approach
IF 2.4 | CAS Quartile 4 | Computer Science
EURASIP Journal on Image and Video Processing | Pub Date: 2024-03-22 | DOI: 10.1186/s13640-024-00623-6
{"title":"Semi-automated computer vision-based tracking of multiple industrial entities: a framework and dataset creation approach","authors":"","doi":"10.1186/s13640-024-00623-6","DOIUrl":"https://doi.org/10.1186/s13640-024-00623-6","url":null,"abstract":"<h3>Abstract</h3> <p>This contribution presents the TOMIE framework (Tracking Of Multiple Industrial Entities), a framework for the continuous tracking of industrial entities (e.g., pallets, crates, barrels) over a network of, in this example, six RGB cameras. This framework makes use of multiple sensors, data pipelines, and data annotation procedures, and is described in detail in this contribution. With the vision of a fully automated tracking system for industrial entities in mind, it enables researchers to efficiently capture high-quality data in an industrial setting. Using this framework, an image dataset, the TOMIE dataset, is created, which at the same time is used to gauge the framework’s validity. This dataset contains annotation files for 112,860 frames and 640,936 entity instances that are captured from a set of six cameras that perceive a large indoor space. This dataset out-scales comparable datasets by a factor of four and is made up of scenarios, drawn from industrial applications from the sector of warehousing. Three tracking algorithms, namely ByteTrack, Bot-Sort, and SiamMOT, are applied to this dataset, serving as a proof-of-concept and providing tracking results that are comparable to the state of the art.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"3 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140198850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
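The trackers applied to the dataset follow the tracking-by-detection principle; a minimal sketch of its core step, greedy IoU association of per-frame detections to existing tracks, is shown below. Real trackers such as ByteTrack add motion models, re-identification, and score-aware matching; the threshold here is an assumption.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def track_step(tracks, detections, thresh=0.3):
    """tracks: {track_id: box}; greedily extends tracks and spawns new ones."""
    next_id = max(tracks, default=0) + 1
    updated = {}
    for det in detections:
        best_id, best_iou = None, thresh
        for tid, box in tracks.items():
            if tid not in updated and iou(box, det) > best_iou:
                best_id, best_iou = tid, iou(box, det)
        if best_id is None:                      # no match: start a new track
            best_id, next_id = next_id, next_id + 1
        updated[best_id] = det
    return updated
```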
Fast CU size decision and intra-prediction mode decision method for H.266/VVC
IF 2.4 | CAS Quartile 4 | Computer Science
EURASIP Journal on Image and Video Processing | Pub Date: 2024-03-18 | DOI: 10.1186/s13640-024-00622-7
{"title":"Fast CU size decision and intra-prediction mode decision method for H.266/VVC","authors":"","doi":"10.1186/s13640-024-00622-7","DOIUrl":"https://doi.org/10.1186/s13640-024-00622-7","url":null,"abstract":"<h3>Abstract</h3> <p>H.266/Versatile Video Coding (VVC) is the most recent video coding standard developed by the Joint Video Experts Team (JVET). The quad-tree with nested multi-type tree (QTMT) architecture that improves the compression performance of H.266/VVC is introduced. Moreover, H.266/VVC contains a greater number of intra-prediction modes than H.265/High Efficiency Video Coding (HEVC), totalling 67. However, these lead to extremely the coding computational complexity. To cope with the above issues, a fast intra-coding unit (CU) size decision method and a fast intra-prediction mode decision method are proposed in this paper. Specifically, the trained Support Vector Machine (SVM) classifier models are utilized for determining CU partition mode in a fast CU size decision scheme. Furthermore, the quantity of intra-prediction modes added to the RDO mode set decreases in a fast intra-prediction mode decision scheme based on the improved search step. Simulation results illustrate that the proposed overall algorithm can decrease 55.24% encoding runtime with negligible BDBR.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"123 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140172567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
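A minimal sketch of the SVM-based shortcut: simple texture features of a CU's luma block predict split vs. no-split, and full RDO is reserved for uncertain cases. The features, confidence thresholds, and training setup (including the hypothetical training_blocks and split_labels) are illustrative assumptions, not the paper's exact design.

```python
import numpy as np
from sklearn import svm

def cu_features(block):
    """block: (N, N) array of luma samples for one CU."""
    gy, gx = np.gradient(block.astype(float))
    return [block.var(),                # texture energy
            np.abs(gx).mean(),          # horizontal gradient strength
            np.abs(gy).mean(),          # vertical gradient strength
            np.abs(gx).mean() / (np.abs(gy).mean() + 1e-6)]  # directionality

# Offline (sketch): fit on CUs labeled by an exhaustive encoder's split decisions.
clf = svm.SVC(kernel="rbf", probability=True)
# clf.fit(np.array([cu_features(b) for b in training_blocks]), split_labels)

def fast_split_decision(clf, block, lo=0.2, hi=0.8):
    """clf: an SVC fitted offline as above; returns the partition shortcut."""
    p_split = clf.predict_proba([cu_features(block)])[0, 1]
    if p_split >= hi:
        return "split"          # skip evaluating the non-split RDO candidate
    if p_split <= lo:
        return "no-split"       # skip evaluating child partitions
    return "full-RDO"           # uncertain: fall back to the normal search
```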