{"title":"基于采样的密集点云渐进属性压缩","authors":"Xiaolong Mao;Hui Yuan;Tian Guo;Shiqi Jiang;Raouf Hamzaoui;Sam Kwong","doi":"10.1109/TIP.2025.3565214","DOIUrl":null,"url":null,"abstract":"We propose an end-to-end attribute compression method for dense point clouds. The proposed method combines a frequency sampling module, an adaptive scale feature extraction module with geometry assistance, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform to extract high-frequency components of the point cloud. The difference between the original point cloud and the sampled point cloud is divided into multiple sub-point clouds. These sub-point clouds are then partitioned using an octree, providing a structured input for feature extraction. The feature extraction module integrates adaptive convolutional layers and uses offset-attention to capture both local and global features. Then, a geometry-assisted attribute feature refinement module is used to refine the extracted attribute features. Finally, a global hyperprior model is introduced for entropy encoding. This model propagates hyperprior parameters from the deepest (base) layer to the other layers, further enhancing the encoding efficiency. At the decoder, a mirrored network is used to progressively restore features and reconstruct the color attribute through transposed convolutional layers. The proposed method encodes base layer information at a low bitrate and progressively adds enhancement layer information to improve reconstruction accuracy. Compared to the best anchor of the latest geometry-based point cloud compression (G-PCC) standard that was proposed by the Moving Picture Experts Group (MPEG), the proposed method can achieve an average Bjøntegaard delta bitrate of -24.58% for the Y component (resp. -21.23% for YUV components) on the MPEG Category Solid dataset and -22.48% for the Y component (resp. -17.19% for YUV components) on the MPEG Category Dense dataset. This is the first instance that a learning-based attribute codec outperforms the G-PCC standard on these datasets by following the common test conditions specified by MPEG. Our source code will be made publicly available on <uri>https://github.com/sduxlmao/SPAC</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2939-2953"},"PeriodicalIF":13.7000,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SPAC: Sampling-Based Progressive Attribute Compression for Dense Point Clouds\",\"authors\":\"Xiaolong Mao;Hui Yuan;Tian Guo;Shiqi Jiang;Raouf Hamzaoui;Sam Kwong\",\"doi\":\"10.1109/TIP.2025.3565214\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose an end-to-end attribute compression method for dense point clouds. The proposed method combines a frequency sampling module, an adaptive scale feature extraction module with geometry assistance, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform to extract high-frequency components of the point cloud. The difference between the original point cloud and the sampled point cloud is divided into multiple sub-point clouds. These sub-point clouds are then partitioned using an octree, providing a structured input for feature extraction. 
The feature extraction module integrates adaptive convolutional layers and uses offset-attention to capture both local and global features. Then, a geometry-assisted attribute feature refinement module is used to refine the extracted attribute features. Finally, a global hyperprior model is introduced for entropy encoding. This model propagates hyperprior parameters from the deepest (base) layer to the other layers, further enhancing the encoding efficiency. At the decoder, a mirrored network is used to progressively restore features and reconstruct the color attribute through transposed convolutional layers. The proposed method encodes base layer information at a low bitrate and progressively adds enhancement layer information to improve reconstruction accuracy. Compared to the best anchor of the latest geometry-based point cloud compression (G-PCC) standard that was proposed by the Moving Picture Experts Group (MPEG), the proposed method can achieve an average Bjøntegaard delta bitrate of -24.58% for the Y component (resp. -21.23% for YUV components) on the MPEG Category Solid dataset and -22.48% for the Y component (resp. -17.19% for YUV components) on the MPEG Category Dense dataset. This is the first instance that a learning-based attribute codec outperforms the G-PCC standard on these datasets by following the common test conditions specified by MPEG. Our source code will be made publicly available on <uri>https://github.com/sduxlmao/SPAC</uri>\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"34 \",\"pages\":\"2939-2953\"},\"PeriodicalIF\":13.7000,\"publicationDate\":\"2025-03-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11002415/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11002415/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
We propose an end-to-end attribute compression method for dense point clouds. The proposed method combines a frequency sampling module, an adaptive scale feature extraction module with geometry assistance, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform to extract the high-frequency components of the point cloud. The difference between the original point cloud and the sampled point cloud is divided into multiple sub-point clouds, which are then partitioned with an octree to provide a structured input for feature extraction. The feature extraction module integrates adaptive convolutional layers and uses offset-attention to capture both local and global features. A geometry-assisted attribute feature refinement module then refines the extracted attribute features. Finally, a global hyperprior model is introduced for entropy encoding; it propagates hyperprior parameters from the deepest (base) layer to the other layers, further improving coding efficiency. At the decoder, a mirrored network progressively restores the features and reconstructs the color attributes through transposed convolutional layers. The proposed method encodes the base layer at a low bitrate and progressively adds enhancement layer information to improve reconstruction accuracy. Compared with the best anchor of the latest geometry-based point cloud compression (G-PCC) standard proposed by the Moving Picture Experts Group (MPEG), the proposed method achieves an average Bjøntegaard delta bitrate of -24.58% for the Y component (resp. -21.23% for the YUV components) on the MPEG Category Solid dataset and -22.48% for the Y component (resp. -17.19% for the YUV components) on the MPEG Category Dense dataset. This is the first time that a learning-based attribute codec has outperformed the G-PCC standard on these datasets under the common test conditions specified by MPEG. Our source code will be made publicly available at https://github.com/sduxlmao/SPAC.
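The frequency sampling step described above can be pictured with a small NumPy sketch. This is only an illustration, assuming the point attributes have been serialized into a 1-D luma signal along a space-filling-curve order; the window length, cutoff, and point-selection rule here are assumptions, not the published design.

```python
import numpy as np

def split_by_frequency(luma, keep_ratio=0.25):
    """Select the points carrying the most high-frequency attribute energy.

    luma: (N,) attribute signal ordered along a space-filling curve (assumed).
    Returns (sampled_idx, remaining_idx); the remaining points would form the
    residual sub-point clouds that are later partitioned with an octree.
    """
    n = luma.size
    spectrum = np.fft.rfft(luma * np.hamming(n))      # Hamming window + FFT
    cutoff = int(spectrum.size * (1.0 - keep_ratio))  # keep only the top frequency band
    spectrum[:cutoff] = 0.0                           # zero out the low frequencies
    highpass = np.fft.irfft(spectrum, n)              # per-point high-frequency response
    order = np.argsort(np.abs(highpass))[::-1]        # strongest response first
    k = max(1, int(n * keep_ratio))
    return np.sort(order[:k]), np.sort(order[k:])

# Example: a smooth gradient plus a sharp repeating texture burst.
luma = np.linspace(0, 255, 1024) + 40.0 * (np.arange(1024) % 7 == 0)
sampled_idx, rest_idx = split_by_frequency(luma)
```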
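Likewise, the octree partitioning of the residual sub-point clouds amounts to grouping voxelized points by their leaf cell. The `leaf_size` and dictionary-based grouping below are illustrative choices, not the codec's actual octree implementation.

```python
import numpy as np
from collections import defaultdict

def octree_leaves(points, leaf_size=8):
    """Group voxelized points into octree leaf cells of side `leaf_size`.

    points: (N, 3) integer voxel coordinates.
    Returns a dict mapping each leaf cell to the indices of its points,
    i.e., the structured input a feature-extraction network can consume.
    """
    pts = np.asarray(points, dtype=np.int64)
    leaves = defaultdict(list)
    for i, (x, y, z) in enumerate(pts):
        leaves[(x // leaf_size, y // leaf_size, z // leaf_size)].append(i)
    return leaves
```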
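Finally, the role of a hyperprior entropy model is to predict a mean and scale for each latent and charge bits according to the Gaussian probability of the quantized value. The PyTorch sketch below shows only that standard rate estimate; how the paper propagates hyperprior parameters from the base layer to the enhancement layers is not reproduced here.

```python
import torch

def gaussian_rate_bits(latent, mu, sigma):
    """Estimated bit cost of integer-quantized latents under N(mu, sigma).

    This is the common hyperprior rate term in learned compression; mu and
    sigma would normally come from a hyper-decoder (here they are plain inputs).
    """
    sigma = sigma.clamp(min=1e-6)
    dist = torch.distributions.Normal(mu, sigma)
    y_hat = torch.round(latent)                          # hard quantization for illustration
    pmf = dist.cdf(y_hat + 0.5) - dist.cdf(y_hat - 0.5)  # probability of each integer bin
    return -torch.log2(pmf.clamp(min=1e-9)).sum()        # total bits for this layer

# Example: 512 latent values with predicted means and scales.
latent = torch.randn(512) * 3.0
bits = gaussian_rate_bits(latent, mu=torch.zeros(512), sigma=torch.full((512,), 3.0))
```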