MetaSSC: Enhancing 3D semantic scene completion for autonomous driving through meta-learning and long-sequence modeling

Yansong Qu, Zixuan Xu, Zilin Huang, Zihao Sheng, Sikai Chen, Tiantian Chen

Communications in Transportation Research, Vol. 5, Article 100184. Published 2025-05-28. DOI: 10.1016/j.commtr.2025.100184. Impact factor 12.5, CiteScore 15.20, Q1 in Transportation.

Abstract

Semantic scene completion (SSC) plays a pivotal role in achieving comprehensive perception in autonomous driving systems. However, existing methods often neglect the high deployment costs of SSC in real-world applications, and traditional architectures such as three-dimensional (3D) convolutional neural networks (3D CNNs) and self-attention mechanisms struggle to efficiently capture long-range dependencies within 3D voxel grids, limiting their effectiveness. To address these challenges, we propose MetaSSC, a novel meta-learning-based framework for SSC that combines deformable convolution, large-kernel attention, and Mamba into a single model (D-LKA-M). Our approach begins with a voxel-based semantic segmentation (SS) pretraining task, designed to explore the semantics and geometry of incomplete regions while acquiring transferable meta-knowledge. Using simulated cooperative perception datasets, we supervise a single vehicle's perception with sensor data aggregated from multiple nearby connected autonomous vehicles (CAVs), generating richer and more comprehensive labels. This meta-knowledge is then adapted to the target domain through a dual-phase training strategy that adds no extra model parameters, ensuring efficient deployment. To further enhance the model's ability to capture long-sequence relationships in 3D voxel grids, we integrate Mamba blocks with deformable convolution and large-kernel attention into the backbone network. Extensive experiments show that MetaSSC achieves state-of-the-art performance, surpassing competing models by a significant margin while also reducing deployment costs.
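The abstract names three architectural ingredients (deformable convolution, large-kernel attention, and Mamba) but the paper's code is not reproduced here. The sketch below is a minimal, hypothetical PyTorch rendering of how a D-LKA-M-style block might compose them; it is not the authors' implementation. For brevity it operates on 2D feature maps rather than 3D voxel grids, uses torchvision's DeformConv2d, a decomposed large-kernel attention in the style of Visual Attention Networks, and a toy gated linear recurrence as a stand-in for a true Mamba selective-scan layer. All module names and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of a D-LKA-M-style block. 2D and simplified;
# the paper works on 3D voxel grids with real Mamba blocks.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class LargeKernelAttention(nn.Module):
    """Decomposed large-kernel attention: depthwise 5x5 + dilated
    depthwise 7x7 (dilation 3) + pointwise 1x1, VAN-style."""
    def __init__(self, dim: int):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, groups=dim, dilation=3)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn  # gate the input with the learned attention map


class ToySSMScan(nn.Module):
    """Toy gated linear recurrence standing in for a Mamba block;
    a real model would use a selective state-space scan."""
    def __init__(self, dim: int):
        super().__init__()
        self.decay = nn.Parameter(torch.zeros(dim))  # per-channel recurrence gate
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, L, D), L = flattened voxel sequence
        a = torch.sigmoid(self.decay)        # recurrence weight in (0, 1)
        u = self.in_proj(x)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.shape[1]):          # sequential scan over the sequence
            h = a * h + (1 - a) * u[:, t]
            outs.append(h)
        return self.out_proj(torch.stack(outs, dim=1))


class DLKAMBlock(nn.Module):
    """Deformable conv -> large-kernel attention -> sequence scan."""
    def __init__(self, dim: int):
        super().__init__()
        self.offset = nn.Conv2d(dim, 2 * 3 * 3, 3, padding=1)  # 2 offsets per tap
        self.deform = DeformConv2d(dim, dim, kernel_size=3, padding=1)
        self.lka = LargeKernelAttention(dim)
        self.scan = ToySSMScan(dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (B, C, H, W)
        x = self.deform(x, self.offset(x))      # geometry-adaptive sampling
        x = self.lka(x)                         # long-range spatial context
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)      # (B, H*W, C) voxel sequence
        seq = seq + self.scan(self.norm(seq))   # residual sequence modeling
        return seq.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    block = DLKAMBlock(dim=32)
    y = block(torch.randn(2, 32, 16, 16))
    print(y.shape)  # torch.Size([2, 32, 16, 16])
```

The ordering here (adaptive sampling, then large-kernel spatial gating, then a residual sequence scan over flattened positions) is one plausible reading of how the three components could cooperate; the actual block structure, kernel sizes, and scan order in MetaSSC may differ.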