LaTP: LiDAR-aided multimodal token pruning for efficient trajectory prediction of autonomous driving

IF 6.3, Zone 1 (Computer Science), JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks, Pub Date: 2025-06-06, DOI: 10.1016/j.neunet.2025.107673

Yantao Lu, Shiqi Sun, Ning Liu, Bo Jiang, Yilan Li, Jinchao Chen, Ying Zhang, Yichen Zhu, Senem Velipasalar

Volume 190, Article 107673. https://www.sciencedirect.com/science/article/pii/S0893608025005532

Abstract

The rapid advancement of Large Vision Language Models (LVLMs) has spurred significant progress in autonomous driving, especially in end-to-end trajectory prediction, which is crucial for enabling autonomous driving across diverse traffic scenarios. However, LVLMs demand substantial processing power, which makes them difficult to deploy on the resource-constrained onboard computers of autonomous vehicles. Token pruning is one of the most promising approaches to this problem, as it achieves considerable inference speedups without requiring additional model training. Although token pruning has proven effective in various domains, current approaches appear to be designed for general-purpose tasks and are not tailored to the unique demands of trajectory prediction in autonomous driving. Specifically, two considerations in this setting have not been adequately addressed: (i) content information, since visual elements that are irrelevant to driving but have complex, non-trivial appearance cannot be pruned effectively; and (ii) distance information, which is critical for accurate trajectory prediction but often overlooked by conventional pruning approaches. As a result, applying existing pruning methods to LVLMs without accounting for these differences may degrade performance. To overcome these challenges, we propose a novel token pruning method, LiDAR-aided Token Prune (LaTP), designed specifically for LVLM-based trajectory prediction in autonomous driving. LaTP efficiently integrates LiDAR points to provide distance information for camera inputs and uses a content- and distance-aware token importance indicator to discard visual tokens that are inconsequential for driving. This approach significantly improves inference speed without compromising control accuracy.
Experiments on the nuScenes dataset validate the effectiveness of our method, showing superior performance compared to general token pruning baselines. Specifically, LaTP achieves a pruning ratio of up to 75% while maintaining an Average Displacement Error (ADE) of 2.03 meters and a Collision Rate (col.) of 2.35%, demonstrating its ability to significantly reduce computational load without sacrificing prediction accuracy.
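
The abstract does not give LaTP's actual formulas, so the following is only a rough sketch of what a content- and distance-aware pruning step could look like. It assumes LiDAR points already transformed into the camera frame (the LiDAR-to-camera extrinsic step is omitted), a standard pinhole projection onto a ViT-style patch grid, the nearest LiDAR depth per patch as that token's distance, and a hypothetical importance score `content / (1 + depth / d0)`, where `content` stands in for whatever per-token relevance signal the model provides. The top 25% of tokens are kept, matching the 75% pruning ratio quoted above. The function names, `d0`, and the `far` default for patches with no LiDAR return are all illustrative assumptions, not the paper's method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_to_image(points_cam: Sequence[Point], fx: float, fy: float,
                     cx: float, cy: float) -> List[Point]:
    """Pinhole-project points given in the camera frame (x right, y down,
    z forward). Returns (u, v, depth) for each point in front of the camera."""
    pixels = []
    for x, y, z in points_cam:
        if z <= 0.0:  # behind the image plane, cannot be seen
            continue
        pixels.append((fx * x / z + cx, fy * y / z + cy, z))
    return pixels


def token_depths(pixels: Sequence[Point], img_w: int, img_h: int, patch: int,
                 far: float = 80.0) -> List[float]:
    """Nearest LiDAR depth per image patch (row-major token order).
    Patches without any LiDAR hit get the assumed `far` value."""
    cols, rows = img_w // patch, img_h // patch
    depth = [far] * (rows * cols)
    for u, v, z in pixels:
        if 0.0 <= u < cols * patch and 0.0 <= v < rows * patch:
            idx = int(v // patch) * cols + int(u // patch)
            depth[idx] = min(depth[idx], z)
    return depth


def importance_scores(content: Sequence[float], depth: Sequence[float],
                      d0: float = 20.0) -> List[float]:
    """Hypothetical score: content relevance damped by distance."""
    return [c / (1.0 + d / d0) for c, d in zip(content, depth)]


def prune_tokens(content: Sequence[float], depth: Sequence[float],
                 ratio: float = 0.75, d0: float = 20.0) -> List[int]:
    """Indices of the tokens kept after dropping `ratio` of them,
    returned in their original order."""
    scores = importance_scores(content, depth, d0)
    keep = max(1, int(round(len(scores) * (1.0 - ratio))))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Toy scene: 64x64 image with 16-pixel patches -> 4x4 = 16 visual tokens.
lidar = [(0.0, 0.0, 5.0),     # near obstacle, projects into token 10
         (-6.0, -6.0, 10.0),  # projects into token 0
         (0.0, 0.0, -3.0),    # behind the camera, dropped
         (8.0, 2.0, 40.0)]    # farther point in token 10; nearest depth wins
depth = token_depths(project_to_image(lidar, 50.0, 50.0, 32.0, 32.0), 64, 64, 16)
content = [1.0] * 16
content[5] = 3.0  # visually complex region with no LiDAR return
kept = prune_tokens(content, depth, ratio=0.75)
print(kept)
```

Ranking by content alone would put token 5 first; with the distance weight, the nearby tokens 10 and 0 outrank it. A hard depth cutoff or a learned weighting would be equally plausible readings of the abstract.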
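
ADE is conventionally the mean Euclidean (L2) distance between predicted and ground-truth positions over the prediction horizon, ADE = (1/T) * sum_t ||p_hat_t - p_t||_2. The abstract does not state the exact evaluation protocol (horizon length, sampling rate, averaging across time steps), so this minimal sketch illustrates only the standard definition; the function name and the waypoints are made-up toy values, not nuScenes data.

```python
import math
from typing import Sequence, Tuple

XY = Tuple[float, float]


def average_displacement_error(pred: Sequence[XY], gt: Sequence[XY]) -> float:
    """Mean Euclidean distance (metres) between predicted and ground-truth
    waypoints at matching time steps."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and equally long")
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / len(pred)


# Three future waypoints (e.g. sampled at 1 s, 2 s and 3 s ahead).
gt = [(0.0, 5.0), (0.0, 10.0), (0.0, 15.0)]
pred = [(0.0, 5.0), (3.0, 14.0), (0.0, 20.0)]
ade = average_displacement_error(pred, gt)  # per-step errors 0, 5 and 5 m
print(ade)
```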

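Source journal

Neural Networks (Engineering & Technology / Computer Science: Artificial Intelligence)

CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days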

Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically inspired artificial intelligence.