TFF-Net: Multi-Task Visual Perception Incorporated With Temporal Feature Fusion for Driving Scene Understanding

Authors: Huei-Yung Lin; Shih-Han Wei
Journal: IEEE Open Journal of Intelligent Transportation Systems, vol. 7, pp. 669-679
DOI: 10.1109/OJITS.2026.3665906
Publication date: 2026-02-18 (Journal Article; JCR Q2, Computer Science, Artificial Intelligence; impact factor 5.3)
Article: https://ieeexplore.ieee.org/document/11398110/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11398110
Code: https://github.com/hank890121/MTVP
Citations: 0
Abstract
With the rapid advancement of autonomous driving technology, accurate perception of road scenes has become a cornerstone of safe and efficient self-driving. Among the various perception tasks, lane detection, road marking segmentation, road surface area extraction, and object detection are core components that directly affect vehicle navigation decisions, positioning accuracy, and obstacle avoidance capability. However, conventional techniques are often trained on single-task datasets, which not only limits the sources of available training data but also fails to fully exploit the diversity of scenes across datasets. In this paper, we propose a multi-task visual perception system that integrates lane detection, traffic marking semantics, road surface segmentation, and object detection within a unified framework. By sharing features across tasks, the framework improves overall computational efficiency. To overcome the limitations of single-task data, the proposed TFF-Net adopts cross-dataset training to effectively integrate data sources for different tasks and to enhance the model's generalization across diverse scenes. By taking consecutive images as input, the model compensates for information missing from the current frame due to occlusion or poor lighting, improving overall perception stability. In experiments, the proposed network is evaluated on multiple datasets across four tasks. The results demonstrate that our approach outperforms existing methods on a range of metrics. Code is available at https://github.com/hank890121/MTVP