TFF-Net: Multi-Task Visual Perception Incorporated With Temporal Feature Fusion for Driving Scene Understanding

Impact Factor: 5.3 | JCR Q2, Computer Science, Artificial Intelligence
Huei-Yung Lin;Shih-Han Wei
DOI: 10.1109/OJITS.2026.3665906
Journal: IEEE Open Journal of Intelligent Transportation Systems, vol. 7, pp. 669-679
Published: 2026-02-18 (Journal Article)
IEEE Xplore: https://ieeexplore.ieee.org/document/11398110/
Citation count: 0

Abstract

With the rapid advancement of autonomous driving technology, accurate perception of road scenes has become a cornerstone of safe and efficient self-driving. Among the various perception tasks, lane detection, road marking segmentation, road surface area extraction, and object detection are core components that directly affect vehicle navigation decisions, positioning accuracy, and obstacle avoidance capability. However, conventional techniques are often trained on single-task datasets, which not only limits the sources of available training data but also fails to fully leverage the diversity of scenes across datasets. In this paper, we propose a multi-task visual perception system that integrates lane detection, traffic marking semantics, road surface segmentation, and object detection within a unified framework. By sharing features across tasks, the framework improves overall computational efficiency. To overcome the limitation of single-task data, the proposed TFF-Net adopts cross-dataset training to effectively integrate data sources for different tasks and enhance the model's generalization across diverse scenes. By taking consecutive images as input, the model compensates for information missing from the current frame due to occlusion or poor lighting, improving overall perception stability. In experiments, the proposed network is evaluated on multiple datasets across the four tasks. The results demonstrate that our approach outperforms existing methods on different metrics. Code is available at https://github.com/hank890121/MTVP
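The abstract describes two mechanisms: fusing features from consecutive frames to compensate for occlusion, and feeding the shared features to multiple task heads. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation (TFF-Net's actual fusion module, head designs, and shapes are in the linked repository); the softmax gating, the feature dimensions, and the dummy heads are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_fusion(frame_feats, scores):
    """Fuse backbone features from T consecutive frames.

    frame_feats: (T, C, H, W) feature maps from T frames.
    scores: (T,) per-frame relevance scores (e.g., from a small gating net);
            a heavily occluded frame would receive a low score.
    Returns one (C, C, H, W)-shaped... rather, one (C, H, W) fused map.
    """
    w = softmax(scores)                          # (T,) weights summing to 1
    return np.tensordot(w, frame_feats, axes=1)  # weighted sum over time

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8, 4, 4))        # 3 frames, 8 channels, 4x4
fused = temporal_fusion(feats, np.array([0.2, 0.5, 1.0]))

# Shared fused features branch into task-specific heads
# (here: dummy 1x1-conv-style projections for two of the four tasks).
lane_head = rng.standard_normal((2, 8))          # 2 lane classes
obj_head = rng.standard_normal((5, 8))           # 5 object classes
lane_out = np.einsum('kc,chw->khw', lane_head, fused)
obj_out = np.einsum('kc,chw->khw', obj_head, fused)
print(fused.shape, lane_out.shape, obj_out.shape)
```

The key design point mirrored here is that fusion happens once on the shared backbone features, so every task head benefits from the temporal information at no extra per-task cost.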