TFF-Net: Multi-Task Visual Perception Incorporated With Temporal Feature Fusion for Driving Scene Understanding

Authors: Huei-Yung Lin; Shih-Han Wei
Journal: IEEE Open Journal of Intelligent Transportation Systems, vol. 7, pp. 669-679
DOI: 10.1109/OJITS.2026.3665906
Publication date: 2026-02-18 (Journal Article; JCR Q2, Computer Science, Artificial Intelligence; impact factor 5.3)
Article: https://ieeexplore.ieee.org/document/11398110/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11398110
Code: https://github.com/hank890121/MTVP
Citations: 0
Abstract
With the rapid advancement of autonomous driving technology, accurate perception of road scenes has become a cornerstone of safe and efficient self-driving. Among the various perception tasks, lane detection, road marking segmentation, road surface area extraction, and object detection are core components that directly affect vehicle navigation decisions, positioning accuracy, and obstacle avoidance capability. However, conventional techniques are often trained on single-task datasets, which not only limits the sources of available training data but also fails to fully exploit the diversity of scenes across datasets. In this paper, we propose a multi-task visual perception system that integrates lane detection, traffic marking semantics, road surface segmentation, and object detection within a unified framework. By sharing features across tasks, the framework improves overall computational efficiency. To overcome the limitations of single-task data, the proposed TFF-Net adopts cross-dataset training to effectively integrate data sources for different tasks and to enhance the model's generalization across diverse scenes. By taking consecutive images as input, the model compensates for information missing from the current frame due to occlusion or poor lighting, improving overall perception stability. In experiments, the proposed network is evaluated on multiple datasets across four tasks. The results demonstrate that our approach outperforms existing methods on a range of metrics. Code is available at https://github.com/hank890121/MTVP