Feng Ma;Xin Jiang;Chen Chen;Jie Sun;Xin-Ping Yan;Jin Wang
Waterway-BEV: Generate Bird’s Eye View Layouts of a Waterway From a First-Person View Camera Using Cross-View Transformers
IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 6, pp. 8078-8096. Published 2025-04-09.
DOI: 10.1109/TITS.2025.3554717
URL: https://ieeexplore.ieee.org/document/10960549/
Impact factor: 8.4; JCR: Q1 (Engineering, Civil)
Citations: 0
Abstract
In autonomous ship navigation, constructing bird’s-eye view (BEV) layouts of waterways is an important task. A helmsman can form a BEV layout of a waterway by sight alone. To emulate this ability, a neural network-based algorithm named Waterway-BEV is proposed, which reconstructs a local BEV map, comprising the waterway layout and ship occupancies, from a single first-person view (FPV) monocular image. Waterway-BEV uses an efficient SEResNeXt encoder to extract features from the FPV image, capturing deep semantic information about waterways and ships. Because the information available differs between perspectives, Waterway-BEV incorporates a Cross-View Transformation Module that accounts for a cycle-consistency constraint between the views and exploits their correlation to strengthen view transformation and scene understanding. To make full use of the encoder's feature output, a decoder based on a dedicated lightweight network decodes the enhanced BEV feature maps and generates the BEV layout. Focal Loss is used as the loss function for model optimization, so that training accounts for both the number of ship samples and their classification difficulty, which improves generation performance and convergence speed. In the experiments, Waterway-BEV reached an mIoU of 97.8% and an mAP of 98.2% on waterway BEV layout generation, outperforming other state-of-the-art (SOTA) algorithms. In particular, in special scenarios such as waterway crossroads and scenes involving small target ships, Waterway-BEV consistently generated satisfactory BEV layouts, demonstrating robustness and applicability.
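The abstract does not give the exact form or hyperparameters of the Focal Loss used. As a rough illustration only, the sketch below implements the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), in plain Python. The function name `binary_focal_loss`, the defaults `alpha=0.25` and `gamma=2.0` (the values commonly used with focal loss), and the flat-list input format are assumptions made for this example, not details taken from the paper.

```python
import math


def binary_focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over flat sequences.

    probs   -- predicted probabilities of the positive class (e.g. "ship")
    targets -- ground-truth labels, each 0 or 1
    alpha   -- weight for the positive class (assumed default, not from the paper)
    gamma   -- focusing parameter (assumed default, not from the paper)
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    if not probs:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for p, y in zip(probs, targets):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the loss of well-classified cells,
        # so hard or rare cells dominate the gradient.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # Hypothetical BEV occupancy predictions for four grid cells.
    predictions = [0.95, 0.10, 0.30, 0.80]
    labels = [1, 0, 1, 0]
    print(f"focal loss: {binary_focal_loss(predictions, labels):.6f}")
```

With `gamma=0` the modulating factor vanishes and the function reduces to an alpha-weighted cross-entropy, which is a convenient sanity check. Larger `gamma` values put more weight on misclassified cells, such as those belonging to small or distant ships.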
About the journal:
The journal covers the theoretical, experimental and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as those systems utilizing synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation and coordination of ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internally and externally.