Real-time high-resolution hardware–software co-design neural architecture search for unmanned mobile platforms
ZiWen Dou, Jun Tian, HaiQuan Sang, MingMing Zhang
Journal of Network and Computer Applications, Volume 243, Article 104282, published 2025-08-20
DOI: 10.1016/j.jnca.2025.104282
https://www.sciencedirect.com/science/article/pii/S1084804525001791
Manually designed high-resolution networks often struggle to balance accuracy and inference speed on mobile computing platforms. To reduce the large computational cost that makes high-resolution neural networks difficult to deploy on such platforms, we simplified the traditional multi-scale feature extraction process by reducing the three-branch fusion to a two-branch fusion, which establishes a lightweight network-level search space. We applied gradient descent to iteratively optimize the two-layer parameters within the search space and used a Pareto-optimal algorithm to balance inference speed and accuracy. After convergence, we obtained a multi-scale feature extraction network structure that balances inference speed and accuracy. Combined with different feature decoders, this structure enables real-time semantic segmentation and monocular depth estimation on mobile platforms. A self-constructed unmanned mobile platform, built on a mobile computing platform, was used to collect image data from real-world environments and create a custom dataset. This dataset was used to validate the perception capabilities of the designed semantic segmentation and monocular depth estimation models on the mobile platform in real-world scenarios. The experiments demonstrate that our semantic segmentation model, designed for the NVIDIA NX mobile computing platform, achieves an accuracy of 71.7% on 1024 × 2048 high-resolution images at an inference speed of 25.25 FPS, a 39.2% improvement in inference speed over existing SOTA methods. Our monocular depth estimation model on the NVIDIA NX achieves an absolute relative error (Abs Rel) of 0.091 at an inference speed of 14.46 FPS, an 87.7% improvement in inference speed over existing methods while preserving high accuracy. The code is available at https://github.com/douziwenhit/RealtimeSeg and https://github.com/douziwenhit/RealtimeMDE.
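The abstract does not say how the Pareto step is implemented. As a rough illustration of the general idea of trading accuracy against inference speed, the following minimal sketch filters a set of candidate architectures down to the non-dominated ones. The `Candidate` type, the function names, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper or its repositories.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical searched architecture with its measured metrics."""
    name: str
    accuracy: float  # higher is better (e.g. segmentation accuracy in %)
    fps: float       # higher is better (frames per second on the target device)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both metrics and strictly better on one."""
    return (a.accuracy >= b.accuracy and a.fps >= b.fps
            and (a.accuracy > b.accuracy or a.fps > b.fps))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up numbers for illustration only; these are not results from the paper.
candidates = [
    Candidate("arch_a", accuracy=70.0, fps=30.0),
    Candidate("arch_b", accuracy=72.0, fps=20.0),
    Candidate("arch_c", accuracy=69.0, fps=25.0),  # dominated by arch_a
    Candidate("arch_d", accuracy=74.0, fps=10.0),
]

front = pareto_front(candidates)
front_names = [c.name for c in front]
print(front_names)
```

Any architecture on the resulting front is a defensible choice; which one to deploy depends on the frame-rate requirement of the target platform. This pairwise filter is quadratic in the number of candidates, which is acceptable for the small candidate sets typical of a sketch like this.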
About the journal:
The Journal of Network and Computer Applications welcomes research contributions, surveys, and notes in all areas relating to computer networks and their applications. Sample topics include new design techniques; interesting or novel applications, components, or standards; computer networks with tools such as WWW; emerging standards for internet protocols; wireless networks; mobile computing; emerging computing models such as cloud computing and grid computing; and applications of networked systems for remote collaboration and telemedicine. The journal is abstracted and indexed in Scopus, Engineering Index, Web of Science, Science Citation Index Expanded, and INSPEC.