Real-time high-resolution hardware–software co-design neural architecture search for unmanned mobile platforms

Impact factor: 8.0 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Hardware & Architecture)

Journal of Network and Computer Applications, vol. 243, Article 104282. Pub Date: 2025-08-20. DOI: 10.1016/j.jnca.2025.104282

ZiWen Dou, Jun Tian, HaiQuan Sang, MingMing Zhang


Citations: 0

Abstract



Manually designed high-resolution networks often struggle to balance accuracy and inference speed on mobile computing platforms. High-resolution neural networks carry large computational costs, which makes them difficult to deploy on such platforms. To address this, we simplified the traditional multi-scale feature extraction process by reducing the three-branch fusion to a two-branch fusion, establishing a lightweight network-level search space. We applied gradient descent to iteratively optimize the two-layer parameters within the search space and used a Pareto-optimality algorithm to balance inference speed and accuracy. After convergence, we obtained a multi-scale feature extraction network structure that balances inference speed and accuracy. Combined with different feature decoders, this structure enables real-time semantic segmentation and monocular depth estimation on mobile platforms. A self-constructed unmanned mobile platform, built on a mobile computing platform, was used to collect image data from real-world environments and create a custom dataset. This dataset was used to validate the perception capabilities of the designed semantic segmentation and monocular depth estimation models on the mobile platform in real-world scenarios. Experiments demonstrate that our semantic segmentation model, designed for the NVIDIA NX mobile computing platform, achieves an accuracy of 71.7% on 1024 × 2048 high-resolution images at an inference speed of 25.25 FPS, a 39.2% improvement in inference speed over existing SOTA methods. Meanwhile, our monocular depth estimation model achieves an absolute relative error (Abs Rel) of 0.091 on the NVIDIA NX at 14.46 FPS, an 87.7% improvement in inference speed over existing methods while preserving high accuracy. The code is available at https://github.com/douziwenhit/RealtimeSeg and https://github.com/douziwenhit/RealtimeMDE.
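
The abstract mentions two ideas that are easy to illustrate in isolation: choosing architectures that are Pareto-optimal with respect to accuracy and inference speed, and scoring depth predictions with Abs Rel. The Python sketch below is a generic illustration only, not the authors' implementation (see the repositories above for that). The `Candidate` record, the function names, and all numbers in the demo are hypothetical, and the filter shown is a plain non-dominated filter applied to already-measured candidates rather than the gradient-based search described in the abstract.

```python
from typing import List, NamedTuple, Sequence


class Candidate(NamedTuple):
    """Hypothetical record for one searched architecture."""
    name: str
    accuracy: float  # higher is better
    fps: float       # higher is better, measured on the target device


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.fps >= b.fps
    strictly_better = a.accuracy > b.accuracy or a.fps > b.fps
    return no_worse and strictly_better


def pareto_front(cands: Sequence[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates (input order preserved)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Absolute relative error: mean of |pred - gt| / gt over pixels with gt > 0."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground-truth depth")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up numbers purely for illustration; they are not results from the paper.
    candidates = [
        Candidate("a", accuracy=70.0, fps=30.0),
        Candidate("b", accuracy=72.0, fps=20.0),
        Candidate("c", accuracy=69.0, fps=25.0),  # dominated by "a"
        Candidate("d", accuracy=71.0, fps=20.0),  # dominated by "b"
    ]
    print([c.name for c in pareto_front(candidates)])
    print(abs_rel([2.0, 4.0], [2.0, 5.0]))
```

Every architecture left on the front represents a different accuracy/speed trade-off. Picking a single one, for example the most accurate candidate above a target frame rate, is a separate deployment decision that this sketch does not make.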


Source journal

Journal of Network and Computer Applications (Engineering & Technology, Computer Science: Interdisciplinary Applications)

CiteScore: 21.50

Self-citation rate: 3.40%

Articles published: 142

Review time: 37 days

About the journal: The Journal of Network and Computer Applications welcomes research contributions, surveys, and notes in all areas relating to computer networks and applications thereof. Sample topics include new design techniques, interesting or novel applications, components or standards; computer networks with tools such as WWW; emerging standards for internet protocols; Wireless networks; Mobile Computing; emerging computing models such as cloud computing, grid computing; applications of networked systems for remote collaboration and telemedicine, etc. The journal is abstracted and indexed in Scopus, Engineering Index, Web of Science, Science Citation Index Expanded and INSPEC.