Ling-Xiao Qin, Hong-Mei Sun, Xiao-Meng Duan, Cheng-Yue Che, Rui-Sheng Jia
{"title":"Adaptive learning-enhanced lightweight network for real-time vehicle density estimation","authors":"Ling-Xiao Qin, Hong-Mei Sun, Xiao-Meng Duan, Cheng-Yue Che, Rui-Sheng Jia","doi":"10.1007/s00371-024-03572-3","DOIUrl":null,"url":null,"abstract":"<p>In order to maintain competitive density estimation performance, most of the existing works design cumbersome network structures to extract and refine vehicle features, resulting in huge computational resource consumption and storage burden during the inference process, which severely limits their deployment scope and makes it difficult to be applied in practical scenarios. To solve the above problems, we propose a lightweight network for real-time vehicle density estimation (LSENet). Specifically, the network consists of three parts: a pre-trained heavy teacher network, an adaptive integration block and a lightweight student network. First, a teacher network based on a deep single-column transformer is designed as a means to provide effective global dependency and vehicle distribution knowledge for the student network to learn. Second, to address the intermediate layer mismatch and dimensionality inconsistency between the teacher network and the student network, an adaptive integration block is designed to efficiently guide the student network learning by dynamically assigning the self-attention heads that has the most influence on the network decision as a source of distilled knowledge. Finally, to complement the fine-grained features, CNN blocks are designed in parallel with the student network transformer backbone as a way to improve the network’s ability to capture vehicle details. Extensive experiments on two vehicle benchmark datasets, TRANCOS and VisDrone2019, show that LSENet achieves an optimal trade-off between density estimation accuracy and operational speed compared to other state-of-the-art methods and is therefore suitable for deployment on computationally resource-poor edge devices. 
Our codes will be available at https://github.com/goudaner1/LSENet.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03572-3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
To maintain competitive density estimation performance, most existing works design cumbersome network structures to extract and refine vehicle features, incurring heavy computational and storage costs during inference, which severely limits their deployment scope and makes them difficult to apply in practical scenarios. To solve these problems, we propose a lightweight network for real-time vehicle density estimation (LSENet). Specifically, the network consists of three parts: a pre-trained heavy teacher network, an adaptive integration block, and a lightweight student network. First, a teacher network based on a deep single-column transformer is designed to provide effective global-dependency and vehicle-distribution knowledge for the student network to learn. Second, to address the intermediate-layer mismatch and dimensionality inconsistency between the teacher network and the student network, an adaptive integration block is designed to efficiently guide the student network's learning by dynamically assigning the self-attention heads that have the most influence on the network's decision as the source of distilled knowledge. Finally, to complement the fine-grained features, CNN blocks are designed in parallel with the student network's transformer backbone to improve the network's ability to capture vehicle details. Extensive experiments on two vehicle benchmark datasets, TRANCOS and VisDrone2019, show that LSENet achieves an optimal trade-off between density estimation accuracy and inference speed compared to other state-of-the-art methods and is therefore suitable for deployment on computationally resource-poor edge devices. Our code will be available at https://github.com/goudaner1/LSENet.
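The abstract describes distilling knowledge from the self-attention heads that most influence the teacher's decision. The paper's actual importance criterion and loss are not given here, so the following is only an illustrative sketch: it ranks teacher heads by a simple proxy score (mean absolute attention weight), picks the top-k as distillation targets, and computes a mean-squared-error distillation loss, assuming for simplicity that the student's attention maps already match the teacher's shape (the paper's adaptive integration block is what handles the real-world dimensionality mismatch).

```python
import numpy as np

def select_distillation_heads(teacher_attn, k):
    """Rank teacher self-attention heads by a simple importance proxy
    (mean absolute attention weight) and return the indices of the
    top-k heads to use as distillation targets.

    teacher_attn: array of shape (num_heads, tokens, tokens).
    """
    importance = np.abs(teacher_attn).mean(axis=(1, 2))  # one score per head
    top_k = np.argsort(importance)[::-1][:k]             # strongest heads first
    return np.sort(top_k)                                # stable, sorted indices

def distill_loss(student_attn, teacher_attn, head_ids):
    """Mean-squared error between the student's and the selected
    teacher heads' attention maps (shapes assumed to match here)."""
    diff = student_attn[head_ids] - teacher_attn[head_ids]
    return float((diff ** 2).mean())

# Toy usage: 8 teacher heads over 4 tokens, distill the 3 strongest.
teacher = np.random.rand(8, 4, 4)
student = np.random.rand(8, 4, 4)
heads = select_distillation_heads(teacher, k=3)
loss = distill_loss(student, teacher, heads)
```

In a real training loop this loss term would be added to the density-estimation loss, steering the lightweight student toward the teacher's most decision-relevant attention patterns rather than all heads uniformly.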