{"title":"LidNet:用于自动驾驶的激光雷达点云序列增强感知和运动预测","authors":"Yasser H. Khalil, H. Mouftah","doi":"10.1109/GLOBECOM48099.2022.10001152","DOIUrl":null,"url":null,"abstract":"Autonomous driving is strongly contingent on perception and motion prediction for scene understanding. In this paper, we propose LIDAR Network (LidNet) to boost perception and motion prediction accuracy by redesigning MotionNet architecture. MotionNet is a new real-time encoder-decoder model that achieves joint perception and motion prediction at a pixel level. LidNet improves MotionNet performance by replacing every two spatial convolution layers in its encoder-decoder architecture with residual blocks and relies on average pooling rather than strided convolution for spatial reduction. In addition, we adjust the lateral skip connections linking encoders and decoders to result in a symmetric network. The global temporal maximum pooling layers on the lateral connections are replaced with temporal average pooling. Further, we introduce a center layer between the encoder-decoder architecture, with no spatial reduction applied at the lowest levels. Our extensive evaluation performed on the nuScenes dataset confirms that LidNet outperforms the state-of-the-art and operates in real-time.","PeriodicalId":313199,"journal":{"name":"GLOBECOM 2022 - 2022 IEEE Global Communications Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"LidNet: Boosting Perception and Motion Prediction from a Sequence of LIDAR Point Clouds for Autonomous Driving\",\"authors\":\"Yasser H. Khalil, H. Mouftah\",\"doi\":\"10.1109/GLOBECOM48099.2022.10001152\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Autonomous driving is strongly contingent on perception and motion prediction for scene understanding. In this paper, we propose LIDAR Network (LidNet) to boost perception and motion prediction accuracy by redesigning MotionNet architecture. MotionNet is a new real-time encoder-decoder model that achieves joint perception and motion prediction at a pixel level. LidNet improves MotionNet performance by replacing every two spatial convolution layers in its encoder-decoder architecture with residual blocks and relies on average pooling rather than strided convolution for spatial reduction. In addition, we adjust the lateral skip connections linking encoders and decoders to result in a symmetric network. The global temporal maximum pooling layers on the lateral connections are replaced with temporal average pooling. Further, we introduce a center layer between the encoder-decoder architecture, with no spatial reduction applied at the lowest levels. 
Our extensive evaluation performed on the nuScenes dataset confirms that LidNet outperforms the state-of-the-art and operates in real-time.\",\"PeriodicalId\":313199,\"journal\":{\"name\":\"GLOBECOM 2022 - 2022 IEEE Global Communications Conference\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"GLOBECOM 2022 - 2022 IEEE Global Communications Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/GLOBECOM48099.2022.10001152\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"GLOBECOM 2022 - 2022 IEEE Global Communications Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GLOBECOM48099.2022.10001152","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
LidNet: Boosting Perception and Motion Prediction from a Sequence of LIDAR Point Clouds for Autonomous Driving
Autonomous driving depends strongly on perception and motion prediction for scene understanding. In this paper, we propose LIDAR Network (LidNet), which boosts perception and motion prediction accuracy by redesigning the MotionNet architecture. MotionNet is a recent real-time encoder-decoder model that performs joint perception and motion prediction at the pixel level. LidNet improves on MotionNet by replacing every pair of spatial convolution layers in the encoder-decoder architecture with a residual block, and by relying on average pooling rather than strided convolution for spatial reduction. In addition, we adjust the lateral skip connections linking the encoders and decoders to yield a symmetric network, and we replace the global temporal max pooling layers on these lateral connections with temporal average pooling. Further, we introduce a center layer between the encoder and decoder, with no spatial reduction applied at the lowest level. Our extensive evaluation on the nuScenes dataset confirms that LidNet outperforms the state of the art while operating in real time.
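To make the architectural changes concrete, the following is a minimal PyTorch sketch of the three ideas the abstract describes: a residual block standing in for a pair of spatial convolutions, average pooling (rather than strided convolution) for spatial reduction, and temporal average pooling (rather than global temporal max pooling) on the lateral connections. Module names, channel widths, and the block layout here are illustrative assumptions, not the paper's exact LidNet configuration.

import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """Two 3x3 spatial convolutions with an identity shortcut,
    standing in for the pair of plain convolution layers they replace."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x + self.body(x))


class EncoderStage(nn.Module):
    """One encoder stage: average pooling for spatial reduction
    (instead of a strided convolution), then a residual block."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.pool = nn.AvgPool2d(kernel_size=2)            # halves H and W
        self.proj = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # channel change
        self.block = ResidualBlock(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(self.proj(self.pool(x)))


def temporal_average(features: torch.Tensor) -> torch.Tensor:
    """Lateral-connection pooling over a (batch, time, C, H, W) tensor:
    averages the time axis, replacing global temporal max pooling."""
    return features.mean(dim=1)

Averaging over time keeps a contribution from every frame in the LIDAR sequence, whereas global max pooling forwards only the single strongest activation per position; similarly, average pooling for downsampling aggregates all positions in each window instead of discarding most of them the way a strided convolution does.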