Bharat Lohani, Parvej Khan, Vaibhav Kumar, Siddhartha Gupta
Role of Simulated Lidar Data for Training 3D Deep Learning Models: An Exhaustive Analysis
Journal of the Indian Society of Remote Sensing (Impact Factor 2.2; JCR Q3, Environmental Sciences)
Published: 2024-06-27
DOI: https://doi.org/10.1007/s12524-024-01905-2
Citation count: 0
Abstract
The use of 3D Deep Learning (DL) models for LiDAR data segmentation has attracted much interest in recent years. However, generating labeled point cloud data, a prerequisite for training DL models, is highly resource-intensive. Simulated LiDAR data, which are labeled at the point of generation, provide a cost-effective alternative, but their efficacy and usefulness must be evaluated. This paper examines the role of simulated LiDAR point clouds in training DL models. A high-fidelity 3D terrain model representing the real environment is developed, and the in-house physics-based simulator “Limulator” is used to generate labeled point clouds through various realizations. The paper sets out a few major hypotheses that assess the usefulness of simulated data in three settings: used alone, used in combination with real data, and used with strategic boosting of minor classes. Several experiments are carried out to test these hypotheses. Each experiment involves training a DL model, PointCNN in this case, on a particular combination of simulated and real LiDAR data and measuring its performance in segmenting the test data. Results show that training on simulated data alone can produce an overall accuracy (OA) of 89% and a weighted-average F1 score of 88.81%. Training on a combination of simulated and real data can achieve accuracies comparable to those obtained when a large quantity of real data alone is used. Strategic boosting of minor classes in simulated data improves the accuracies of those classes by up to 23% compared with training on real data only. Because simulated data are easy to generate and have a positive impact on segmentation accuracy, training DL models on them can be highly beneficial for LiDAR applications and has the potential to minimize the resource-intensive exercise of developing labeled real data.
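For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how overall accuracy and a support-weighted average F1 score are commonly computed from per-point labels. It is not the authors' evaluation code: the function names, the class indices, and the toy labels are all assumptions made for illustration, and the paper's exact averaging convention may differ.

```python
from collections import Counter

def overall_accuracy(y_true, y_pred):
    """Fraction of points whose predicted class equals the reference class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's true point count."""
    support = Counter(y_true)      # reference points per class
    predicted = Counter(y_pred)    # predicted points per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = 0.0
    for cls, n_true in support.items():
        # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (support + predicted)
        f1 = 2 * true_pos[cls] / (n_true + predicted[cls])
        total += f1 * n_true
    return total / len(y_true)

# Hypothetical labels for eight points and three classes (illustration only).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

print(f"OA = {overall_accuracy(y_true, y_pred):.4f}")      # OA = 0.7500
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.4f}")  # weighted F1 = 0.7417
```

Because the F1 average is weighted by class support, it is dominated by the major classes, which is one reason per-class accuracies for minor classes are reported separately in the abstract.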
About the journal:
The aims and scope of the Journal of the Indian Society of Remote Sensing are: to help advance, disseminate and apply knowledge of remote sensing technology, which is deemed to include photo interpretation, photogrammetry, aerial photography, image processing, and other related technologies, in the survey, planning and management of natural resources and in other areas of application where the technology is considered appropriate; to promote interaction among all persons, bodies, institutions (private and/or state-owned) and industries interested in the advancement, dissemination and application of the technology; to encourage and undertake research in remote sensing and related technologies; and to undertake and execute all acts that promote any or all of the aims and objectives of the Indian Society of Remote Sensing.