Chase D. Carthen, Araam Zaremehrjardi, Vinh D. Le, Carlos Cardillo, S. Strachan, A. Tavakkoli, F. Harris, S. Dascalu
2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA). Published 2023-05-23. DOI: 10.1109/SERA57763.2023.10197731 (https://doi.org/10.1109/SERA57763.2023.10197731)
Orchestrating Apache NiFi/MiNiFi within a Spatial Data Pipeline
In many smart city projects, LiDAR data is a common choice for capturing spatial information, but this decision often strains the existing infrastructure. In this paper, we introduce a data pipeline that orchestrates Apache NiFi (NiFi), Apache MiNiFi (MiNiFi), and several other tools as an automated solution to relay and archive LiDAR data captured by deployed edge devices. The LiDAR sensors used in this workflow are Velodyne Ultra Puck sensors that capture at a rate of 10 frames per second and produce 6-7 GB packet capture (PCAP) files per hour. By compressing the file both after capture and in real time, we found that gzip produced a 5 GB file and saved about 5 minutes in transmission time to NiFi, and that it saved considerable CPU time when compressing in real time. For ingesting LiDAR data onto an institutional compute cluster, we instead chose XZ as the compression algorithm because of its high compression ratio. To evaluate the capabilities of our system design, we compared the features of this data pipeline against existing third-party services, namely Globus and rsync.
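
The trade-off described above, gzip for cheaper compression before transfer and XZ for a higher ratio at archive time, can be illustrated with a minimal sketch using Python's standard `gzip` and `lzma` modules. This is not the pipeline from the paper: `compare_codecs` and `synthetic_payload` are hypothetical names, the payload is a synthetic stand-in rather than real Velodyne PCAP data, and the compression level of 6 is an assumed default. The numbers it prints will not match the reported measurements.

```python
import gzip
import lzma
import time


def synthetic_payload(n_records: int = 20000) -> bytes:
    """Build a repetitive byte stream as a stand-in for capture data.

    Assumption: this is NOT real PCAP content; it only gives the
    compressors some structure to work with.
    """
    records = []
    for i in range(n_records):
        header = b"\xff\xee" + i.to_bytes(4, "little")
        body = bytes((i * 7 + j) % 256 for j in range(32))
        records.append(header + body)
    return b"".join(records)


def compare_codecs(payload: bytes) -> dict:
    """Compress the payload with gzip and XZ; report size, ratio, and time."""
    codecs = {
        # Level 6 is an assumed setting, not one taken from the paper.
        "gzip": lambda data: gzip.compress(data, compresslevel=6),
        "xz": lambda data: lzma.compress(data, format=lzma.FORMAT_XZ, preset=6),
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(payload)
        elapsed = time.perf_counter() - start
        results[name] = {
            "bytes": len(compressed),
            "ratio": len(payload) / len(compressed),
            "seconds": elapsed,
        }
    return results


if __name__ == "__main__":
    data = synthetic_payload()
    for codec, stats in compare_codecs(data).items():
        print(
            f"{codec}: {stats['bytes']} bytes, "
            f"ratio {stats['ratio']:.2f}, {stats['seconds']:.3f} s"
        )
```

Running the same comparison on an actual capture file (reading it with `open(path, "rb").read()` in place of `synthetic_payload()`) would give a rough sense of how the ratio and CPU cost of the two codecs differ on a given edge device.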