Performance Evaluation of Java Serialization Frameworks on Geospatial Big Data
Authors: Filip Ricov, K. Pripužić
Published in: 2022 7th International Conference on Smart and Sustainable Technologies (SpliTech), 2022-07-05
DOI: 10.23919/SpliTech55088.2022.9854334 (https://doi.org/10.23919/SpliTech55088.2022.9854334)
Citation count: 0
Abstract
Geospatial Big Data refers to spatial datasets exceeding the capacity of current computing systems. These datasets usually contain millions of vector geometries (such as points, polygons and linestrings) that are used to represent the spatial component of geographic features. Each geometry consists of one or more interconnected vertices, where each vertex describes a geographic location. Due to its large volume or high frequency of generation, Geospatial Big Data must be stored and processed in a distributed manner, usually using an open-source Big Data platform such as Apache Spark. This often requires serialization and deserialization of geometries when sending and receiving them among distributed computers. Therefore, the performance of serialization and deserialization has a significant impact on the overall processing performance of Geospatial Big Data. In this paper, we first briefly present seven popular Java serialization frameworks that can work with geometries and then experimentally evaluate and compare their serialization and deserialization performance on Geospatial Big Data.
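To make the serialization and deserialization step concrete, the following is a minimal sketch of a single round trip for one geometry. It rests on several assumptions: it uses Java's built-in `java.io` object serialization, which the abstract does not name as one of the seven evaluated frameworks; the `LineString` class is a hypothetical stand-in for a real geometry type; and the coordinates are illustrative values only. It is not the authors' benchmark.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class GeometryRoundTrip {

    // Hypothetical stand-in for a linestring geometry (not a type from the paper).
    // Vertices are stored as a flat array: [x0, y0, x1, y1, ...].
    static final class LineString implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] coords;

        LineString(double[] coords) {
            this.coords = coords.clone();
        }
    }

    // Turn a geometry into bytes, as would happen before sending it to another node.
    static byte[] serialize(LineString geometry) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(geometry);
        }
        return buffer.toByteArray();
    }

    // Rebuild the geometry from bytes, as would happen on the receiving node.
    static LineString deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (LineString) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Illustrative coordinates only (longitude, latitude pairs).
        LineString original = new LineString(new double[] {15.97, 45.81, 16.44, 43.51});

        long start = System.nanoTime();
        byte[] bytes = serialize(original);
        LineString copy = deserialize(bytes);
        long elapsedNanos = System.nanoTime() - start;

        System.out.println("serialized size (bytes): " + bytes.length);
        System.out.println("vertices preserved: " + Arrays.equals(original.coords, copy.coords));
        System.out.println("round trip time (ns): " + elapsedNanos);
    }
}
```

A single timed call like this only illustrates what is being measured. A meaningful comparison would need JVM warm-up, many repetitions, and datasets of realistic size.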