A Characterization and Comparison of Spatial-Temporal Applications and Internet Big Data Benchmarks

2018 26th International Conference on Geoinformatics. Pub Date: 2018-06-01. DOI: 10.1109/GEOINFORMATICS.2018.8557164

Wen Xiong, Kun Yang, Yanhui Zhu

Citations: 0

Abstract

An urban traffic data analysis platform is important infrastructure for a modern city. As the spatial-temporal data produced by transportation systems grows explosively, operators in the traffic field are trying to adopt the emerging big data solutions that originated in the internet industry. However, it is hard to find a cost-effective solution for building such a platform because of the diverse combinations of hardware and software configurations. Currently, operators select solutions based on simple evaluation results from internet benchmarks such as TeraSort. Two issues have never been fully explored: (1) whether it is appropriate to evaluate a solution for spatial-temporal applications with internet benchmarks; and (2) what the characteristics of spatial-temporal applications are and which optimization measures they suggest. We address these issues with a novel workload characterization tool for big data applications, called Extensible Metric Importance Analysis (EMIA). Its key idea is a performance model based on ensemble learning, which takes program metrics as input, outputs a performance metric such as execution time, and ranks the input metrics by their importance. Based on EMIA, we apply principal component analysis (PCA) to the program behaviors of five representative spatial-temporal applications and nine popular internet big data benchmarks. Experimental results show that spatial-temporal applications present unique characteristics and that it is unreasonable to evaluate solutions for spatial-temporal applications with internet benchmarks. Moreover, we optimize spatial-temporal applications by applying targeted measures to the key factors identified by EMIA, achieving a clear performance improvement.
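The abstract does not say which ensemble learner EMIA uses or which program metrics it collects, so the following is only a minimal sketch of the general idea: fit an ensemble model that predicts execution time from program metrics, then rank the metrics by how much each one contributes to the fit. It assumes gradient-boosted decision stumps as a stand-in ensemble, and the metric names (`cpu_util`, `disk_io_wait`, `shuffle_bytes`) and the synthetic data are hypothetical, chosen purely for illustration.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared residuals."""
    return sum(v * v for v in values)


def fit_stump(rows, residuals):
    """Return (metric index, threshold, left mean, right mean) for the single
    split that leaves the smallest squared error on the residuals."""
    best = None
    for j in range(len(rows[0])):
        cuts = sorted({r[j] for r in rows})
        for lo, hi in zip(cuts, cuts[1:]):
            thr = (lo + hi) / 2
            left = [e for r, e in zip(rows, residuals) if r[j] <= thr]
            right = [e for r, e in zip(rows, residuals) if r[j] > thr]
            lm, rm = fmean(left), fmean(right)
            err = (sum((e - lm) ** 2 for e in left)
                   + sum((e - rm) ** 2 for e in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    return best[1:]


def rank_metrics(names, rows, times, rounds=40, rate=0.5):
    """Boost stumps on the residuals and credit each metric with the
    squared-error reduction achieved by the stumps that split on it."""
    base = fmean(times)
    residuals = [t - base for t in times]
    importance = dict.fromkeys(names, 0.0)
    for _ in range(rounds):
        j, thr, lm, rm = fit_stump(rows, residuals)
        before = sse(residuals)
        residuals = [e - rate * (lm if r[j] <= thr else rm)
                     for r, e in zip(rows, residuals)]
        importance[names[j]] += before - sse(residuals)
    total = sum(importance.values())
    # Normalize to shares that sum to 1 and sort from most to least important.
    return sorted(((n, v / total) for n, v in importance.items()),
                  key=lambda item: item[1], reverse=True)


# Hypothetical metrics and synthetic measurements (not from the paper).
rng = random.Random(0)
names = ["cpu_util", "disk_io_wait", "shuffle_bytes"]
rows = [[rng.random() for _ in names] for _ in range(80)]
# Synthetic execution time: dominated by I/O wait, weakly by shuffle volume.
times = [10 * io + 2 * shuf + rng.gauss(0, 0.1) for _cpu, io, shuf in rows]

ranking = rank_metrics(names, rows, times)
for name, share in ranking:
    print(f"{name}: {share:.3f}")
```

In this toy setup the metric that drives the synthetic execution time comes out on top, which is the kind of signal a tool like EMIA would use to pick optimization targets. The PCA comparison across applications and benchmarks described in the abstract is a separate step and is not shown here.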
