A Characterization and Comparison of Spatial-Temporal Applications and Internet Big Data Benchmarks

Wen Xiong, Kun Yang, Yanhui Zhu
Published in: 2018 26th International Conference on Geoinformatics, 2018-06-01
DOI: 10.1109/GEOINFORMATICS.2018.8557164 (https://doi.org/10.1109/GEOINFORMATICS.2018.8557164)
Citations: 0

Abstract

A Characterization and Comparison of Spatial-Temporal Applications and Internet Big Data Benchmarks
An urban traffic data analysis platform is important infrastructure for a modern city. As the spatial-temporal data produced by transportation systems grows explosively, operators in the traffic field are trying to adopt emerging big data solutions that originated in the internet industry. However, it is hard to find a cost-effective solution for building such a platform because of the many possible combinations of hardware and software configurations. Currently, operators select solutions based on simple evaluation results from internet benchmarks such as TeraSort. Two questions have never been fully explored: (1) is it appropriate to evaluate a solution for spatial-temporal applications with internet benchmarks, and (2) what are the characteristics of spatial-temporal applications, and what optimization measures do they suggest? We address these questions with a novel workload characterization tool for big data applications, called Extensible Metric Importance Analysis (EMIA). The key idea is a performance model based on ensemble learning, which takes program metrics as input, outputs a performance metric such as execution time, and ranks the input metrics by their importance. Based on EMIA, we apply principal component analysis (PCA) to the program behaviors of five representative spatial-temporal applications and nine popular internet big data benchmarks. Experimental results show that spatial-temporal applications exhibit unique characteristics, and that it is unreasonable to evaluate solutions for spatial-temporal applications with internet benchmarks. Moreover, we optimize spatial-temporal applications by applying optimization measures to the key factors identified by EMIA, achieving clear performance improvements.
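The abstract does not say which ensemble learner EMIA uses or which program metrics it collects. As a rough illustration only, the sketch below ranks three hypothetical metrics (`cpu_util`, `io_wait`, `net_mb`) by how much they reduce execution-time error across a small ensemble of one-split regression trees, each fitted with one run left out. The metric names, the synthetic data, and the functions `sse`, `best_split`, and `rank_metrics` are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical metric names and synthetic measurements (not from the paper).
NAMES = ["cpu_util", "io_wait", "net_mb"]
ROWS = [
    [0.60, 0.10, 80.0],
    [0.70, 0.20, 160.0],
    [0.80, 0.30, 120.0],
    [0.50, 0.40, 200.0],
    [0.65, 0.50, 60.0],
    [0.85, 0.60, 140.0],
    [0.75, 0.80, 180.0],
    [0.55, 0.90, 100.0],
]
# Execution time in seconds, one value per run in ROWS.
TIMES = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 100.0, 110.0]


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, times):
    """Fit a one-split regression tree.

    Returns (feature index, reduction in squared error) for the best split,
    or (None, 0.0) if no split reduces the error.
    """
    base = sse(times)
    best_feature, best_gain = None, 0.0
    for j in range(len(rows[0])):
        order = sorted(range(len(rows)), key=lambda i: rows[i][j])
        for k in range(1, len(order)):
            if rows[order[k - 1]][j] == rows[order[k]][j]:
                continue  # no threshold separates equal values
            left = [times[i] for i in order[:k]]
            right = [times[i] for i in order[k:]]
            gain = base - (sse(left) + sse(right))
            if gain > best_gain:
                best_feature, best_gain = j, gain
    return best_feature, best_gain


def rank_metrics(names, rows, times):
    """Rank metrics by total error reduction over a leave-one-out ensemble."""
    totals = [0.0] * len(names)
    for held_out in range(len(rows)):
        sub_rows = rows[:held_out] + rows[held_out + 1:]
        sub_times = times[:held_out] + times[held_out + 1:]
        feature, gain = best_split(sub_rows, sub_times)
        if feature is not None:
            totals[feature] += gain
    total = sum(totals) or 1.0  # avoid division by zero if nothing splits
    scores = [(name, t / total) for name, t in zip(names, totals)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_metrics(NAMES, ROWS, TIMES)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this synthetic data the execution time is constructed as a function of `io_wait` alone, so that metric should rank first; real measurements would give a less clear-cut ranking, and a production tool would likely use a richer ensemble model than single-split trees.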