Challenges and opportunities for analysis based research in big data

IEEE International Performance, Computing, and Communications Conference | Pub Date: 2014-12-01 | DOI: 10.1109/PCCC.2014.7017014

N. Duffield, Jie Wu


Citations: 0

Abstract


Challenges and opportunities for analysis based research in big data

One response to the proliferation of massive datasets in many fields has been to develop ingenious ways to throw resources at the problem, for example, using massive fault-tolerant storage architectures, supercomputing platforms, and parallel graph computation models. However, not all environments can support this scale of resources, and not all queries need an exact response. Large Internet Service Providers (ISPs) have employed massive and diverse operational datasets for a number of years, and mathematical methods have underpinned their response to the challenges of data scale, incompleteness, and complexity that are prevalent both in ISP data and in big data more generally. This talk reviews some recent progress in this direction and surveys some new roles for sampling methods in big data.
