Challenges and opportunities for analysis based research in big data
N. Duffield, Jie Wu
IEEE International Performance, Computing, and Communications Conference, December 2014
DOI: 10.1109/PCCC.2014.7017014
One response to the proliferation of massive datasets in many fields has been to develop ingenious ways to throw resources at the problem, for example massive fault-tolerant storage architectures, supercomputing platforms, and parallel graph computation models. However, not every environment can support resources on this scale, and not every query needs an exact answer. Large Internet Service Providers (ISPs) have worked with massive and diverse operational datasets for years, and mathematical methods have underpinned their response to the challenges of data scale, incompleteness, and complexity that are prevalent both in ISP data and in big data more generally. This talk reviews recent progress in this direction and surveys some new roles for sampling methods in big data.
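The observation that not every query needs an exact answer can be illustrated with a minimal sketch. Classic reservoir sampling (often called Algorithm R) keeps a fixed-size uniform random sample from a stream of unknown length, and a total over the whole stream can then be estimated by scaling the sample mean by the stream length. This is a generic technique chosen for illustration and is not necessarily one of the methods the talk reviews. The function names `reservoir_sample` and `estimate_total`, and the per-flow byte counts in the usage lines, are hypothetical.

```python
import random
from typing import Iterable, List, Tuple


def reservoir_sample(stream: Iterable[float], k: int,
                     rng: random.Random) -> Tuple[List[float], int]:
    """Keep a uniform random sample of at most k items from a stream.

    Returns the sample and the number of items seen. Memory use is O(k)
    regardless of the stream length (classic reservoir sampling).
    """
    reservoir: List[float] = []
    n = 0  # stays 0 if the stream is empty
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the n-th item with probability k / n,
            # overwriting a uniformly chosen slot.
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:
                reservoir[j] = item
    return reservoir, n


def estimate_total(stream: Iterable[float], k: int, seed: int = 0) -> float:
    """Approximate the sum of a stream from a size-k uniform sample.

    n * (sample mean) is an unbiased estimate of the total; it is exact
    whenever the stream has at most k items.
    """
    rng = random.Random(seed)
    sample, n = reservoir_sample(stream, k, rng)
    if not sample:
        return 0.0
    return n * sum(sample) / len(sample)


if __name__ == "__main__":
    # Hypothetical per-flow byte counts, for illustration only.
    data_rng = random.Random(42)
    flow_bytes = [data_rng.expovariate(1 / 1500) for _ in range(100_000)]
    print("estimate:", estimate_total(flow_bytes, k=1_000))
    print("exact:   ", sum(flow_bytes))
```

Uniform sampling keeps the sketch short; with heavy-tailed data such as flow sizes, the variance of this estimator can be large, which is one reason weighted sampling schemes are often preferred in practice.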