Large-scale exploratory analysis, cleaning, and modeling for event detection in real-world power systems data
R. Hafen, Tara D. Gibson, K. K. Dam, T. Critchlow
HiPCNA-PG '13, published 2013-11-17
DOI: 10.1145/2536780.2536783 (https://doi.org/10.1145/2536780.2536783)
In this paper, we present an approach to large-scale data analysis, Divide and Recombine (D&R), and describe a hardware and software implementation that supports this approach. We then illustrate the use of D&R on large-scale power systems sensor data to perform initial exploration, discover multiple data integrity issues, build and validate algorithms to filter bad data, and construct statistical event detection algorithms. This paper also reports on experiences using a non-traditional Hadoop distributed computing setup on top of an HPC cluster.
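As a rough illustration of the general D&R pattern named in the abstract, the sketch below divides sensor readings into per-sensor subsets, applies a simple bad-data filter and summary to each subset independently, and recombines the per-subset outputs. It is a single-process toy, not the Hadoop-based implementation the paper describes. The readings, the sensor names, the 59 to 61 Hz plausibility range, and the function names `divide`, `analyze`, and `recombine` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (sensor_id, frequency_hz) readings; not data from the paper.
READINGS = [
    ("pmu_a", 60.01), ("pmu_a", 59.99), ("pmu_a", 60.00),
    ("pmu_b", 60.02), ("pmu_b", 0.0), ("pmu_b", 59.98),
]


def divide(records):
    """Divide: group readings into subsets keyed by sensor."""
    subsets = defaultdict(list)
    for sensor_id, value in records:
        subsets[sensor_id].append(value)
    return dict(subsets)


def analyze(values, lo=59.0, hi=61.0):
    """Per-subset method: drop values outside an assumed plausible
    range, then summarize what remains."""
    kept = [v for v in values if lo <= v <= hi]
    return {
        "n_kept": len(kept),
        "n_dropped": len(values) - len(kept),
        "median": median(kept) if kept else None,
    }


def recombine(results):
    """Recombine: merge the per-subset outputs into one summary."""
    medians = [r["median"] for r in results.values() if r["median"] is not None]
    return {
        "n_kept": sum(r["n_kept"] for r in results.values()),
        "n_dropped": sum(r["n_dropped"] for r in results.values()),
        "mean_of_medians": mean(medians) if medians else None,
    }


if __name__ == "__main__":
    per_subset = {key: analyze(vals) for key, vals in divide(READINGS).items()}
    print(recombine(per_subset))
```

Because each subset is analyzed independently, the `analyze` step is the part that a distributed framework could run in parallel across subsets; only `recombine` needs to see all of the per-subset outputs.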