A two-stage ensemble of diverse models for recognition of abnormal data in raw wind data

2016 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC) | Pub Date: 2016-10-01 | DOI: 10.1109/APPEEC.2016.7779621

K. Hou, D. Xia, Qun Li, Xingwei Xu, Han Yue, Kefei Wang, Lei Chen, Le Zheng

Citations: 0

Abstract

Wind energy integration research generally relies on complex sensors located at remote sites. Any procedure that derives high-level synthetic information from databases holding large amounts of low-level data must therefore account for possible sensor failures and imperfect input data. Data-mining methods are widely used to identify the relationship between wind farm power output and wind speed, which is important for wind power prediction, and incorrect or unnatural data strongly influence their results. To address this problem, the paper presents an empirical methodology that efficiently preprocesses and filters raw wind data using a two-stage ensemble of diverse models. First, abnormal features are extracted from the raw wind data, and the dataset is labeled according to the wind farm's operation state records and the characteristics of typical abnormal data. Next, a two-stage classification model is built from a Random Forest (RF) and a Gradient Boosting Decision Tree (GBDT). In the first stage, an RF classifier is trained on the labeled dataset. In the second stage, a GBDT classifier is trained on the labeled dataset together with the RF classification result. Finally, each trained model predicts the test set, and the average of the RF and GBDT predicted values is taken as the final result. The methodology was tested successfully on data collected from a large wind farm in northeast China.
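The two-stage scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract gives no features, hyperparameters, or data format, so everything concrete here is an assumption: the use of scikit-learn, the synthetic data generator `make_synthetic_wind_data`, the helper names `fit_two_stage` and `predict_two_stage`, the choice to pass the RF abnormal-class probability (rather than a hard label) to the GBDT, and the 0.5 decision threshold.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split


def make_synthetic_wind_data(n_samples=2000, seed=0):
    """Hypothetical stand-in for labeled wind data (0 = normal, 1 = abnormal)."""
    rng = np.random.default_rng(seed)
    speed = rng.uniform(0.0, 25.0, n_samples)  # wind speed in m/s
    # Toy normalized power curve: cubic ramp up to an assumed rated speed of 12 m/s.
    power = np.clip((speed / 12.0) ** 3, 0.0, 1.0) + rng.normal(0.0, 0.02, n_samples)
    labels = np.zeros(n_samples, dtype=int)
    # Inject abnormal points: near-zero output despite usable wind.
    abnormal = (rng.random(n_samples) < 0.15) & (speed > 6.0)
    power[abnormal] = rng.uniform(0.0, 0.05, abnormal.sum())
    labels[abnormal] = 1
    return np.column_stack([speed, power]), labels


def fit_two_stage(X_train, y_train, seed=0):
    """Stage 1: RF on the labeled data. Stage 2: GBDT on the data plus the RF output."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X_train, y_train)
    # Assumption: the "RF classification result" is the abnormal-class probability.
    rf_out = rf.predict_proba(X_train)[:, 1]
    gbdt = GradientBoostingClassifier(random_state=seed)
    gbdt.fit(np.column_stack([X_train, rf_out]), y_train)
    return rf, gbdt


def predict_two_stage(rf, gbdt, X):
    """Average the RF and GBDT abnormal-class probabilities, then threshold at 0.5."""
    p_rf = rf.predict_proba(X)[:, 1]
    p_gbdt = gbdt.predict_proba(np.column_stack([X, p_rf]))[:, 1]
    p_avg = (p_rf + p_gbdt) / 2.0
    return p_avg, (p_avg >= 0.5).astype(int)


if __name__ == "__main__":
    X, y = make_synthetic_wind_data()
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y
    )
    rf, gbdt = fit_two_stage(X_tr, y_tr)
    p_avg, y_hat = predict_two_stage(rf, gbdt, X_te)
    print("test accuracy:", (y_hat == y_te).mean())
```

In this sketch the GBDT is trained on in-sample RF probabilities, which tend to be overconfident. A common refinement, not mentioned in the abstract, is to feed the second stage out-of-fold RF predictions instead, for example via scikit-learn's `cross_val_predict`.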
