Initial Data Corruption Impact on Machine Learning Models' Performance in Energy Consumption Forecast
A. Khalyasmaa, P. Matrenin
2021 Ural-Siberian Smart Energy Conference (USSEC), 2021-11-13
DOI: 10.1109/USSEC53120.2021.9655724 (https://doi.org/10.1109/USSEC53120.2021.9655724)
Citations: 1
Abstract
The paper addresses the operational risks of applying machine learning models in the power industry, using power consumption forecasting as a case study. Current studies of machine learning in the power industry focus primarily on improving model accuracy and adaptability and on feature selection and preprocessing. The risks that arise when trained models are put into operation receive far less attention, although incorrect use of a trained model can critically degrade its accuracy and produce errors that are unacceptable in operation. The paper presents an example of building XGBoost and Random Forest models for short-term power consumption forecasting at a mining enterprise, taking meteorological factors into account. It considers several scenarios in which the initial data used by the model to produce a forecast are corrupted. Losses and gaps in the initial data are shown to increase the forecast error, creating a risk of significant financial losses when operating on the electricity market.
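
As a rough illustration of the kind of experiment the abstract describes, the sketch below removes a share of the input readings, fills the gaps, and measures how the forecast error changes. Everything in it is an assumption made for illustration: the data are synthetic, `toy_model` is a fixed linear stand-in rather than the paper's XGBoost or Random Forest models, forward filling is only one possible way of handling gaps, and MAPE is used as the error measure without the abstract confirming the paper's metric.

```python
import math
import random


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def drop_values(series, fraction, rng):
    """Replace each value with None (a gap) with probability `fraction`."""
    return [None if rng.random() < fraction else x for x in series]


def forward_fill(series, default):
    """Fill each gap with the last observed value; use `default` before the first one."""
    filled, last = [], default
    for x in series:
        if x is not None:
            last = x
        filled.append(last)
    return filled


def toy_model(temperature):
    """Hypothetical stand-in for a trained forecaster: load falls as temperature rises."""
    return 100.0 - 2.0 * temperature


def error_under_gaps(fraction, seed=0, hours=240):
    """MAPE of the toy forecast when a share of temperature readings is missing."""
    rng = random.Random(seed)
    # Synthetic daily temperature cycle, not real measurements.
    temps = [10.0 * math.sin(2.0 * math.pi * h / 24.0) for h in range(hours)]
    # The "true" load is what the toy model yields on uncorrupted inputs.
    actual = [toy_model(t) for t in temps]
    # Corrupt the inputs, repair them naively, and forecast from the repaired series.
    seen = forward_fill(drop_values(temps, fraction, rng), default=0.0)
    return mape(actual, [toy_model(t) for t in seen])


if __name__ == "__main__":
    for fraction in (0.0, 0.1, 0.3, 0.5):
        print(f"gap share {fraction:.0%}: MAPE {error_under_gaps(fraction):.2f}%")
```

In this toy setup the error is zero when no readings are missing and becomes positive once gaps appear, because the filled values no longer match the true inputs. With a real trained model the clean-data error would already be nonzero, and the effect of each corruption scenario would have to be measured against that baseline.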