ALGORITHM FOR CONSTRUCTIONAL CHARACTERISTICS DATA CLEANSING OF LARGE-SCALE PUBLIC BUILDINGS DATABASE

Hrvoje Krstić, M. Teni
Published in: High Performance and Optimum Design of Structures and Materials III (2018-07-11)
DOI: 10.2495/HPSM180221
Citations: 4

Abstract

The research presented in this paper uses a public-sector buildings database obtained from the Croatian Energy Management Information System (EMIS), which comprises over 3,500 public-sector buildings. EMIS provides transparent oversight and control of energy consumption, which makes it an indispensable tool for systematic energy management. The EMIS database holds static technical data for each facility, including general data, constructional data and energy performance data, as well as dynamic energy usage data. However, many variables in the database contain impossible values, i.e. values that are illogical or lie outside possible, acceptable ranges; these are probably the consequence of user input errors. In addition, some cases have missing data. This raises the question: is it possible to develop an algorithm for data cleansing and to find a way to calculate the missing data? To use the database for further, more complex analyses such as clustering, machine learning and neural network applications, extreme values must first be removed from it. The research presented in this paper addresses this problem, with an emphasis on the constructional characteristics of buildings, and proposes a cleansing algorithm. As a result, acceptable ranges for the variables and a procedure for replacing invalid input values are proposed. The results and findings can be applied to similar building databases to optimize the datasets and to exclude variables with extreme values, which can significantly affect the modelling process. Furthermore, the proposed algorithm can support decisions on energy refurbishment and building maintenance, since it eliminates cases with misleading data from the database. The results show that in some cases more than 80% of the data are missing or excluded. The findings can also be implemented in EMIS or a similar system to prevent unacceptable data values from being entered in the future.
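The abstract does not spell out the algorithm itself, but the core idea it describes (checking each constructional variable against an acceptable range, treating out-of-range entries as invalid, and reporting how much of each variable is missing or excluded) can be sketched as below. This is a minimal sketch under stated assumptions: the variable names, the ranges in `ACCEPTABLE_RANGES`, and the helpers `cleanse`, `share_unusable` and `impute_median` are all hypothetical, and median imputation is a generic stand-in, not the replacement procedure proposed in the paper.

```python
from statistics import median
from typing import Optional

# Hypothetical acceptable ranges (min, max), inclusive. These are illustrative
# placeholders only; they are NOT the ranges derived in the paper.
ACCEPTABLE_RANGES: dict[str, tuple[float, float]] = {
    "year_of_construction": (1800, 2018),
    "heated_floor_area_m2": (10, 100_000),
    "number_of_floors": (1, 40),
    "wall_u_value_w_m2k": (0.1, 3.0),
}

Record = dict[str, Optional[float]]


def cleanse(records: list[Record]) -> tuple[list[Record], dict[str, dict[str, int]]]:
    """Set out-of-range values to None and count missing/excluded values per variable."""
    stats = {var: {"missing": 0, "excluded": 0} for var in ACCEPTABLE_RANGES}
    cleaned: list[Record] = []
    for rec in records:
        out = dict(rec)  # copy so the input records are left untouched
        for var, (lo, hi) in ACCEPTABLE_RANGES.items():
            value = rec.get(var)
            if value is None:
                stats[var]["missing"] += 1  # value was never entered
            elif not lo <= value <= hi:
                stats[var]["excluded"] += 1  # impossible value, treated as invalid
                out[var] = None
        cleaned.append(out)
    return cleaned, stats


def share_unusable(stats: dict[str, dict[str, int]], n_records: int) -> dict[str, float]:
    """Percentage of records per variable that are missing or excluded."""
    return {
        var: 100.0 * (s["missing"] + s["excluded"]) / n_records
        for var, s in stats.items()
    }


def impute_median(records: list[Record], var: str) -> list[Record]:
    """Generic stand-in replacement: fill None with the median of the valid values.
    NOT the paper's replacement procedure, which the abstract does not describe."""
    valid = [r[var] for r in records if r.get(var) is not None]
    if not valid:
        return records
    fill = median(valid)
    return [{**r, var: fill} if r.get(var) is None else r for r in records]


# Usage with made-up records (not EMIS data).
buildings: list[Record] = [
    {"year_of_construction": 1975, "heated_floor_area_m2": 1200,
     "number_of_floors": 3, "wall_u_value_w_m2k": 1.4},
    {"year_of_construction": 19750, "heated_floor_area_m2": None,
     "number_of_floors": 0, "wall_u_value_w_m2k": 0.4},
    {"year_of_construction": 2005, "heated_floor_area_m2": 850,
     "number_of_floors": 2, "wall_u_value_w_m2k": None},
]
cleaned, stats = cleanse(buildings)
print(share_unusable(stats, len(buildings)))
imputed = impute_median(cleaned, "year_of_construction")
```

Keeping the counts of missing and excluded values separate shows whether a variable is unusable because it was rarely entered or because it was often entered incorrectly; the abstract reports that in some cases the combined share exceeds 80%.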