Methodology for Estimating the Cost of Construction Equipment Based on the Analysis of Important Characteristics Using Machine Learning Methods
Authors: Nataliya Boyko, Oleksii Lukash
Journal: Journal of Engineering (impact factor 1.7; JCR Q2, Engineering, Multidisciplinary)
Published: 2023-09-22
DOI: 10.1155/2023/8833753 (https://doi.org/10.1155/2023/8833753)
Citations: 0
Abstract
The current pace of the market demands a corresponding competitive advantage. This study forecasts the cost of heavy machinery from its geolocation and from the characteristics that are essential for its field of activity. It analyzes specific categories of heavy machinery to find the characteristics that matter most for price; these essential characteristics were identified by keywords in the text descriptions. The research objective is to collect and structure data from web resources that list heavy equipment for sale, and a dataset was formed from the data obtained. The paper describes the preliminary data processing in detail, covering its main stages: detecting and handling missing data, removing anomalous data, encoding categorical data, and scaling. Gaps were filled according to the characteristics and available data, using the mean value of a specific grouped set; the mode value of the grouped items was also used to fill gaps. The interquartile range and standard deviation were used to detect anomalies, and the assessment of anomalous data was applied separately to each set of grouped data with the same parameters. The Kolmogorov–Smirnov, KS_Test, and Lilliefors tests were used to check the data for normality. Models were built and analyzed using machine learning methods (linear and polynomial regression, decision trees, random forest, support vector machine, and neural network). Two encoding methods, Label Encoder and One Hot Encoder, were compared to achieve maximum model accuracy; the encoded parameter was the geolocation of the heavy equipment. The operation of each algorithm is examined on the created dataset. The study pays additional attention to the characteristics of heavy machinery that are specific to each sector of the economy. Existing methods and tools for forecasting price from the specific characteristics of the equipment were also analyzed. The practical significance of this work lies in developing an algorithm that predicts the cost of heavy machinery by assessing several parameters.
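The preprocessing steps named in the abstract (group-wise gap filling with the mean or mode, group-wise anomaly detection with the interquartile range, and label versus one-hot encoding of geolocation) can be illustrated with a small dependency-free Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sample listings, the field names (`category`, `region`, `hours`, `price`), the helper function names, the choice of equipment category as the grouping key, and the 1.5 × IQR threshold. The two encoders only mimic the general behaviour of a label encoder and a one-hot encoder; the authors' actual implementation is not shown in the abstract.

```python
from collections import defaultdict
from statistics import mean, mode, quantiles

# Hypothetical listings; None marks a gap. Field names are illustrative only.
listings = [
    {"category": "excavator", "region": "EU", "hours": 4000.0, "price": 48_000.0},
    {"category": "excavator", "region": "EU", "hours": None, "price": 50_000.0},
    {"category": "excavator", "region": None, "hours": 5000.0, "price": 52_000.0},
    {"category": "excavator", "region": "US", "hours": 6000.0, "price": 54_000.0},
    {"category": "excavator", "region": "EU", "hours": 5000.0, "price": 400_000.0},
    {"category": "loader", "region": "US", "hours": 2000.0, "price": 30_000.0},
    {"category": "loader", "region": "US", "hours": None, "price": 31_000.0},
    {"category": "loader", "region": "EU", "hours": 3000.0, "price": 32_000.0},
    {"category": "loader", "region": "US", "hours": 2500.0, "price": 33_000.0},
]


def group_values(rows, key, field):
    """Collect the non-missing values of `field` for each value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        if row[field] is not None:
            groups[row[key]].append(row[field])
    return groups


def impute_by_group(rows, key, field, statistic):
    """Fill gaps in `field` with a per-group statistic (e.g. mean or mode)."""
    fill = {g: statistic(v) for g, v in group_values(rows, key, field).items()}
    # A group with no observed values has no entry in `fill`, so it stays None.
    return [
        {**row, field: fill.get(row[key])} if row[field] is None else dict(row)
        for row in rows
    ]


def iqr_outliers(rows, key, field, k=1.5):
    """Return rows whose `field` is outside [Q1 - k*IQR, Q3 + k*IQR] of its group.

    Assumes `field` has no gaps and every group has at least two values.
    """
    bounds = {}
    for g, values in group_values(rows, key, field).items():
        # "inclusive" interpolates linearly between the sample minimum and maximum.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        bounds[g] = (q1 - k * (q3 - q1), q3 + k * (q3 - q1))
    return [r for r in rows if not bounds[r[key]][0] <= r[field] <= bounds[r[key]][1]]


def label_encode(values):
    """Map each distinct value to an integer code, in sorted order."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot_encode(values):
    """Map each value to a 0/1 indicator vector with one slot per distinct value."""
    classes = sorted(set(values))
    return [[int(v == c) for c in classes] for v in values], classes


# Fill numeric gaps with the group mean and categorical gaps with the group mode.
filled = impute_by_group(listings, "category", "hours", mean)
filled = impute_by_group(filled, "category", "region", mode)

# Flag price anomalies within each equipment category, then drop them.
outliers = iqr_outliers(filled, "category", "price")
clean = [row for row in filled if row not in outliers]

# Encode the geolocation column in both ways for comparison.
regions = [row["region"] for row in clean]
codes, classes = label_encode(regions)
one_hot, _ = one_hot_encode(regions)
```

Label encoding yields a single integer column, which implies an ordering between regions that a linear model may misread, while one-hot encoding avoids that at the cost of one column per region. That trade-off is one plausible reason to compare both encodings, as the abstract describes.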
About the journal:
Journal of Engineering is a peer-reviewed, Open Access journal that publishes original research articles as well as review articles in several areas of engineering. The subject areas covered by the journal are:
- Chemical Engineering
- Civil Engineering
- Computer Engineering
- Electrical Engineering
- Industrial Engineering
- Mechanical Engineering