From raw to refined: Data preprocessing for construction machine learning (ML), deep learning (DL), and reinforcement learning (RL) models
Automation in Construction (impact factor 9.6; JCR Q1, Construction & Building Technology). Published 2024-10-24. DOI: 10.1016/j.autcon.2024.105844
https://www.sciencedirect.com/science/article/pii/S0926580524005806
As the use of predictive models in construction rapidly increases, the need for preprocessing raw construction data has become more critical. This systematic review investigates data preprocessing techniques for machine learning (ML), deep learning (DL), and reinforcement learning (RL) models in the construction domain. Through a comprehensive analysis of 457 studies, the prevalence of six data types (i.e., tabular, image, video frame, time series, text, and point cloud) and their respective preprocessing methods are examined. Key findings reveal data transformation, cleaning, reduction, augmentation, and scaling as fundamental preprocessing categories, with applications varying across data types. The paper highlights knowledge gaps, including limited synthetic data adoption, lack of standardized annotation practices, absence of comprehensive preprocessing frameworks, and need for automated labeling. Furthermore, critical considerations regarding data privacy, security, sharing, and management practices are discussed. The review underscores the pivotal role of robust data preprocessing in enabling reliable predictive models.
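As a rough illustration of two of the preprocessing categories named in the abstract (cleaning and scaling) applied to a tabular column, the following is a minimal sketch. It is not taken from the review: the function names, the choice of mean imputation and min-max scaling, and the sample values are all assumptions made for the example.

```python
from statistics import mean


def impute_mean(column):
    """Cleaning (illustrative): replace missing entries (None) with the mean of observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Scaling (illustrative): map values linearly onto [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical readings with one missing entry (not data from the review).
raw = [20.0, None, 40.0, 30.0]
cleaned = impute_mean(raw)       # missing value filled with the mean of 20, 40, 30
scaled = min_max_scale(cleaned)  # values rescaled to the range [0, 1]
```

Mean imputation and min-max scaling are only two of many possible choices; which method suits a given data type is the kind of question the review surveys.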
About the journal:
Automation in Construction is an international journal that focuses on publishing original research papers related to the use of Information Technologies in various aspects of the construction industry. The journal covers topics such as design, engineering, construction technologies, and the maintenance and management of constructed facilities.
The scope of Automation in Construction is extensive and covers all stages of the construction life cycle. This includes initial planning and design, construction of the facility, operation and maintenance, as well as the eventual dismantling and recycling of buildings and engineering structures.