Intelligent classification of construction quality problems based on unbalanced short text data mining

IF 6 | Zone 2, Engineering & Technology | Q1 ENGINEERING, MULTIDISCIPLINARY
Dan Wang, Kai Yin, Hailong Wang
Ain Shams Engineering Journal, vol. 15, no. 10, Article 102983, published 2024-08-01. DOI: 10.1016/j.asej.2024.102983. https://www.sciencedirect.com/science/article/pii/S2090447924003587
Citations: 0

Abstract

Construction Quality Management (CQM) is important for achieving project quality objectives. Currently, CQM relies mainly on cyclical inspections and tests, followed by analysis of the resulting text records. These records document various construction quality problems (CQPs) that quality managers must categorize and analyze. However, CQPs are currently classified and analyzed either manually or with natural language processing (NLP). Manual analysis is time-consuming and labor-intensive; NLP improves processing efficiency but is limited by its classification perspective and fails to fully capture the root causes of CQPs. CQP texts usually describe problems by inspection area, and a single record may contain several types of problems at once. Previous CQP classification models based on sub-projects can only identify the quality problems that occur frequently in each sub-project; they cannot analyze the essential characteristics of CQPs, and they ignore the overall characteristics of CQP data, such as short texts and unbalanced classes. Therefore, to address the mixed text information and diverse categories of CQPs, this study constructs a TDA-WV-TextCNN model for automatic text categorization that accounts for the unbalanced, short-text nature of CQP data. The model uses actual on-site inspection reports from multiple sources as its data base and defines classification labels from a result-oriented perspective on CQPs. It combines a part-of-speech-based Text Data Augmentation (TDA) method, the Word2vec (WV) technique, and a Text Convolutional Neural Network (TextCNN) algorithm.
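The abstract names TextCNN as the classifier that consumes Word2vec vectors but gives no architecture details. As a rough illustration only, the sketch below shows a generic TextCNN forward pass in plain Python: convolution filters of different widths slide over the word vectors, a ReLU is applied, max-over-time pooling keeps one feature per filter, and a dense softmax layer turns the features into class probabilities. The two-dimensional embeddings, the hand-set filter and dense weights, and the class labels are all hypothetical toy values standing in for trained Word2vec vectors and learned weights; this is not the authors' configuration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Filter = Tuple[List[Vector], float]  # (width x dim weight matrix, bias)


def conv_max_pool(seq: Sequence[Vector], weights: List[Vector], bias: float) -> float:
    """Slide one filter over the word vectors, apply ReLU, keep the max (max-over-time pooling)."""
    width = len(weights)
    best = 0.0  # ReLU outputs are never negative, so 0.0 is the floor
    for start in range(len(seq) - width + 1):  # no windows if the text is shorter than the filter
        window = seq[start:start + width]
        s = bias + sum(
            w * x
            for w_row, x_row in zip(weights, window)
            for w, x in zip(w_row, x_row)
        )
        best = max(best, s)
    return best


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(
    tokens: List[str],
    embeddings: Dict[str, Vector],
    filters: List[Filter],
    dense: List[Vector],
    dense_bias: List[float],
) -> List[float]:
    """Return class probabilities for one tokenized text."""
    seq = [embeddings[t] for t in tokens if t in embeddings]  # drop out-of-vocabulary words
    features = [conv_max_pool(seq, w, b) for w, b in filters]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(dense, dense_bias)
    ]
    return softmax(logits)


# Toy 2-d "Word2vec" vectors and hand-set weights (hypothetical, not trained).
EMB = {
    "crack": [1.0, 0.0],
    "wall": [0.0, 0.0],
    "exposed": [0.0, 1.0],
    "rebar": [0.0, 1.0],
}
FILTERS = [
    ([[1.0, 0.0]], 0.0),              # width-1 filter: responds to "crack"-like words
    ([[0.0, 1.0], [0.0, 1.0]], 0.0),  # width-2 filter: responds to "exposed rebar"-like bigrams
]
DENSE = [[1.0, 0.0], [0.0, 1.0]]  # class 0 = "crack", class 1 = "exposed rebar" (hypothetical labels)
BIAS = [0.0, 0.0]

for text in (["wall", "crack"], ["exposed", "rebar"]):
    probs = textcnn_forward(text, EMB, FILTERS, DENSE, BIAS)
    print(text, [round(p, 3) for p in probs])
```

With these toy weights the first text scores higher for class 0 and the second for class 1. Filters of several widths let the model pick up single keywords and short phrases, which suits short inspection records; in a real model the weights would be learned and the embeddings would come from a trained Word2vec model.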
The results show that the TDA-WV-TextCNN model trains quickly and achieves high accuracy in short text classification. The part-of-speech-based TDA method expands small-sample classes by extracting core feature words and changing word positions, which balances the text data and in turn improves the model's accuracy. Multi-source data increases data diversity, and redundant text increases data volume; both play an important role in improving model performance, so whether duplicate texts should be deleted depends on how much data the model needs. The research results provide a method for categorizing quality reports quickly and accurately, which helps build an engineering quality knowledge system.
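The abstract only states that the part-of-speech-based TDA method "extracts core feature words" and applies "word position changes" to expand small-sample classes. The sketch below is one possible reading of that idea, not the authors' procedure. It assumes each record is already tokenized and POS-tagged (the single-letter tags here are hypothetical), treats nouns, verbs, and adjectives as "core" words, generates new texts from the core-word skeleton and from adjacent-word swaps, and oversamples each minority label up to the majority-class count while skipping exact duplicates. The names `CORE_TAGS`, `variants`, and `balance` are illustrative.

```python
from collections import Counter
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # one tokenized record as (word, POS tag) pairs

# Hypothetical choice of "core" parts of speech: nouns, verbs, adjectives.
CORE_TAGS = {"n", "v", "a"}


def core_words(sample: Tagged) -> List[str]:
    """Keep only the words whose POS tag is treated as core."""
    return [word for word, tag in sample if tag in CORE_TAGS]


def variants(sample: Tagged) -> List[List[str]]:
    """Candidate new texts: the core-word skeleton, then the skeleton
    with each adjacent pair of core words swapped (a word-position change)."""
    core = core_words(sample)
    out = [core]
    for i in range(len(core) - 1):
        swapped = core[:]
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        out.append(swapped)
    return out


def balance(dataset: List[Tuple[Tagged, str]]) -> List[Tuple[List[str], str]]:
    """Oversample minority labels with augmented variants, up to the majority-class count."""
    counts = Counter(label for _, label in dataset)
    target = max(counts.values())
    result = [([word for word, _ in sample], label) for sample, label in dataset]
    for label, n in counts.items():
        seen = {tuple(words) for words, lab in result if lab == label}
        pool = [v for sample, lab in dataset if lab == label for v in variants(sample)]
        for cand in pool:
            if n >= target:
                break
            if not cand or tuple(cand) in seen:  # skip empty or duplicate texts
                continue
            result.append((cand, label))
            seen.add(tuple(cand))
            n += 1
    return result


# Toy records (hypothetical): three "crack" records and one "leak" record.
data = [
    ([("wall", "n"), ("cracks", "v")], "crack"),
    ([("beam", "n"), ("cracks", "v")], "crack"),
    ([("slab", "n"), ("cracks", "v")], "crack"),
    ([("roof", "n"), ("slightly", "d"), ("leaks", "v"), ("water", "n")], "leak"),
]
balanced = balance(data)
print(Counter(label for _, label in balanced))
```

Here the single "leak" record yields two extra texts (its core-word skeleton and one swapped variant), bringing both labels to three samples. Skipping exact duplicates is a design choice in this sketch; the abstract suggests that redundant text can also help when the model needs more data, so a real pipeline might keep duplicates instead.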

Source journal: Ain Shams Engineering Journal
Subject category: Engineering (General Engineering)
CiteScore: 10.80
Self-citation rate: 13.30%
Articles published: 441
Review time: 49 weeks
Journal description: Ain Shams Engineering Journal is an international journal devoted to publishing peer-reviewed, original, high-quality research papers and review papers on both traditional topics and emerging science and technology. Papers of theoretical and fundamental interest are welcome, as are those concerning industrial applications, emerging instrumental techniques, and practical applications to areas of human endeavor such as environmental preservation, health, and waste disposal. The overall focus is on original and rigorous scientific research results of generic significance. The journal covers mechanical, electrical, civil, chemical, petroleum, environmental, and architectural and urban planning engineering. Papers that integrate knowledge from other disciplines with engineering are especially welcome, such as nanotechnology, materials science, and computational methods, as well as applied basic sciences: engineering mathematics, physics, and chemistry.