Xingbo Gong, Xingyu Tao, Yuqing Xu, Helen H.L. Kwok, Weiwei Chen, Da Shi, Dezhi Li, Jack C.P. Cheng
Advanced Engineering Informatics, vol. 69, Article 103920 (published 2025-09-30)
DOI: 10.1016/j.aei.2025.103920
URL: https://www.sciencedirect.com/science/article/pii/S1474034625008134
Impact factor: 9.9; JCR: Q1 (Computer Science, Artificial Intelligence)
Automated “E”-aware data processing for construction ESG using building information modeling and large language model
Abstract
Environmental, Social and Governance (ESG) assessment and disclosure are critical for architecture, engineering, and construction (AEC) companies to market their financial results, reputational position, and compliance with regulatory requirements. Within this framework, the environmental (“E”) dimension presents unique and formidable data management challenges distinct from social and governance aspects. Specifically, the complex interplay of quantitative metrics and qualitative descriptions within “E”-aware data (e.g., measurable resource consumption alongside descriptive material sourcing practices, emissions figures coupled with compliance narratives), amplified by its sheer volume and the persistent ambiguity of environmental indicators and reporting standards, poses significant obstacles to effective “E”-aware data disclosure. Large Language Models (LLMs) possess inherent advantages in processing such complex environmental information due to their proficient language processing and generalization capabilities. Nonetheless, the development of LLM-based methods explicitly tailored for environmental data management within the construction sector remains underexplored. To this end, this study introduces an automated, LLM-enhanced “E”-aware data processing approach for the construction industry. The innovation of this framework is threefold. First, fifteen “E”-aware indicators are meticulously crafted to align with the specific needs of construction entities. Second, an “E”-aware algorithm, integrated within the Building Information Modeling (BIM) framework, is devised to streamline the aggregation and quantification of environmental data. Third, an LLM-enhanced complex structured data processing mechanism using retrieval-augmented generation (RAG) is proposed to facilitate the efficient processing of “E”-aware data pertinent to construction projects. An illustrative case study is employed to validate the feasibility and efficacy of the proposed methodology. The results demonstrate that the developed automated RAG-LLM enhanced framework significantly advances current practice by: (1) enabling standardized “E”-aware data specifications and source mapping; (2) drastically reducing processing time for large-scale ESG documentation (saving 64.4% of time); and (3) providing a robust solution for handling multi-source, multi-format data, thereby enhancing the efficiency and reliability of environmental management and ESG disclosure in the AEC industry.
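The abstract names retrieval-augmented generation (RAG) without describing its mechanics. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below ranks text chunks against a query with a simple bag-of-words cosine similarity and assembles a prompt from the best matches. The function names, the similarity measure, and the sample records in `CHUNKS` are all hypothetical and invented for this example; a real system would typically use learned embeddings and pass the prompt to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble a prompt that grounds the question in retrieved context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context."
    )


# Hypothetical project records, invented for illustration only.
CHUNKS = [
    "Monthly electricity consumption on site was recorded by the main meter.",
    "Concrete was sourced from a supplier within 50 km of the site.",
    "The safety committee met twice during the reporting period.",
]

if __name__ == "__main__":
    print(build_prompt("electricity consumption", CHUNKS, k=1))
```

The retrieval step narrows a large document set to a few relevant passages before generation, which is the general reason RAG suits large, mixed-format documentation.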
About the journal:
Advanced Engineering Informatics is an international Journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The Journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus and INSPEC.