ESGReveal: An LLM-based approach for extracting structured data from ESG reports

IF 10.0 · Zone 1, Environmental Science and Ecology · JCR Q1, ENGINEERING, ENVIRONMENTAL

Journal of Cleaner Production · Published: 2025-01-15 · DOI: 10.1016/j.jclepro.2024.144572

Yi Zou, Mengying Shi, Zhongjie Chen, Zhu Deng, Zongxiong Lei, Zihan Zeng, Shiming Yang, Hongxiang Tong, Lei Xiao, Wenwen Zhou

Volume 489, Article 144572 · Full text: https://www.sciencedirect.com/science/article/pii/S0959652624040216

Citations: 0

Abstract

Environmental, social, and governance (ESG) reports are an important source of disclosure on a company's ESG performance, and stock exchanges are gradually strengthening their requirements for listed companies to submit such reports periodically. However, these documents are often unstructured, which makes it difficult to evaluate a company's disclosure level and performance quantitatively. In this study, we develop ESGReveal, a quantitative framework for assessing corporate ESG performance based on large language model (LLM) techniques. Specifically, by integrating retrieval-augmented generation (RAG) with LLMs, we extract relevant performance data from complex corporate ESG reports. The ESGReveal framework consists of three primary modules: an ESG Metadata module for standardized queries, a Report Preprocessing module for database construction, and an LLM Agent module for data extraction. We evaluated several LLMs, including GPT-3.5, GPT-4, ChatGLM, and QWEN, and found that GPT-4 achieved 76.9% accuracy in data extraction and 83.7% accuracy in disclosure analysis, the largest improvement over the baseline models. We applied ESGReveal to 2249 ESG reports published by 166 companies across 12 industries listed on the Hong Kong Stock Exchange (HKEx) and analyzed the disclosure and performance of key ESG indicators. For the environmental and social indicators that HKEx makes mandatory, the sample companies achieved disclosure rates of 69.5% and 57.2%, respectively. Industries differed in key ESG indicators such as the proportion of direct and indirect greenhouse gas emissions, which highlights priority areas for future emission reduction efforts. These findings underscore the need to strengthen ESG practices across sectors and point to the importance of both general and sector-specific ESG initiatives. In summary, by combining LLM and RAG techniques, ESGReveal offers a practical and efficient solution to the pressing need for consistent and accurate ESG information retrieval.
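The abstract names three stages: standardized indicator queries (ESG Metadata), report chunking into a searchable store (Report Preprocessing), and retrieval plus LLM extraction (LLM Agent), followed by a disclosure-rate analysis. The Python sketch below mimics that flow under loudly stated assumptions. `IndicatorQuery`, `chunk_report`, `retrieve`, `build_prompt`, and `disclosure_rate` are hypothetical names, not the authors' code. Retrieval uses bag-of-words cosine similarity as a stand-in for the embedding-based vector search a RAG system would typically use, no LLM is actually called, and the disclosure rate is assumed to be the share of queried indicators for which an answer other than "not disclosed" was extracted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class IndicatorQuery:
    """Hypothetical ESG Metadata entry: one standardized query per indicator."""
    code: str      # illustrative indicator code (not an official HKEx code)
    question: str  # standardized natural-language question
    unit: str      # unit the answer is expected in


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_report(text: str, sentences_per_chunk: int = 1) -> list[str]:
    """Report Preprocessing stand-in: split plain report text into sentence windows."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: IndicatorQuery, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (stand-in for embedding search)."""
    q_vec = Counter(tokenize(query.question))
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query: IndicatorQuery, context: list[str]) -> str:
    """LLM Agent stand-in: assemble the prompt; the actual LLM call is out of scope."""
    joined = "\n---\n".join(context)
    return (f"Context:\n{joined}\n\n"
            f"Question [{query.code}]: {query.question}\n"
            f"Reply with a value in {query.unit}, or 'not disclosed'.")


def disclosure_rate(answers: dict[str, str]) -> float:
    """Fraction of indicators whose extracted answer is not 'not disclosed'."""
    if not answers:
        return 0.0
    disclosed = sum(1 for a in answers.values() if a.strip().lower() != "not disclosed")
    return disclosed / len(answers)


if __name__ == "__main__":
    # Toy report text, invented for illustration only.
    report = ("The company employs 3000 staff across five provinces and trains them annually. "
              "Direct greenhouse gas emissions were 1200 tCO2e in the reporting year. "
              "Total water consumption was 50000 cubic metres.")
    query = IndicatorQuery("GHG-direct", "What were the direct greenhouse gas emissions?", "tCO2e")
    context = retrieve(query, chunk_report(report), k=1)
    print(build_prompt(query, context))
    print(disclosure_rate({"GHG-direct": "1200", "water": "50000", "waste": "not disclosed"}))
```

Replacing `retrieve` with an embedding index and sending the prompt to a model would be the natural next steps; the paper's actual prompts, chunking rules, and indicator list are not given in this abstract.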

Source journal

Journal of Cleaner Production (Environmental Science; Engineering, Environmental)

CiteScore: 20.40
Self-citation rate: 9.00%
Articles published: 4720
Review time: 111 days

Journal description: The Journal of Cleaner Production is an international, transdisciplinary journal that addresses theoretical and practical issues in Cleaner Production, the environment, and sustainability. It aims to help societies become more sustainable by focusing on the concept of "Cleaner Production", which seeks to prevent waste and increase efficiency in the use of energy, water, resources, and human capital. The journal serves as a platform for corporations, governments, educational institutions, regions, and societies to engage in discussion and research on Cleaner Production, environmental, and sustainability practices.