A Large Language Model-based Framework to Retrieve Life Cycle Inventory and Environmental Impact Data from Scientific Literature
Avan Kumar, Farshid Nazemi, Hariprasad Kodamana, Manojkumar Ramteke, Bhavik R. Bakshi
Environmental Science & Technology, journal article, published 2025-10-16
DOI: 10.1021/acs.est.5c05955 (https://doi.org/10.1021/acs.est.5c05955)
Citations: 0
Abstract
Life cycle assessment (LCA) quantifies environmental impacts from raw material extraction to end-of-life (EoL) treatment, yet its accuracy depends on reliable life cycle inventory (LCI) data. Obtaining such data, however, is time-consuming: it requires an extensive literature review or access to databases that are often behind paywalls, which hinders transparent research. This study introduces a systematic framework that uses a retrained large language model (LLM) to help LCA practitioners retrieve LCI data and insights into the associated environmental impacts. The framework follows a three-stage process: (i) a fine-tuned classification model identifies relevant documents; (ii) LLaMA-2-7B is further pretrained on the selected texts to inject domain knowledge into the model; and (iii) a fine-tuned question-answering (Q&A) model extracts LCI and environmental impact data from the scientific literature. The resulting LLM is termed “Sustain-LLaMA”. We apply the framework to two case studies: methanol production and EoL treatment of plastic packaging. After retraining, the classification models achieve accuracies of 0.850 (methanol) and 0.952 (plastic packaging) on unseen data, indicating that they effectively distinguish relevant studies. With retrieval-augmented generation (RAG), the Q&A models achieve F1 scores of 0.823 for methanol and 0.855 for plastic packaging. The Q&A models are validated against LLaMA-2-7B without retraining, ChatGPT-4o, and the USLCI database, showing comparable or superior accuracy and efficiency. By automating LCI data retrieval, the framework improves scalability and precision and offers a promising tool for guiding the chemical and plastic industries toward sustainability.
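A small, dependency-free sketch can illustrate the filter, retrieve, and score flow described above. It is not the authors' implementation, and every function name is hypothetical:

- `is_relevant` is a keyword filter standing in for the fine-tuned classifier of stage (i).
- Stage (ii), continued pretraining of LLaMA-2-7B, is omitted because it needs model weights and a training loop.
- `retrieve` ranks passages by IDF-weighted term overlap as a stand-in for the RAG retriever.
- `build_prompt` uses an assumed prompt template.
- `token_f1` is a simplified SQuAD-style token-overlap F1.

The paper's actual retriever, prompt, and scoring may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_relevant(doc: str, keywords: set[str], threshold: int = 2) -> bool:
    """Hypothetical stage (i) stand-in: keep a document that mentions at least
    `threshold` domain keywords. The framework uses a fine-tuned classifier."""
    return len(keywords & set(tokenize(doc))) >= threshold


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Hypothetical RAG retrieval stand-in: rank passages by IDF-weighted
    overlap of distinct terms with the question and return the top k."""
    n = len(passages)
    passage_terms = [set(tokenize(p)) for p in passages]
    # Document frequency: the number of passages each term appears in.
    df = Counter(term for terms in passage_terms for term in terms)
    query_terms = set(tokenize(question))

    def score(terms: set[str]) -> float:
        # Rarer shared terms contribute more (smoothed IDF).
        return sum(math.log((n + 1) / (df[t] + 1)) + 1 for t in query_terms & terms)

    # sorted() stays stable with reverse=True, so ties keep passage order.
    ranked = sorted(range(n), key=lambda i: score(passage_terms[i]), reverse=True)
    return [passages[i] for i in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Prepend retrieved passages to the question (hypothetical template).
    The generation call to the Q&A model is out of scope here."""
    lines = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{lines}\n\nQuestion: {question}\nAnswer:"


def token_f1(prediction: str, reference: str) -> float:
    """Simplified SQuAD-style token-overlap F1 (no article removal)."""
    pred, ref = tokenize(prediction), tokenize(reference)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    docs = [
        "Methanol production via CO2 hydrogenation: inventory of hydrogen inputs.",
        "A history of ancient pottery glazes.",
    ]
    keywords = {"methanol", "inventory", "emissions", "hydrogen"}
    print([d for d in docs if is_relevant(d, keywords)])  # keeps only the first doc

    passages = [
        "The inventory lists hydrogen and carbon dioxide as feedstocks for methanol synthesis.",
        "Plastic packaging waste is sent to incineration or landfill.",
        "Electricity for electrolysis is supplied by the regional grid.",
    ]
    question = "Which feedstocks are used for methanol synthesis?"
    print(build_prompt(question, retrieve(question, passages, k=1)))
    print(token_f1("hydrogen", "hydrogen and carbon dioxide"))  # 0.4
```

With these toy passages, only the first shares the rarer terms "feedstocks", "methanol", and "synthesis" with the question, so it ranks first. A prediction of just "hydrogen" against the reference "hydrogen and carbon dioxide" scores an F1 of 0.4 (precision 1.0, recall 0.25), which shows how partial extractions are penalized.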