A data-centric investigation on the challenges of machine learning methods for bridging life cycle inventory data gaps
Authors: Bu Zhao, Jitong Jiang, Ming Xu, Qingshi Tu
Journal of Industrial Ecology, vol. 29, no. 3, pp. 955-966. Published 2025-04-21. DOI: 10.1111/jiec.70022
URL: https://onlinelibrary.wiley.com/doi/10.1111/jiec.70022
Life cycle assessment (LCA) is a systematic approach to quantifying the environmental impacts of a product system across its entire life cycle. Although LCA is widely used to assess mature technologies, gaps in inventory data remain a fundamental challenge that limits its application to emerging processes. Machine learning (ML) methods are among the possible solutions that can mitigate these data gaps in an automated and scalable way. Nonetheless, the performance of existing ML methods is unstable, which limits the trustworthiness and generalizability of the models. In this study, we conducted a data-centric investigation to delineate the causes of the unstable performance, using a similarity-based ML framework built on the Ecoinvent 3.1 unit process (UPR) database. We found that imbalance in the data used for method development, manifested in substantial differences in (1) flow and process availability and (2) the orders of magnitude of their values, is a major cause of the unstable performance. We also identified causes arising from the ML method development workflow, particularly the data preprocessing and model training steps (e.g., randomness in train–test data splits). In addition, we tested the proposed ML method on the U.S. Life Cycle Inventory Database and observed that the generalizability of the method was strongly influenced by the size of the database to which it was applied. To address these issues, we propose that further research focus on reducing the barriers to database integration so that both the size and the balance of the data for ML method development can be improved.
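The two effects named in the abstract, similarity-based gap filling and sensitivity to random train–test splits, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`similarity_fill`, `split_scores`), the use of cosine similarity, the choice of k, the log10 error metric, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np

def similarity_fill(known, target, mask, k=3):
    """Estimate the masked flows of `target` from its k most similar processes.

    known  : (n_processes, n_flows) array of complete reference processes
    target : (n_flows,) array; entries where mask is True are treated as missing
    mask   : (n_flows,) boolean array marking the missing flows
    """
    obs = ~mask
    # Cosine similarity computed on the observed flows only (an assumed choice).
    num = known[:, obs] @ target[obs]
    den = np.linalg.norm(known[:, obs], axis=1) * np.linalg.norm(target[obs])
    sim = num / np.maximum(den, 1e-12)
    nearest = np.argsort(sim)[-k:]
    weights = np.maximum(sim[nearest], 1e-12)
    # Similarity-weighted average of the neighbours' values for the missing flows.
    return weights @ known[nearest][:, mask] / weights.sum()

def split_scores(data, n_splits=20, test_frac=0.2, missing_frac=0.3, seed=0):
    """Mean absolute log10 error for each of several random train-test splits."""
    rng = np.random.default_rng(seed)
    n, m = data.shape
    n_test = max(1, int(n * test_frac))
    n_missing = max(1, int(m * missing_frac))
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        errs = []
        for i in test:
            mask = np.zeros(m, dtype=bool)
            mask[rng.choice(m, size=n_missing, replace=False)] = True
            pred = similarity_fill(data[train], data[i], mask)
            # Errors are measured in log10 space because values span
            # several orders of magnitude.
            errs.append(np.mean(np.abs(np.log10(pred) - np.log10(data[i, mask]))))
        scores.append(np.mean(errs))
    return np.array(scores)

# Synthetic, strictly positive "inventory" whose values span orders of
# magnitude. These numbers are illustrative and not taken from any database.
rng = np.random.default_rng(42)
data = 10.0 ** rng.normal(0.0, 2.0, size=(60, 12))

scores = split_scores(data)
print("error per split: min", scores.min(), "max", scores.max(), "std", scores.std())
```

The spread of `scores` across splits shows how a single reported accuracy figure can depend on which processes happen to land in the test set, which is why repeating the evaluation over several splits is a common precaution.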
About the journal:
The Journal of Industrial Ecology addresses a series of related topics:
material and energy flow studies ("industrial metabolism")
technological change
dematerialization and decarbonization
life cycle planning, design and assessment
design for the environment
extended producer responsibility ("product stewardship")
eco-industrial parks ("industrial symbiosis")
product-oriented environmental policy
eco-efficiency
Journal of Industrial Ecology is open to and encourages submissions that are interdisciplinary in approach. In addition to more formal academic papers, the journal seeks to provide a forum for continuing exchange of information and opinions through contributions from scholars, environmental managers, policymakers, advocates and others involved in environmental science, management and policy.