Simultaneous outlier-exclusion and distributionally robust learning through partial optimal transport
Zhongyu Zhang, Biao Huang, Zukui Li
Computers & Chemical Engineering, Vol. 205, Article 109408 (published 2025-10-09)
DOI: 10.1016/j.compchemeng.2025.109408
https://www.sciencedirect.com/science/article/pii/S0098135425004119
Citations: 0
Abstract
Distributionally robust optimization (DRO) is a powerful framework that mitigates the impact of distributional uncertainty. It optimizes the worst-case performance over all distributions within an ambiguity set. This set is defined around a nominal distribution, which is often taken to be the empirical distribution of the data. However, outliers in the data can distort the ambiguity set and thereby degrade the performance of DRO. In this work, we propose an integrated approach that combines outlier exclusion with robust model training. Using partial optimal transport, we identify and retain the subset of samples that incur lower model loss, effectively filtering out potential outliers that cause large losses. This retained subset is used to construct the nominal distribution for the Wasserstein DRO formulation, which addresses the residual distributional uncertainty. We derive tractable formulations for both regression and classification problems under this framework. We demonstrate its effectiveness through numerical experiments and on real-world chemical process datasets. The results show that the proposed method provides a simple, effective, and implementable solution for robust learning under both outlier contamination and distributional shifts.
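The abstract does not spell out the formulations, so what follows is only a minimal Python/NumPy sketch of the two ingredients. It rests on our own simplifying assumptions, which are not necessarily the authors':

- linear regression with absolute loss;
- a 1-Wasserstein ambiguity ball of radius ρ with Euclidean transport cost on (x, y) and unbounded support;
- an alternating trim-then-refit scheme.

Under these assumptions, the worst-case expected loss equals the empirical mean absolute error plus ρ·‖(β, −1)‖₂, so the DRO step becomes a regularized fit. We model the partial-transport step as choosing sample weights q_i in [0, 1/n] with total mass `keep_mass` to minimize Σ q_i·loss_i. This linear program is solved by keeping the lowest-loss samples. The function names `dro_lad_fit`, `retain_low_loss`, and `trimmed_dro_fit`, and the parameters `rho` and `keep_mass`, are hypothetical.

```python
import numpy as np


# Hypothetical helper (not from the paper): Wasserstein-1 DRO linear regression
# with absolute loss. ASSUMES Euclidean transport cost on (x, y) and unbounded
# support, so the worst-case loss is MAE + rho * ||(beta, -1)||_2.
def dro_lad_fit(X, y, rho, iters=5000, step0=1.0):
    n, d = X.shape
    beta, b = np.zeros(d), 0.0
    best_obj, best_beta, best_b = np.inf, beta.copy(), b
    for t in range(iters):
        r = y - X @ beta - b
        norm = np.sqrt(beta @ beta + 1.0)          # ||(beta, -1)||_2
        obj = np.mean(np.abs(r)) + rho * norm
        if obj < best_obj:                          # keep the best iterate seen
            best_obj, best_beta, best_b = obj, beta.copy(), b
        s = np.sign(r)
        g_beta = -X.T @ s / n + rho * beta / norm   # subgradient w.r.t. beta
        g_b = -np.mean(s)                           # intercept is not penalized
        step = step0 / np.sqrt(t + 1)               # diminishing step size
        beta = beta - step * g_beta
        b = b - step * g_b
    return best_beta, best_b


# Hypothetical stand-in for the partial-transport step: for fixed losses, the LP
#   min sum_i q_i * loss_i  s.t.  0 <= q_i <= 1/n,  sum_i q_i = keep_mass
# is solved by filling the lowest-loss samples first (k rounded to whole samples).
def retain_low_loss(losses, keep_mass):
    k = int(round(keep_mass * len(losses)))
    return np.argsort(losses)[:k]


# Hypothetical alternating scheme: trim with the current model, then refit the
# DRO model on the retained subset (its empirical distribution is the nominal one).
def trimmed_dro_fit(X, y, keep_mass=0.85, rho=0.01, outer=3):
    beta, b = dro_lad_fit(X, y, rho)                # start from all samples
    for _ in range(outer):
        idx = retain_low_loss(np.abs(y - X @ beta - b), keep_mass)
        beta, b = dro_lad_fit(X[idx], y[idx], rho)
    return beta, b, idx


# Usage on synthetic data with 10% grossly corrupted labels.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=n)
outliers = rng.choice(n, size=20, replace=False)
y[outliers] += 50.0
beta, b, kept = trimmed_dro_fit(X, y, keep_mass=0.85, rho=0.01)
print("slope", beta, "intercept", b, "outliers kept", np.intersect1d(kept, outliers).size)
```

Each refit uses only the retained samples as the nominal empirical distribution, which mirrors the role the abstract assigns to the retained subset. The subgradient solver only keeps the sketch dependency-free. A convex-optimization package would be the more practical choice for real problems.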
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.