Data-driven initialization of evolutionary methods for process synthesis considering centrality and diversity criteria

IF 3.9 · Zone 2 (Engineering & Technology) · JCR Q2 (Computer Science, Interdisciplinary Applications)

Computers & Chemical Engineering · Pub Date: 2025-09-23 · DOI: 10.1016/j.compchemeng.2025.109416

Jean-Marc Commenge, Andres Piña-Martinez

Computers & Chemical Engineering, Volume 204, Article 109416. https://www.sciencedirect.com/science/article/pii/S0098135425004193

Citations: 0

Abstract

Process synthesis using evolutionary methods, which iteratively apply mutation operators, requires initializing the method with one process flowsheet or a set of them. Appropriate initialization might reduce computation times by providing initial proposals that need fewer mutations to reach optimal structures, in terms of units and connectivity. This work illustrates how to identify, from a given database of flowsheets, those that might play a pivotal role in the subsequent evolutionary synthesis. An in-house database of over 2000 flowsheets, digitized from 800 recent scientific publications, is used; it exhibits a variety of structures ranging from single distillation columns to biorefinery layouts. The selection of initialization flowsheets should ensure diversity in structures and units while minimizing the number of mutations needed to evolve to any other process flowsheet. A distance function is defined as the minimum number of mutations required to transform one flowsheet into another and is computed for all pairs of flowsheets in the database, which makes it possible to compare their topologies and analyze the population quantitatively. Four sampling strategies are compared: selection based on centrality criteria, sampling from groups of similar structures, random sampling, and k-medoids clustering. For each strategy, the distribution of distances from the selected structures to the database population and the diversity of the selected structures are compared. Centrality-based selection minimizes the required number of mutations but yields poor diversity of units. Selection from distinct groups of similar structures improves performance only for distant flowsheets. Random sampling ensures diversity but performs poorly in reducing the required mutations. In contrast, k-medoids sampling performs well in both the number of required mutations and the diversity of the selected flowsheets, making it a balanced method for flowsheet sampling. The initialization strategies are applied to a benzene chlorination case study, and their fitness and diversity are monitored over the generations of the evolutionary synthesis.
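To make the comparison of sampling strategies more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the pairwise mutation distances have already been computed and stored in a symmetric matrix; the 6x6 matrix `DIST`, all function names, the alternating (Voronoi-style) k-medoids variant, and the use of mean pairwise distance as a stand-in for diversity are illustrative assumptions that do not come from the paper.

```python
import random
from itertools import combinations

# Hypothetical mutation distances between 6 flowsheets (symmetric, zero diagonal).
DIST = [
    [0, 1, 2, 6, 7, 7],
    [1, 0, 1, 5, 6, 6],
    [2, 1, 0, 4, 5, 5],
    [6, 5, 4, 0, 1, 2],
    [7, 6, 5, 1, 0, 1],
    [7, 6, 5, 2, 1, 0],
]

def centrality_selection(dist, k):
    """Pick the k flowsheets with the smallest total distance to all others."""
    totals = [sum(row) for row in dist]
    return sorted(range(len(dist)), key=lambda i: (totals[i], i))[:k]

def random_selection(dist, k, seed=0):
    """Pick k distinct flowsheets uniformly at random."""
    return sorted(random.Random(seed).sample(range(len(dist)), k))

def k_medoids_selection(dist, k, seed=0, max_iter=100):
    """Alternating k-medoids: assign to nearest medoid, then re-center each cluster."""
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(max_iter):
        # Assignment step: each flowsheet joins its nearest medoid (ties -> lowest index).
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: (dist[i][m], m))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes the total distance within its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep a medoid whose cluster became empty
                new_medoids.append(m)
                continue
            new_medoids.append(
                min(members, key=lambda c: (sum(dist[c][j] for j in members), c))
            )
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)

def mean_distance_to_population(dist, selected):
    """Average distance from the selected flowsheets to every flowsheet."""
    n = len(dist)
    total = sum(dist[s][j] for s in selected for j in range(n))
    return total / (len(selected) * n)

def pairwise_diversity(dist, selected):
    """Mean distance between selected flowsheets (a stand-in diversity measure)."""
    pairs = list(combinations(selected, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    strategies = {
        "centrality": centrality_selection(DIST, 2),
        "random": random_selection(DIST, 2, seed=0),
        "k-medoids": k_medoids_selection(DIST, 2, seed=0),
    }
    for name, sel in strategies.items():
        print(name, sel,
              round(mean_distance_to_population(DIST, sel), 2),
              pairwise_diversity(DIST, sel))
```

On this toy matrix, centrality selection picks two neighboring flowsheets near the middle of the population, while k-medoids picks one representative per group, trading a slightly larger mean distance for a larger spread among the selected flowsheets. This only illustrates the kind of trade-off the abstract describes and says nothing about the results on the actual database.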




Source journal

Computers & Chemical Engineering (Engineering & Technology: Chemical Engineering)

CiteScore: 8.70

Self-citation rate: 14.00%

Articles published: 374

Review time: 70 days

Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.