Vasil Shteriyanov, Rimma Dzhusupova, Jan Bosch, Helena Holmström Olsson
From text to meaning: Semantic interpretation of non-standardized metadata in piping and instrumentation diagrams
Computers & Chemical Engineering, Volume 204, Article 109436. Published 2025-10-08.
DOI: 10.1016/j.compchemeng.2025.109436
URL: https://www.sciencedirect.com/science/article/pii/S0098135425004399
Journal impact factor: 3.9 (JCR Q2, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
The extraction of structured metadata from Piping and Instrumentation Diagrams (P&IDs) is a major bottleneck for digitalization in the process industries. Existing methods, based on Optical Character Recognition (OCR), stop at raw text extraction, failing to interpret critical engineering information encoded within variable-format identifiers like pipeline numbers. This paper bridges this semantic gap by introducing a system for the format-aware interpretation of P&ID pipeline metadata. Our hybrid system architecture integrates deep learning for text recognition with domain interpretation rules that allow the system to adapt to new project formats without model retraining. These rules perform validation, error correction, and semantic mapping of raw text to structured data. We validated our system on a challenging dataset of real-world P&IDs from four distinct industrial projects, each with a unique and complex pipeline number format. Our method achieved 91.1% end-to-end accuracy, demonstrating a significant leap in performance over standard OCR tools, which proved insufficient for the task. This work presents a robust solution that unlocks valuable data from non-standardized engineering documents, providing a practical pathway for creating reliable digital twins and supporting plant lifecycle management in the chemical engineering sector.
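The rule-based stage described in the abstract (validation, error correction, and semantic mapping of raw OCR text) can be illustrated with a minimal sketch. This is not the authors' implementation: the pipeline number format, the field names, the separator, the character-confusion tables, and the `interpret` function are all assumptions chosen for illustration. The point it shows is that a new project format could be supported by editing a declarative field list rather than retraining a recognition model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Field:
    """One segment of a pipeline number (hypothetical schema)."""
    name: str
    kind: str      # "digit", "alpha", or "alnum"
    min_len: int
    max_len: int

# Illustrative format only, e.g. "6-PW-1001-A1B"
# (size, service code, sequence number, pipe class). Not taken from the paper.
EXAMPLE_FORMAT = [
    Field("nominal_size", "digit", 1, 2),
    Field("service_code", "alpha", 1, 3),
    Field("sequence_number", "digit", 3, 5),
    Field("pipe_class", "alnum", 1, 4),
]

# Assumed OCR look-alike substitutions, applied only where the
# field's expected character class makes the intent unambiguous.
TO_DIGIT = str.maketrans({"O": "0", "I": "1", "S": "5", "B": "8"})
TO_ALPHA = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B"})

def interpret(raw: str, fmt: list, sep: str = "-") -> Optional[dict]:
    """Validate, correct, and map raw OCR text to named fields.

    Returns None when the text does not fit the format.
    """
    tokens = raw.strip().upper().split(sep)
    if len(tokens) != len(fmt):
        return None  # wrong number of segments
    result = {}
    for field, token in zip(fmt, tokens):
        if field.kind == "digit":
            token = token.translate(TO_DIGIT)   # error correction
            valid = token.isdigit()
        elif field.kind == "alpha":
            token = token.translate(TO_ALPHA)   # error correction
            valid = token.isalpha()
        else:
            valid = token.isalnum()             # mixed field: no safe correction
        if not valid or not (field.min_len <= len(token) <= field.max_len):
            return None  # validation failed
        result[field.name] = token              # semantic mapping
    return result

# The letter "O" misread inside a numeric field is repaired to "0".
parsed = interpret("6-PW-1O01-A1B", EXAMPLE_FORMAT)
```

In this sketch, mixed alphanumeric fields are left uncorrected because a look-alike character cannot be resolved from the character class alone; a real system would presumably need additional project-specific constraints for such fields.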
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.