Xiaoyang Ren, Dongwei Dou, Xianying He, Fangfang Cui, Jie Zhao
A serialization method for digitizing the image-based medical laboratory report
DIGITAL HEALTH, vol. 11, article 20552076251334431. Published 2025-04-15. DOI: 10.1177/20552076251334431 (https://doi.org/10.1177/20552076251334431)
Abstract
Background: When applying for a teleconsultation, medical laboratory reports are usually photographed with a mobile phone, and the photographs are uploaded as application materials. Extracting the content of these report images and storing it digitally is therefore valuable. OCR technology has already been applied to medical text recognition, but no prior work has combined recognition of the laboratory report format with extraction of the report content into a single serialized process for digitizing the report image. This article proposes such a serialization method.
Materials and methods: This article first collects 330 image-based medical laboratory reports and annotates their format to form a training dataset for the layout analysis model. Starting from a pre-trained model, a layout analysis model is then trained on this dataset so that it can correctly recognize the format of a medical laboratory report. Next, the layout of an input report image is analyzed, and the layout analysis results are used to call the text detection and text recognition models to obtain the digital content of the report. Finally, the layout of the digital content is adjusted and the content is stored as a docx file.
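The serialized process described in the methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Region` type, the `digitize` function, the label names, and the three stub models are all hypothetical stand-ins for the trained layout analysis, text detection, and text recognition models, and the final docx export is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class Region:
    label: str  # hypothetical layout class, e.g. "table" or "header"
    box: Box
    text: str = ""


def digitize(
    image: object,
    analyze_layout: Callable[[object], List[Tuple[str, Box]]],
    detect_text: Callable[[object, Box], List[Box]],
    recognize_text: Callable[[object, Box], str],
) -> List[Region]:
    """Run layout analysis, then detection and recognition inside each region."""
    regions = []
    for label, box in analyze_layout(image):
        # Order detected text lines top to bottom, then left to right.
        line_boxes = sorted(detect_text(image, box), key=lambda b: (b[1], b[0]))
        text = "\n".join(recognize_text(image, b) for b in line_boxes)
        regions.append(Region(label, box, text))
    # Put the regions themselves back into reading order.
    regions.sort(key=lambda r: (r.box[1], r.box[0]))
    return regions


# Stub models standing in for trained networks (assumptions for illustration).
def fake_layout(image):
    return [("table", (0, 100, 500, 400)), ("header", (0, 0, 500, 90))]


def fake_detect(image, box):
    x0, y0, x1, _ = box
    return [(x0, y0 + 40, x1, y0 + 60), (x0, y0, x1, y0 + 20)]


def fake_recognize(image, box):
    return f"line at y={box[1]}"


result = digitize(None, fake_layout, fake_detect, fake_recognize)
```

Passing the three models in as callables keeps the control flow visible: layout analysis decides which regions exist, and detection and recognition are only invoked inside those regions.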
Results: By training the layout analysis model and integrating layout analysis, text detection, and text recognition, we obtained a serialization method that digitizes the image-based medical laboratory report: it restores the report format, shields sensitive and irrelevant content, and digitizes the report content of interest.
Conclusions: Digitizing the image-based medical laboratory report with the serialization method allows the report content to be displayed correctly for teleconsultation while removing irrelevant content, such as user names and examination equipment numbers.
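One plausible way to shield sensitive and irrelevant content is to keep only regions whose layout label is on an allow-list. The sketch below assumes that approach; the abstract does not state how shielding is implemented, and the label names and sample texts are invented for illustration.

```python
from typing import Dict, List

# Hypothetical label names; the annotation classes used in the paper
# are not listed in the abstract.
KEEP_LABELS = {"report_title", "test_table"}


def keep_content_of_interest(regions: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop every region whose layout label is not on the allow-list."""
    return [r for r in regions if r["label"] in KEEP_LABELS]


# Made-up regions, as a layout analysis step might label them.
regions = [
    {"label": "test_table", "text": "example test rows"},
    {"label": "operator_name", "text": "example user name"},
    {"label": "device_id", "text": "example equipment number"},
]
kept = keep_content_of_interest(regions)
```

An allow-list fails closed: a region with an unexpected label is dropped rather than passed through to the digitized report.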