A serialization method for digitizing the image-based medical laboratory report.

IF 2.9; Region 3 (Medicine); JCR Q2, HEALTH CARE SCIENCES & SERVICES

DIGITAL HEALTH. Pub Date: 2025-04-15. eCollection Date: 2025-01-01. DOI: 10.1177/20552076251334431

Xiaoyang Ren, Dongwei Dou, Xianying He, Fangfang Cui, Jie Zhao



Abstract

Background: In teleconsultation applications, medical laboratory reports are usually photographed with a mobile phone, and the photos are uploaded as application materials. Extracting the content of these report images and storing it digitally is therefore valuable. OCR technology has already been applied to medical text recognition, but no prior work has combined recognition of the report format with extraction of the report content into a single serialized process for digitizing report images. This article proposes such a serialization method for digitizing medical laboratory report images.

Materials and methods: We first collected 330 image-based medical laboratory reports and annotated their format to form a training dataset for the layout analysis model. A pre-trained model was then trained on this dataset to obtain a layout analysis model that can correctly recognize the format of a medical laboratory report. Next, the layout of an input report image is analyzed, and the layout analysis results are used to call the text detection and text recognition models, which yield the digital content of the image report. Finally, the layout of the digital content is adjusted and the content is stored as a docx file.
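The serialized flow described above (layout analysis, then text detection and recognition per region, then ordered output) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the models or their interfaces, so `Region`, `Block`, `digitize`, and the three `fake_*` stand-ins are hypothetical, and the "image" is a plain dictionary rather than a photograph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class Region:
    label: str  # hypothetical layout class, e.g. "title" or "table"
    bbox: BBox


@dataclass
class Block:
    label: str
    bbox: BBox
    lines: List[str]


def digitize(image, analyze_layout: Callable, detect_text: Callable,
             recognize_text: Callable) -> List[Block]:
    """Chain the three stages and return blocks in top-to-bottom order."""
    # Stage 1: layout analysis, sorted by top edge and then left edge.
    regions = sorted(analyze_layout(image), key=lambda r: (r.bbox[1], r.bbox[0]))
    blocks = []
    for region in regions:
        # Stage 2: text detection restricted to the current region.
        boxes = sorted(detect_text(image, region.bbox), key=lambda b: (b[1], b[0]))
        # Stage 3: text recognition on each detected line box.
        lines = [recognize_text(image, box) for box in boxes]
        blocks.append(Block(region.label, region.bbox, lines))
    return blocks


# Stand-in "image": maps a text-line box to the text printed there (assumption).
FAKE_IMAGE: Dict[BBox, str] = {
    (10, 60, 200, 80): "WBC 6.1 10^9/L",
    (10, 40, 200, 60): "Item Result Unit",
    (10, 5, 200, 25): "Blood Routine Report",
}


def fake_layout(image) -> List[Region]:
    # Deliberately out of order to show that digitize() restores reading order.
    return [Region("table", (0, 30, 210, 90)), Region("title", (0, 0, 210, 30))]


def fake_detect(image, region_bbox: BBox) -> List[BBox]:
    x0, y0, x1, y1 = region_bbox
    return [b for b in image
            if x0 <= b[0] and y0 <= b[1] and b[2] <= x1 and b[3] <= y1]


def fake_recognize(image, box: BBox) -> str:
    return image[box]


blocks = digitize(FAKE_IMAGE, fake_layout, fake_detect, fake_recognize)
for block in blocks:
    print(block.label, block.lines)
```

Passing the three stages in as callables keeps the orchestration independent of any particular model, so a trained layout model and real detection and recognition models could replace the stand-ins without changing `digitize`. Writing the resulting blocks to a docx file is left out of the sketch.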

Results: By training the layout analysis model and integrating layout analysis, text detection, and text recognition, we obtained a serialization method that digitizes the report content of interest in an image-based medical laboratory report, restores the report format, and shields sensitive and irrelevant content.

Conclusions: Digitizing image-based medical laboratory reports with the serialization method allows the report content to be displayed correctly for teleconsultation while removing irrelevant content such as user names and examination equipment numbers.
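One plausible way to shield sensitive or irrelevant content is to drop whole regions by their layout label before the text is stored. The abstract does not say how the shielding is implemented or which annotation classes were used, so the label names and the helpers `shield` and `to_plain_text` below are assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

LabeledText = Tuple[str, str]  # (layout label, recognized text)

# Hypothetical label names; the source does not list its annotation classes.
SHIELDED_LABELS: FrozenSet[str] = frozenset({"patient_info", "device_info"})


def shield(blocks: Iterable[LabeledText],
           shielded: FrozenSet[str] = SHIELDED_LABELS) -> List[LabeledText]:
    """Keep only blocks whose layout label is not marked as shielded."""
    return [(label, text) for label, text in blocks if label not in shielded]


def to_plain_text(blocks: Iterable[LabeledText]) -> str:
    """Join the kept blocks in order; a stand-in for writing a docx file."""
    return "\n".join(text for _, text in blocks)


report = [
    ("title", "Blood Routine Report"),
    ("patient_info", "Name: (withheld)"),
    ("table", "WBC 6.1 10^9/L"),
    ("device_info", "Analyzer No. 0042"),
]
kept = shield(report)
print(to_plain_text(kept))
```

Filtering by label means the decision about what to hide is made once, at annotation time, rather than by pattern-matching the recognized text.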

