Automated Medical Image Captioning with Soft Attention-Based LSTM Model Utilizing YOLOv4 Algorithm

Journal of Computer Science. Pub Date: 2024-01-01. DOI: 10.3844/jcssp.2024.52.68

Paspula Ravinder, Saravanan Srinivasan


Citations: 0

Abstract

Medical image captioning is currently a prominent research field. Interpreting and captioning medical images can be time-consuming and costly, and often requires expert support. The growing volume of medical images makes it challenging for radiologists to handle their workload alone. Automating medical image captioning can reduce this cost and time while helping radiologists improve the reliability and accuracy of the generated captions. It also gives new, less experienced radiologists an opportunity to benefit from automated support. Despite previous efforts to automate medical image captioning, some issues remain unresolved, including overly detailed captions, difficulty identifying abnormal regions in complex images, and the low accuracy and reliability of some generated captions. To tackle these challenges, we propose a new deep learning model specifically tailored for captioning medical images. Our model aims to extract features from images and generate, with high accuracy, meaningful sentences related to the identified defects. The approach uses a multi-model neural network that closely mimics the human visual system and automatically learns to describe the content of images. Our proposed method consists of two stages. In the first stage, known as the information extraction phase, we employ the YOLOv4
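
The soft attention named in the title can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how region features are scored, so the dot-product scoring used here, the function names `softmax` and `soft_attention`, and the toy feature vectors are all assumptions made for illustration. The idea shown is only the general one: each image region receives a weight, the weights sum to 1, and the decoder (an LSTM in the paper) would consume the weighted average of the region features as its context vector.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(region_features, hidden_state):
    """Return (weights, context) for one decoding step.

    region_features: list of feature vectors, one per image region
                     (hypothetical stand-ins for detector outputs).
    hidden_state:    the decoder's current hidden vector.
    Scoring is a plain dot product, which is an assumption of this sketch.
    """
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three regions with two-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = soft_attention(regions, hidden)
print(weights, context)
```

Because every region contributes in proportion to its weight, the operation stays differentiable, which is what distinguishes soft attention from hard attention that selects a single region.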


Source journal

Journal of Computer Science Computer Science-Computer Networks and Communications

CiteScore

1.70

Self-citation rate

0.00%

Articles published

About the journal: Journal of Computer Science aims to publish research articles on the theoretical foundations of information and computation and on practical techniques for their implementation and application in computer systems. JCS is a peer-reviewed journal, updated twelve times a year, that covers the latest and most compelling research of the time.