Annotate and retrieve in vivo images using hybrid self-organizing map

Parminder Kaur, Avleen Malhi, Husanbir Pannu
{"title":"Annotate and retrieve in vivo images using hybrid self-organizing map","authors":"Parminder Kaur, Avleen Malhi, Husanbir Pannu","doi":"10.1007/s00371-023-03126-z","DOIUrl":null,"url":null,"abstract":"Abstract Multimodal retrieval has gained much attention lately due to its effectiveness over uni-modal retrieval. For instance, visual features often under-constrain the description of an image in content-based retrieval; however, another modality, such as collateral text, can be introduced to abridge the semantic gap and make the retrieval process more efficient. This article proposes the application of cross-modal fusion and retrieval on real in vivo gastrointestinal images and linguistic cues, as the visual features alone are insufficient for image description and to assist gastroenterologists. So, a cross-modal information retrieval approach has been proposed to retrieve related images given text and vice versa while handling the heterogeneity gap issue among the modalities. The technique comprises two stages: (1) individual modality feature learning; and (2) fusion of two trained networks. In the first stage, two self-organizing maps (SOMs) are trained separately using images and texts, which are clustered in the respective SOMs based on their similarity. In the second (fusion) stage, the trained SOMs are integrated using an associative network to enable cross-modal retrieval. The underlying learning techniques of the associative network include Hebbian learning and Oja learning (Improved Hebbian learning). The introduced framework can annotate images with keywords and illustrate keywords with images, and it can also be extended to incorporate more diverse modalities. Extensive experimentation has been performed on real gastrointestinal images obtained from a known gastroenterologist that have collateral keywords with each image. The obtained results proved the efficacy of the algorithm and its significance in aiding gastroenterologists in quick and pertinent decision making.","PeriodicalId":227044,"journal":{"name":"The Visual Computer","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-023-03126-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Multimodal retrieval has gained much attention lately due to its effectiveness over uni-modal retrieval. For instance, visual features often under-constrain the description of an image in content-based retrieval; however, another modality, such as collateral text, can be introduced to bridge the semantic gap and make the retrieval process more efficient. This article proposes the application of cross-modal fusion and retrieval to real in vivo gastrointestinal images and linguistic cues, as visual features alone are insufficient for describing images and assisting gastroenterologists. A cross-modal information retrieval approach is therefore proposed to retrieve related images given text, and vice versa, while handling the heterogeneity gap among the modalities. The technique comprises two stages: (1) individual modality feature learning; and (2) fusion of the two trained networks. In the first stage, two self-organizing maps (SOMs) are trained separately on images and texts, which are clustered in their respective SOMs according to similarity. In the second (fusion) stage, the trained SOMs are integrated using an associative network to enable cross-modal retrieval. The underlying learning techniques of the associative network are Hebbian learning and Oja learning (improved Hebbian learning). The introduced framework can annotate images with keywords and illustrate keywords with images, and it can also be extended to incorporate more diverse modalities. Extensive experimentation has been performed on real gastrointestinal images obtained from a practicing gastroenterologist, each accompanied by collateral keywords. The results demonstrate the efficacy of the algorithm and its significance in aiding gastroenterologists in quick and pertinent decision making.
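To make the first stage concrete, below is a minimal sketch of classic SOM training in Python. It is an illustration only: the grid size, decay schedules, and the assumption that image and text features arrive as pre-extracted row vectors are ours, not the paper's reported configuration. The same routine would be run twice, once per modality.

```python
import numpy as np

def train_som(data, grid_h=8, grid_w=8, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train one rectangular SOM on row-vector features (stage 1 of the
    framework; run once for image features and once for text features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((grid_h, grid_w, data.shape[1]))
    # Grid coordinates, used by the Gaussian neighborhood function.
    coords = np.stack(np.mgrid[0:grid_h, 0:grid_w], axis=-1).astype(float)
    for t in range(epochs):
        lr = lr0 * np.exp(-t / epochs)        # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)  # shrinking neighborhood radius
        for x in data:
            # Best-matching unit (BMU): the node whose weight vector is
            # closest to the sample in feature space.
            bmu = np.unravel_index(
                np.argmin(np.linalg.norm(weights - x, axis=-1)),
                (grid_h, grid_w))
            # Gaussian neighborhood centered on the BMU, measured on the grid.
            d2 = np.sum((coords - np.asarray(bmu, dtype=float)) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Pull every node toward the sample, weighted by neighborhood.
            weights += lr * h[..., None] * (x - weights)
    return weights
```

After training, similar samples map to nearby nodes, which is what allows the fusion stage to link regions of the image map to regions of the text map.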

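The fusion stage links the two trained maps with an associative network. The sketch below shows the two update rules named in the abstract, Hebbian learning and Oja's rule, applied to flattened SOM activations. Treating node activations as the linking signal, and the activation shapes, are assumptions made for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def hebbian_update(W, x, y, lr=0.01):
    """Plain Hebbian rule: weights grow where input and output co-activate."""
    return W + lr * np.outer(y, x)

def oja_update(W, x, y, lr=0.01):
    """Oja's rule: the Hebbian term plus a forgetting term that keeps the
    weight vectors bounded (the 'improved Hebbian learning' of the abstract)."""
    return W + lr * (np.outer(y, x) - (y ** 2)[:, None] * W)

# Hypothetical linking signal: flattened SOM activations for one image-text
# training pair (placeholder values; real activations would come from the
# trained maps above).
rng = np.random.default_rng(0)
text_act = np.exp(-rng.random(64))   # text-SOM node activations
img_act = np.exp(-rng.random(64))    # image-SOM node activations
W = np.zeros((64, 64))
W = oja_update(W, text_act, img_act)

# Retrieval direction text -> image: score image-SOM nodes for a text query;
# high-scoring nodes index the stored images used for annotation/illustration.
scores = W @ text_act
```

Oja's decay term is what distinguishes it from plain Hebbian learning: without it, repeated co-activation lets weights grow without bound, whereas the forgetting term normalizes them over time.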
