Cross-modal multi-label image classification modeling and recognition based on nonlinear

IF 2.4 Q2 ENGINEERING, MECHANICAL

Nonlinear Engineering - Modeling and Application Pub Date : 2023-01-01 DOI:10.1515/nleng-2022-0194

Shuping Yuan, Yang Chen, Cheng Ye, Mohammed Wasim Bhatt, Mhalasakant Saradeshmukh, Md. Shamim Hossain


Citations: 1

Abstract

Predicting the labels that co-occur in a picture has recently become a popular strategy in multi-label image recognition. Previous work has concentrated on capturing label correlation but has neglected to fuse picture features and label embeddings correctly, which substantially affects the model's convergence efficiency and limits further improvement in multi-label image recognition accuracy. To better classify labeled training samples of the corresponding categories in image classification, a nonlinear cross-modal multi-label image classification modeling and recognition method is proposed. Two multi-label classification models, one visual and one textual, are constructed on deep convolutional neural networks. The visual classification model uses natural images and simple single-label biomedical images to achieve heterogeneous and homogeneous transfer learning, capturing the general features of the general domain and the domain-specific features of the biomedical field, while the text classification model uses the description text of simple biomedical images to achieve homogeneous transfer learning. The experimental results show that the multi-label classification model combining the two modalities obtains a Hamming loss similar to the best performance on the evaluation task, and the macro-averaged F1 value increases from 0.20 to 0.488, an improvement the authors report as about 52.5%. The cross-modal multi-label image classification algorithm better alleviates the problem of overfitting in most classes and has better cross-modal retrieval performance. In addition, the effectiveness and rationality of the two cross-modal mapping techniques are verified.
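The abstract reports results in terms of Hamming loss and macro-averaged F1. As a reminder of what these two multi-label metrics measure, here is a minimal Python sketch using their standard definitions. It is not the authors' evaluation code: the function names, the toy label matrices, and the convention of scoring a label with no true or predicted positives as 0.0 are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[int]]  # rows = samples, columns = labels, entries 0 or 1


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (sample, label) slots whose prediction is wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label with no positives at all scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Hypothetical toy data: 4 samples, 3 labels.
Y_TRUE = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
Y_PRED = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

if __name__ == "__main__":
    print("Hamming loss:", hamming_loss(Y_TRUE, Y_PRED))
    print("Macro F1:", round(macro_f1(Y_TRUE, Y_PRED), 4))
```

Because macro averaging weights every label equally, it is sensitive to performance on infrequent labels, whereas Hamming loss is dominated by the many easy negative slots. That is why the two metrics can move by very different amounts for the same model.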




Source journal

Nonlinear Engineering - Modeling and Application

CiteScore

6.20

Self-citation rate

3.60%

Articles published

Review time

44 weeks

Journal description: The Journal of Nonlinear Engineering aims to be a platform for sharing original research results on theoretical, experimental, practical, and applied nonlinear phenomena within engineering. It serves as a forum to exchange ideas and applications of nonlinear problems across various engineering disciplines. Articles are considered for publication if they explore nonlinearities in engineering systems by offering realistic mathematical modeling, using nonlinearity for new designs, stabilizing systems, understanding system behavior through nonlinearity, optimizing systems based on nonlinear interactions, or developing algorithms that harness nonlinear elements.