Cross-modal multi-label image classification modeling and recognition based on nonlinear

Shuping Yuan, Yang Chen, Cheng Ye, Mohammed Wasim Bhatt, Mhalasakant Saradeshmukh, Md. Shamim Hossain

Nonlinear Engineering - Modeling and Application, 2023. DOI: 10.1515/nleng-2022-0194
Citations: 1
Abstract
Recently, predicting the labels that co-occur in a picture has become a popular strategy in multi-label image recognition. Previous work has concentrated on capturing label correlation but has neglected to properly fuse picture features and label embeddings, which substantially affects the model’s convergence efficiency and limits further improvement in multi-label image recognition accuracy. To better classify labeled training samples of the corresponding categories in image classification, a nonlinear cross-modal multi-label image classification modeling and recognition method is proposed. Separate multi-label classification models based on deep convolutional neural networks are constructed for the visual and text modalities. The visual classification model uses natural images and simple single-label biomedical images to perform heterogeneous and homogeneous transfer learning, capturing both the general features of the general domain and the proprietary features of the biomedical domain, while the text classification model uses the description text of simple biomedical images to perform homogeneous transfer learning. The experimental results show that the multi-label classification model combining the two modalities achieves a Hamming loss close to the best performance reported for the evaluation task, and the macro-averaged F1 value increases from 0.20 to 0.488, an improvement of about 52.5%. The cross-modal multi-label image classification algorithm better alleviates overfitting in most classes and achieves better cross-modal retrieval performance. In addition, the effectiveness and rationality of the two cross-modal mapping techniques are verified.
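As a point of reference for the evaluation protocol mentioned in the abstract, the sketch below shows how a multi-label prediction matrix can be thresholded and scored with Hamming loss and macro-averaged F1 using scikit-learn. This is not the authors' code: the label count, the toy label matrix, and the 0.5 decision threshold are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's implementation): scoring
# multi-label predictions with the two metrics reported in the abstract.
import numpy as np
from sklearn.metrics import hamming_loss, f1_score

rng = np.random.default_rng(0)

# Illustrative assumption: 6 samples, 4 candidate labels.
# y_true is the ground-truth multi-hot label matrix.
y_true = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
])

# Stand-in for per-label sigmoid scores from a multi-label CNN head.
scores = rng.uniform(size=y_true.shape)

# Threshold each label independently (0.5 is an assumed cut-off).
y_pred = (scores >= 0.5).astype(int)

# Hamming loss: fraction of individual label assignments that are wrong.
print("Hamming loss:", hamming_loss(y_true, y_pred))

# Macro-averaged F1: F1 is computed per label and then averaged, so rare
# labels count as much as frequent ones.
print("Macro F1:", f1_score(y_true, y_pred, average="macro", zero_division=0))
```

Because macro-averaging weights every label equally, the reported macro-F1 gain is sensitive to performance on infrequent labels, which is consistent with the abstract's claim about alleviating overfitting across classes.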
Journal description:
The Journal of Nonlinear Engineering aims to be a platform for sharing original research results on theoretical, experimental, practical, and applied nonlinear phenomena in engineering. It serves as a forum for exchanging ideas and applications of nonlinear problems across engineering disciplines. Articles are considered for publication if they address nonlinearities in engineering systems by offering realistic mathematical modeling, using nonlinearity for new designs, stabilizing systems, understanding system behavior through nonlinearity, optimizing systems based on nonlinear interactions, or developing algorithms that harness and exploit nonlinear elements.