Doanh C. Bui, N. Nguyen, Nguyen D. Vo, Uyen Han Thuy Thai, Khang Nguyen
2022 International Conference on Multimedia Analysis and Pattern Recognition (MAPR). Published 2022-10-01. DOI: 10.1109/MAPR56351.2022.9924781 (https://doi.org/10.1109/MAPR56351.2022.9924781)
Vi-DRSNet: A Novel Hybrid Model for Vietnamese Image Captioning in Healthcare Domain
Image captioning attracts researchers from both the computer vision and natural language processing communities. In this paper, we present a novel hybrid model that combines three modules: Dual-level Collaborative, a Meshed-memory Decoder, and an Adaptive Decoder. The Dual-level Collaborative module integrates grid features with region features. The Meshed-memory Decoder takes advantage of all encoder outputs. The Adaptive Decoder embeds Vietnamese linguistic information into the decoding steps. Without using any data augmentation, our approach achieves competitive results compared to other methods on the public and private tests of the VieCap4H benchmark.