Bao G. Do, Doanh C. Bui, Nguyen D. Vo, Khang Nguyen
2022 9th NAFOSTED Conference on Information and Computer Science (NICS), published 2022-10-31
DOI: 10.1109/NICS56915.2022.10013398 (https://doi.org/10.1109/NICS56915.2022.10013398)
A Multi-scale Approach for Vietnamese Image Captioning in Healthcare Domain
Image captioning is the task of automatically generating syntactically and semantically meaningful natural-language sentences that describe the visual content of a given image. The problem is attractive because it combines two fields, Computer Vision and Natural Language Processing. Although the problem has been studied before, most existing work focuses only on generating English captions. In this paper, we present a Transformer-based model for this problem, built on the VieCap4H dataset, the first grand dataset for the healthcare domain in Vietnamese. Specifically, we propose the TG2F module to enhance visual representations and use a BERT-based language model to obtain language representations. In experiments on the VieCap4H dataset, our approach achieves competitive results on both the public and private test sets without using any data augmentation.