{"title":"越南语图像字幕特征提取方法的实证研究","authors":"Khang Nguyen","doi":"10.15625/1813-9663/38/4/17548","DOIUrl":null,"url":null,"abstract":"Image captioning is a challenging task that is still being addressed in the 2020s. The problem has the input as an image, and the output is the generated caption that describes the context of the input image. In this study, I focus on the image captioning problem in Vietnamese. In detail, I present the empirical study of feature extraction approaches using current state-of-the-art object detection methods to represent the images in the model space. Each type of feature is trained with the Transformer-based captioning model. I investigate the effectiveness of different feature types on two Vietnamese datasets: UIT-ViIC and VieCap4H, the two standard benchmark datasets. The experimental results show crucial insight into the feature extraction task for image captioning in Vietnamese.","PeriodicalId":15444,"journal":{"name":"Journal of Computer Science and Cybernetics","volume":"50 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"EMPIRICAL STUDY OF FEATURE EXTRACTION APPROACHES FOR IMAGE CAPTIONING IN VIETNAMESE\",\"authors\":\"Khang Nguyen\",\"doi\":\"10.15625/1813-9663/38/4/17548\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Image captioning is a challenging task that is still being addressed in the 2020s. The problem has the input as an image, and the output is the generated caption that describes the context of the input image. In this study, I focus on the image captioning problem in Vietnamese. In detail, I present the empirical study of feature extraction approaches using current state-of-the-art object detection methods to represent the images in the model space. Each type of feature is trained with the Transformer-based captioning model. I investigate the effectiveness of different feature types on two Vietnamese datasets: UIT-ViIC and VieCap4H, the two standard benchmark datasets. The experimental results show crucial insight into the feature extraction task for image captioning in Vietnamese.\",\"PeriodicalId\":15444,\"journal\":{\"name\":\"Journal of Computer Science and Cybernetics\",\"volume\":\"50 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computer Science and Cybernetics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.15625/1813-9663/38/4/17548\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computer Science and Cybernetics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15625/1813-9663/38/4/17548","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
EMPIRICAL STUDY OF FEATURE EXTRACTION APPROACHES FOR IMAGE CAPTIONING IN VIETNAMESE
Image captioning is a challenging task that is still being addressed in the 2020s. The problem has the input as an image, and the output is the generated caption that describes the context of the input image. In this study, I focus on the image captioning problem in Vietnamese. In detail, I present the empirical study of feature extraction approaches using current state-of-the-art object detection methods to represent the images in the model space. Each type of feature is trained with the Transformer-based captioning model. I investigate the effectiveness of different feature types on two Vietnamese datasets: UIT-ViIC and VieCap4H, the two standard benchmark datasets. The experimental results show crucial insight into the feature extraction task for image captioning in Vietnamese.