Evaluating Performances of Attention-Based Merge Architecture Models for Image Captioning in Indian Languages

Q3 Computer Science

Journal of Image and Graphics (中国图象图形学报) Pub Date: 2023-09-01 DOI: 10.18178/joig.11.3.294-301

Rahul Tangsali, Swapnil Chhatre, Soham Naik, Pranav Bhagwat, Geetanjali Kale


Citations: 0

Abstract


Evaluating Performances of Attention-Based Merge Architecture Models for Image Captioning in Indian Languages

Image captioning is an active research area that has advanced considerably in the past few years. Deep learning methods are used extensively to generate textual descriptions of images, and attention-based captioning mechanisms give state-of-the-art results. However, few applications and analyses of these methods exist for languages of the Indian subcontinent. This paper presents attention-based merge architecture models for generating accurate image captions in four Indian languages: Marathi, Kannada, Malayalam, and Tamil. The widely known Flickr8K dataset was used. Pre-trained Convolutional Neural Network (CNN) models and attention-based language decoder models were implemented as the components of the proposed merge architecture. Generated captions were compared against the gold captions using Bilingual Evaluation Understudy (BLEU) as the evaluation metric. The merge architectures that use InceptionV3 gave the best results for the languages tested; the full scores are discussed in the paper. The highest BLEU-1 scores obtained were 0.4939 for Marathi, 0.4557 for Kannada, 0.5082 for Malayalam, and 0.5201 for Tamil. The proposed architectures scored much higher than other architectures implemented for these languages.
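To make the reported BLEU-1 figures easier to interpret, here is a minimal sketch of a sentence-level BLEU-1 computation: clipped unigram precision multiplied by a brevity penalty. The abstract does not say which BLEU implementation, tokenization, or smoothing the authors used, so the function name `bleu1`, the whitespace tokenization, and the English example tokens are illustrative assumptions, not the paper's evaluation code.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1 (hypothetical helper, not the paper's code).

    candidate: list of tokens in the generated caption.
    references: list of token lists (the gold captions).
    """
    if not candidate or not references:
        return 0.0

    cand_counts = Counter(candidate)

    # For each token, the largest count observed in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for token, count in Counter(ref).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)

    # Clip each candidate count so repeated words cannot inflate the score.
    clipped = sum(min(count, max_ref_counts[token])
                  for token, count in cand_counts.items())
    precision = clipped / len(candidate)

    # Brevity penalty: compare against the reference length closest to
    # the candidate length (ties resolved toward the shorter reference).
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda n: (abs(n - c), n))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

    return brevity_penalty * precision


# Placeholder English tokens; the paper's captions are in Indian languages.
repeated = bleu1("the the the cat".split(), ["the cat sat".split()])
too_short = bleu1("the cat".split(), ["the cat sat down".split()])
print(repeated, too_short)
```

In the first call only one "the" is credited because the reference contains it once, so the precision is 2/4. In the second call every candidate word matches, but the caption is half the reference length, so the brevity penalty reduces the score. Corpus-level BLEU, which is what papers usually report, aggregates the clipped counts and lengths over all captions before dividing rather than averaging per-sentence scores.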


Source journal

Journal of Image and Graphics (中国图象图形学报), Computer Science: Computer Graphics and Computer-Aided Design

CiteScore: 1.20

Self-citation rate: 0.00%

Articles published: 6776

About the journal: Journal of Image and Graphics (ISSN 1006-8961, CN 11-3758/TB, CODEN ZTTXFZ) is an academic journal supervised by the Chinese Academy of Sciences and co-sponsored by the Institute of Space and Astronautical Information Innovation of the Chinese Academy of Sciences (ISIAS), the Chinese Society of Image and Graphics (CSIG), and the Beijing Institute of Applied Physics and Computational Mathematics (BIAPM). The journal covers theory, technical methods, and the industrialisation of applied research results in image and graphics, and mainly publishes innovative, high-level research papers on basic and applied research in image and graphics science and closely related fields. Paper types include reviews, technical reports, project progress, academic news, new technology reviews, new product introductions, and industrialisation research. Topics include image analysis and recognition, image understanding and computer vision, computer graphics, virtual reality and augmented reality, system simulation, and animation, and themed columns are opened on current research hotspots and cutting-edge topics. Its readership includes scientific and technical personnel, enterprise supervisors, and postgraduate and undergraduate students working in national defence, military, aviation, aerospace, communications, electronics, automotive, agriculture, meteorology, environmental protection, remote sensing, mapping, oil fields, construction, transportation, finance, telecommunications, education, medical care, film and television, and art.
Journal of Image and Graphics is indexed in many domestic and international scientific literature databases, including the EBSCO database in the United States, the JST database in Japan, the Scopus database in the Netherlands, China Science and Technology Thesis Statistics and Analysis (Annual Research Report), China Science Citation Database (CSCD), China Academic Journal Network Publishing Database (CAJD), China Academic Journal Abstracts, Chinese Science Abstracts (Series A), China Electronic Science Abstracts, Chinese Core Journals Abstracts, Chinese Academic Journals on CD-ROM, and China Academic Journals Comprehensive Evaluation Database.