Visual Question Answering Optimized Framework using Mixed Precision Training

Souvik Chowdhury, B. Soni
2023 International Conference on Artificial Intelligence and Applications (ICAIA) Alliance Technology Conference (ATCON-1)
Publication date: 2023-04-21
DOI: https://doi.org/10.1109/ICAIA57370.2023.10169318
Citations: 0

Abstract

Thanks to the emergence and continued development of machine learning, particularly deep learning, research on visual question answering (VQA) has advanced dramatically and carries both theoretical significance and practical value. The field draws on multimodal learning, computer vision, and natural language processing. Apart from a few researchers who have proposed optimized bilinear fusion approaches that combine text and image features efficiently, there have been few efforts to optimize the VQA framework itself. In this work we propose a Visual Question Answering framework optimized with automatic mixed precision training. Because automatic mixed precision uses both 16-bit and 32-bit floating-point arithmetic, deep learning architectures can be trained with less computation and execution time. The proposed framework was tested with two models on the VQA 2.0 and CLEVR datasets. The experimental results showed a significant improvement in overall accuracy and execution time.
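The abstract does not describe how mixed precision is implemented, so the following is only a minimal standard-library sketch of the two ideas mixed precision training generally relies on: scaling the loss so that small gradients survive 16-bit storage, and keeping a higher-precision master copy of the weights. The names `to_fp16` and `fp16_gradient`, the loss scale of 1024, and the numeric values are illustrative assumptions, not the authors' code.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The struct format "e" is the 16-bit binary16 floating-point type.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_gradient(true_grad: float, loss_scale: float = 1.0) -> float:
    """Simulate a gradient stored in fp16, then unscaled in higher precision.

    Hypothetical helper: multiplying the loss by `loss_scale` multiplies
    every gradient by the same factor before it is rounded to fp16.
    """
    stored = to_fp16(true_grad * loss_scale)  # value held in 16 bits
    return stored / loss_scale                # unscale outside fp16


tiny_grad = 1e-8  # below the smallest positive fp16 value (about 6e-8)

lost = fp16_gradient(tiny_grad)                      # underflows to 0.0
kept = fp16_gradient(tiny_grad, loss_scale=1024.0)   # close to 1e-8

# Weight updates have the same problem: near 1.0 the fp16 spacing is
# about 1e-3, so a 1e-4 update vanishes when the weight lives in fp16.
fp16_weight = to_fp16(1.0 + 1e-4)   # rounds back to 1.0

# A higher-precision master weight (a Python float stands in for fp32
# here) accumulates the same update without losing it.
master_weight = 1.0 + 1e-4

print(lost, kept, fp16_weight, master_weight)
```

In a real framework these steps are usually handled by the library's automatic mixed precision utilities instead of being written by hand; the sketch only shows why both precisions are needed.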