Research on image captioning based on LSTM and YOLOv5 fusion attention mechanism

Xiaoliang Zhang, Qingtao Zeng, Yeli Li, Likun Lu, Weichun Yang
{"title":"基于LSTM和YOLOv5融合注意机制的图像字幕研究","authors":"Xiaoliang Zhang, Qingtao Zeng, Yeli Li, Likun Lu, Weichun Yang","doi":"10.1117/12.2667667","DOIUrl":null,"url":null,"abstract":"Humans can easily learn to recognize every object in life, every landscape, and describe the things around them in detail from the process of growing up, but computers cannot. How to make computers learn to describe things in pictures has become the research direction of many scholars. If this technology is mature, it will bring great boon to people with visual impairments. They can understand the things around them and the beautiful earth through hearing. Robots recognize objects and understand their surroundings. With the development of artificial intelligence, the power of convolutional neural networks is more and more comparable to that of the human brain. In recent years, many scholars have proposed different methods to seek better solutions to this problem, including generative adversarial networks. Based on the classic structure of Encoder-Decoder, this paper first compares the code implementation and results of ResNet101 as an Encoder on the COCO dataset, and then proposes a new solution that integrates YOLOv5 and LSTM, aiming to improve the model inference speed and inference accuracy.","PeriodicalId":128051,"journal":{"name":"Third International Seminar on Artificial Intelligence, Networking, and Information Technology","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Research on image captioning based on LSTM and YOLOv5 fusion attention mechanism\",\"authors\":\"Xiaoliang Zhang, Qingtao Zeng, Yeli Li, Likun Lu, Weichun Yang\",\"doi\":\"10.1117/12.2667667\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Humans can easily learn to recognize every object in life, every landscape, and describe the things around them in detail from the process of growing up, but computers cannot. How to make computers learn to describe things in pictures has become the research direction of many scholars. If this technology is mature, it will bring great boon to people with visual impairments. They can understand the things around them and the beautiful earth through hearing. Robots recognize objects and understand their surroundings. With the development of artificial intelligence, the power of convolutional neural networks is more and more comparable to that of the human brain. In recent years, many scholars have proposed different methods to seek better solutions to this problem, including generative adversarial networks. 
Based on the classic structure of Encoder-Decoder, this paper first compares the code implementation and results of ResNet101 as an Encoder on the COCO dataset, and then proposes a new solution that integrates YOLOv5 and LSTM, aiming to improve the model inference speed and inference accuracy.\",\"PeriodicalId\":128051,\"journal\":{\"name\":\"Third International Seminar on Artificial Intelligence, Networking, and Information Technology\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Third International Seminar on Artificial Intelligence, Networking, and Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.2667667\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Third International Seminar on Artificial Intelligence, Networking, and Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2667667","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

From childhood, humans easily learn to recognize the objects and scenery around them and to describe their surroundings in detail, but computers cannot. Teaching computers to describe the content of images has therefore become a research direction for many scholars. Once mature, this technology would be a great boon to people with visual impairments, who could perceive their surroundings and the beautiful earth through hearing, and it would help robots recognize objects and understand their environment. With the development of artificial intelligence, the power of convolutional neural networks has become more and more comparable to that of the human brain, and in recent years many scholars have proposed different methods to seek better solutions to this problem, including generative adversarial networks. Building on the classic Encoder-Decoder structure, this paper first compares the code implementation and results of ResNet101 as an Encoder on the COCO dataset, and then proposes a new solution that integrates YOLOv5 and LSTM, aiming to improve both model inference speed and inference accuracy.
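
To make the described architecture concrete, the following is a minimal PyTorch sketch of the classic Encoder-Decoder captioning setup the abstract starts from: a ResNet-101 encoder produces a grid of visual features, and an LSTM decoder with an additive attention mechanism weights those features at each decoding step. All module names, dimensions, and the specific attention form are illustrative assumptions, not the authors' code; the abstract confirms only the ResNet101/COCO baseline and the YOLOv5+LSTM fusion idea.

```python
# Minimal sketch of an attention-based Encoder-Decoder captioner.
# Assumptions (not from the paper): torchvision's ResNet-101 backbone,
# additive (Bahdanau-style) attention, and the layer sizes below.
import torch
import torch.nn as nn
import torchvision.models as models


class ResNetEncoder(nn.Module):
    """Encode an image into a set of spatial feature vectors, taken from
    the last conv block of ResNet-101 (a common choice in captioning)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet101(weights=None)  # pretrained weights optional
        # Drop avgpool and fc to keep the 2048-channel spatial feature map.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])

    def forward(self, images):                          # (B, 3, H, W)
        feats = self.cnn(images)                         # (B, 2048, h, w)
        B, C, h, w = feats.shape
        return feats.view(B, C, h * w).transpose(1, 2)   # (B, h*w, 2048)


class AttentionLSTMDecoder(nn.Module):
    """LSTM decoder that attends over the encoder features at every step."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn_feat = nn.Linear(feat_dim, hidden_dim)
        self.attn_hid = nn.Linear(hidden_dim, hidden_dim)
        self.attn_score = nn.Linear(hidden_dim, 1)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, tokens):    # feats (B, N, feat_dim), tokens (B, T)
        B = feats.size(0)
        h = feats.new_zeros(B, self.lstm.hidden_size)
        c = feats.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for t in range(tokens.size(1)):
            # Additive attention: score each spatial location against h.
            scores = self.attn_score(torch.tanh(
                self.attn_feat(feats) + self.attn_hid(h).unsqueeze(1)))  # (B, N, 1)
            alpha = torch.softmax(scores, dim=1)
            context = (alpha * feats).sum(dim=1)         # (B, feat_dim)
            x = torch.cat([self.embed(tokens[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                # (B, T, vocab_size)


# Smoke test with random data.
encoder, decoder = ResNetEncoder(), AttentionLSTMDecoder(vocab_size=1000)
feats = encoder(torch.randn(2, 3, 224, 224))
print(decoder(feats, torch.randint(0, 1000, (2, 12))).shape)  # (2, 12, 1000)
```

In the proposed YOLOv5+LSTM variant, one plausible reading is that per-detection feature vectors from YOLOv5 would replace the ResNet grid features as the `feats` input to the same attention decoder, though the abstract does not spell out that interface.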