{"title":"基于计算机视觉的NLP多模态对象感知","authors":"Sakib Hosen Himel, Mahidul Islam Rana","doi":"10.25081/rrst.2023.15.8022","DOIUrl":null,"url":null,"abstract":"This project is based on voice interaction and object detecting properties. It will allow the users to do voice interaction with the artificial intelligence and it will reply with the system voice. That is how users will use their voice to command as a trigger to find out the category of any object by showing it using the camera module. At first, the user will show an object with the help of a camera and ask for identifying it in the system. The object detection system then captures a frame from the camera and predicts through the structure to identify which class the object belongs to by extracting the feature from there. The process of this application is to search the database to match the structural data to find out the exact category of the object. When this system approximately matches with the information of a category then the application will suggest the category for the object by mentioning the category name through voice. This application can also give some basic information by asking for it. Our general-purpose approach can be effective in interpreting the structure and properties of objects in different networks through natural language processing.","PeriodicalId":20870,"journal":{"name":"Recent Research in Science and Technology","volume":"61 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Perception of multimodal objects in NLP through computer vision\",\"authors\":\"Sakib Hosen Himel, Mahidul Islam Rana\",\"doi\":\"10.25081/rrst.2023.15.8022\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This project is based on voice interaction and object detecting properties. It will allow the users to do voice interaction with the artificial intelligence and it will reply with the system voice. That is how users will use their voice to command as a trigger to find out the category of any object by showing it using the camera module. At first, the user will show an object with the help of a camera and ask for identifying it in the system. The object detection system then captures a frame from the camera and predicts through the structure to identify which class the object belongs to by extracting the feature from there. The process of this application is to search the database to match the structural data to find out the exact category of the object. When this system approximately matches with the information of a category then the application will suggest the category for the object by mentioning the category name through voice. This application can also give some basic information by asking for it. 
Our general-purpose approach can be effective in interpreting the structure and properties of objects in different networks through natural language processing.\",\"PeriodicalId\":20870,\"journal\":{\"name\":\"Recent Research in Science and Technology\",\"volume\":\"61 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Recent Research in Science and Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.25081/rrst.2023.15.8022\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Recent Research in Science and Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.25081/rrst.2023.15.8022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This project combines voice interaction with object detection. Users speak to the artificial intelligence and it replies with a synthesized system voice: a spoken command acts as the trigger for identifying the category of any object shown to the camera module. First, the user presents an object to the camera and asks the system to identify it. The object detection module then captures a frame from the camera, extracts features from it, and predicts which class the object belongs to. The application searches its database for a match against this structural data to determine the object's exact category. When an approximate match with a category is found, the application suggests that category by announcing its name through voice. The application can also provide some basic information about the object on request. Our general-purpose approach can be effective in interpreting the structure and properties of objects across different networks through natural language processing.
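
The abstract outlines a pipeline of voice trigger, frame capture, category prediction, and spoken reply. The sketch below is a minimal illustration of how such a pipeline could be wired together from off-the-shelf Python libraries; the specific choices (speech_recognition for the trigger, OpenCV for capture, an ImageNet-pretrained torchvision ResNet-50 standing in for the paper's feature extraction and database matching, pyttsx3 for the reply, and the trigger word "identify") are assumptions for illustration, not the authors' implementation.

# A minimal sketch of the described voice-triggered object identification loop.
# All library and model choices below are assumptions, not the paper's own stack.
import cv2
import pyttsx3
import speech_recognition as sr
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights


def listen_for_trigger(trigger="identify"):
    """Block until the user says something, return True if it contains the trigger word."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as mic:
        audio = recognizer.listen(mic)
    try:
        return trigger in recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return False


def capture_frame(camera_index=0):
    """Grab a single frame from the camera module and convert it to RGB."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("Could not read a frame from the camera")
    return cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)


def classify(frame_rgb):
    """Extract features with a pretrained CNN and predict the object's category."""
    weights = ResNet50_Weights.IMAGENET1K_V2
    model = resnet50(weights=weights).eval()
    batch = weights.transforms()(Image.fromarray(frame_rgb)).unsqueeze(0)
    with torch.no_grad():
        probs = model(batch).softmax(dim=1)
    score, idx = probs.max(dim=1)
    return weights.meta["categories"][idx.item()], score.item()


def speak(text):
    """Reply with the system voice."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


if __name__ == "__main__":
    if listen_for_trigger():
        label, confidence = classify(capture_frame())
        speak(f"This looks like a {label}, confidence {confidence:.0%}.")

In a real deployment, the classify step would be replaced by the project's own feature extraction and database matching, and the confidence score would decide whether the system announces an approximate category match or asks the user to show the object again.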