{"title":"基于计算机视觉的NLP多模态对象感知","authors":"Sakib Hosen Himel, Mahidul Islam Rana","doi":"10.25081/rrst.2023.15.8022","DOIUrl":null,"url":null,"abstract":"This project is based on voice interaction and object detecting properties. It will allow the users to do voice interaction with the artificial intelligence and it will reply with the system voice. That is how users will use their voice to command as a trigger to find out the category of any object by showing it using the camera module. At first, the user will show an object with the help of a camera and ask for identifying it in the system. The object detection system then captures a frame from the camera and predicts through the structure to identify which class the object belongs to by extracting the feature from there. The process of this application is to search the database to match the structural data to find out the exact category of the object. When this system approximately matches with the information of a category then the application will suggest the category for the object by mentioning the category name through voice. This application can also give some basic information by asking for it. Our general-purpose approach can be effective in interpreting the structure and properties of objects in different networks through natural language processing.","PeriodicalId":20870,"journal":{"name":"Recent Research in Science and Technology","volume":"61 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Perception of multimodal objects in NLP through computer vision\",\"authors\":\"Sakib Hosen Himel, Mahidul Islam Rana\",\"doi\":\"10.25081/rrst.2023.15.8022\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This project is based on voice interaction and object detecting properties. It will allow the users to do voice interaction with the artificial intelligence and it will reply with the system voice. That is how users will use their voice to command as a trigger to find out the category of any object by showing it using the camera module. At first, the user will show an object with the help of a camera and ask for identifying it in the system. The object detection system then captures a frame from the camera and predicts through the structure to identify which class the object belongs to by extracting the feature from there. The process of this application is to search the database to match the structural data to find out the exact category of the object. When this system approximately matches with the information of a category then the application will suggest the category for the object by mentioning the category name through voice. This application can also give some basic information by asking for it. 
Our general-purpose approach can be effective in interpreting the structure and properties of objects in different networks through natural language processing.\",\"PeriodicalId\":20870,\"journal\":{\"name\":\"Recent Research in Science and Technology\",\"volume\":\"61 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Recent Research in Science and Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.25081/rrst.2023.15.8022\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Recent Research in Science and Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.25081/rrst.2023.15.8022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This project combines voice interaction with object detection. Users speak to the artificial intelligence and it replies with a synthesized system voice: a spoken command acts as the trigger for identifying the category of any object shown to the camera module. First, the user presents an object to the camera and asks the system to identify it. The object detection module then captures a frame from the camera, extracts features from it, and predicts which class the object belongs to. The application searches its database for a match against this structural data to determine the object's exact category. When an approximate match with a category is found, the application suggests that category by announcing its name through voice. The application can also provide some basic information about the object on request. Our general-purpose approach can be effective in interpreting the structure and properties of objects across different networks through natural language processing.
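
The abstract outlines a pipeline of voice trigger, frame capture, category prediction, and spoken reply. The sketch below is a minimal illustration of how such a pipeline could be wired together from off-the-shelf Python libraries; the specific choices (speech_recognition for the trigger, OpenCV for capture, an ImageNet-pretrained torchvision ResNet-50 standing in for the paper's feature extraction and database matching, pyttsx3 for the reply, and the trigger word "identify") are assumptions for illustration, not the authors' implementation.

# A minimal sketch of the described voice-triggered object identification loop.
# All library and model choices below are assumptions, not the paper's own stack.
import cv2
import pyttsx3
import speech_recognition as sr
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights


def listen_for_trigger(trigger="identify"):
    """Block until the user says something, return True if it contains the trigger word."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as mic:
        audio = recognizer.listen(mic)
    try:
        return trigger in recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return False


def capture_frame(camera_index=0):
    """Grab a single frame from the camera module and convert it to RGB."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("Could not read a frame from the camera")
    return cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)


def classify(frame_rgb):
    """Extract features with a pretrained CNN and predict the object's category."""
    weights = ResNet50_Weights.IMAGENET1K_V2
    model = resnet50(weights=weights).eval()
    batch = weights.transforms()(Image.fromarray(frame_rgb)).unsqueeze(0)
    with torch.no_grad():
        probs = model(batch).softmax(dim=1)
    score, idx = probs.max(dim=1)
    return weights.meta["categories"][idx.item()], score.item()


def speak(text):
    """Reply with the system voice."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


if __name__ == "__main__":
    if listen_for_trigger():
        label, confidence = classify(capture_frame())
        speak(f"This looks like a {label}, confidence {confidence:.0%}.")

In a real deployment, the classify step would be replaced by the project's own feature extraction and database matching, and the confidence score would decide whether the system announces an approximate category match or asks the user to show the object again.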