Methodology for Translation of Video Content Activates into Text Description: Three Object Activities Action

Author: Ramesh M. Kagalkar
DOI: 10.5815/ijigsp.2022.04.05 (https://doi.org/10.5815/ijigsp.2022.04.05)
Journal: International Journal of Image, Graphics and Signal Processing
Publication date: 2022-08-08 (Journal Article)
Platform: Semanticscholar
Citations: 0
Abstract
This paper presents a method for generating natural language text descriptions from video content activities. The system analyzes a video to identify the objects it contains, tracks the actions and activities taking place, matches them against an action model, and then generates a grammatically correct English description. The approach has two phases: training and testing. In the training phase, a database of subject-verb-object assignments is maintained and image features are extracted; in the testing phase, text descriptions are generated automatically from video content. The implemented system translates complex video content, up to one minute in duration and involving three distinct objects, into text descriptions. For evaluation, a standard YouTube database of 250 samples drawn from 50 different domains is used. The overall system achieves an accuracy of 93%.
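The generation step the abstract describes (mapping detected objects and matched actions to a grammatical English sentence via subject-verb-object assignments) can be illustrated with a minimal templating sketch. The triple format, function name, and example detections below are assumptions for illustration only, not the paper's actual implementation.

```python
# Minimal sketch of SVO-based sentence generation, assuming the upstream
# detector/tracker has already produced (subject, verb, object) triples.
# The triple format and the present-continuous template are illustrative
# assumptions, not the method described in the paper.
from typing import Optional


def describe(subject: str, verb: str, obj: Optional[str] = None) -> str:
    """Render one subject-verb-object triple as an English sentence."""
    if obj is not None:
        return f"A {subject} is {verb} a {obj}."
    return f"A {subject} is {verb}."


# Hypothetical detections from a one-minute clip with three tracked objects.
triples = [
    ("man", "riding", "bicycle"),
    ("dog", "running", None),
    ("woman", "holding", "umbrella"),
]

description = " ".join(describe(s, v, o) for s, v, o in triples)
print(description)
```

A real system would populate the triples from object detection and action-model matching rather than a hard-coded list, and would need richer grammar handling (tense, articles, plurals) than a single template provides.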