{"title":"基于图像字幕作为辅助内容的视频字幕","authors":"J. Vaishnavi, V. Narmatha","doi":"10.1109/ICAECT54875.2022.9807935","DOIUrl":null,"url":null,"abstract":"Video captioning is the more heuristic task of the combination of computer vision and Natural language processing while researchers are concentrated more in video related tasks. Dense video captioning is still considering the more challenging task as it needs to consider every event occurs in the video and provide optimal captions separately for all the events presents in the video with high diversity. Captioning process with less corpus leads to less performance. To avoid such issues, our proposed model constructed with the option of generating captions with high diversity. Image captions are taken as subsidiary content to enlarge the diversity for captioning the videos. Attention mechanism is utilized for the generation process. Generator and three different discriminators are utilized to contribute an appropriate caption which enriches the captioning process. ActivityNet caption dataset is used to demonstrate the proposed model. Microsoft coco image dataset is considered as subsidiary content for captioning. The benchmark metrics BLEU and METEOR are used to estimate the performance of the proposed model.","PeriodicalId":346658,"journal":{"name":"2022 Second International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Video Captioning based on Image Captioning as Subsidiary Content\",\"authors\":\"J. Vaishnavi, V. Narmatha\",\"doi\":\"10.1109/ICAECT54875.2022.9807935\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Video captioning is the more heuristic task of the combination of computer vision and Natural language processing while researchers are concentrated more in video related tasks. Dense video captioning is still considering the more challenging task as it needs to consider every event occurs in the video and provide optimal captions separately for all the events presents in the video with high diversity. Captioning process with less corpus leads to less performance. To avoid such issues, our proposed model constructed with the option of generating captions with high diversity. Image captions are taken as subsidiary content to enlarge the diversity for captioning the videos. Attention mechanism is utilized for the generation process. Generator and three different discriminators are utilized to contribute an appropriate caption which enriches the captioning process. ActivityNet caption dataset is used to demonstrate the proposed model. Microsoft coco image dataset is considered as subsidiary content for captioning. 
The benchmark metrics BLEU and METEOR are used to estimate the performance of the proposed model.\",\"PeriodicalId\":346658,\"journal\":{\"name\":\"2022 Second International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT)\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-04-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 Second International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAECT54875.2022.9807935\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Second International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAECT54875.2022.9807935","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Video Captioning based on Image Captioning as Subsidiary Content
Video captioning is a heuristic task at the intersection of computer vision and natural language processing, and researchers have increasingly concentrated on video-related tasks. Dense video captioning is still considered the more challenging task, as it must account for every event occurring in the video and provide an optimal caption separately for each of those events with high diversity. A captioning process trained on a small corpus yields poor performance. To avoid such issues, our proposed model is constructed to generate captions with high diversity: image captions are taken as subsidiary content to enlarge the diversity available for captioning videos. An attention mechanism is used in the generation process, and a generator together with three different discriminators contributes an appropriate caption, enriching the captioning process. The ActivityNet Captions dataset is used to demonstrate the proposed model, and the Microsoft COCO image dataset serves as the subsidiary content for captioning. The benchmark metrics BLEU and METEOR are used to evaluate the performance of the proposed model.
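The abstract does not spell out the architecture, so the following is only a minimal PyTorch sketch to fix ideas: an LSTM decoder with soft attention over per-frame features standing in for the generator, and three GRU-based discriminators. The feature dimensions, module shapes, and the roles assigned to the three discriminators (visual relevance, language fluency, and diversity against the subsidiary image-caption corpus) are all illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of an attention-based caption generator adversarially paired
# with three discriminators. All sizes and discriminator roles are assumed.
import torch
import torch.nn as nn

class AttentionGenerator(nn.Module):
    """LSTM decoder that soft-attends over per-frame video features."""
    def __init__(self, vocab_size, feat_dim=500, hid_dim=512, emb_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attn = nn.Linear(hid_dim + feat_dim, 1)       # additive attention score
        self.lstm = nn.LSTMCell(emb_dim + feat_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, feats, tokens):
        # feats: (B, T, feat_dim) frame features; tokens: (B, L) caption ids
        B, T, _ = feats.shape
        h = feats.new_zeros(B, self.lstm.hidden_size)
        c = feats.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for t in range(tokens.size(1)):
            # attention weights over frames, conditioned on the decoder state
            scores = self.attn(torch.cat([h.unsqueeze(1).expand(-1, T, -1), feats], -1))
            ctx = (scores.softmax(dim=1) * feats).sum(dim=1)   # (B, feat_dim) context
            h, c = self.lstm(torch.cat([self.embed(tokens[:, t]), ctx], -1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                      # (B, L, vocab_size)

class CaptionDiscriminator(nn.Module):
    """Scores a caption, optionally conditioned on video features, as real/generated."""
    def __init__(self, vocab_size, emb_dim=300, hid_dim=512, cond_dim=0):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.score = nn.Linear(hid_dim + cond_dim, 1)

    def forward(self, tokens, cond=None):
        _, h = self.rnn(self.embed(tokens))
        h = h.squeeze(0)                         # final GRU state, (B, hid_dim)
        if cond is not None:
            h = torch.cat([h, cond], dim=-1)     # append video conditioning
        return self.score(h)                     # raw real-vs-generated logit

# Three discriminators, one per assumed criterion: visual relevance
# (conditioned on video features), language fluency, and diversity
# against the subsidiary image-caption corpus.
vocab = 10000
G = AttentionGenerator(vocab)
D_visual = CaptionDiscriminator(vocab, cond_dim=500)
D_fluency = CaptionDiscriminator(vocab)
D_diversity = CaptionDiscriminator(vocab)

feats = torch.randn(2, 8, 500)                    # 8 frames of assumed 500-d features
tokens = torch.randint(0, vocab, (2, 12))         # toy caption token ids
logits = G(feats, tokens)                         # (2, 12, 10000)
fake_score = D_visual(tokens, feats.mean(dim=1))  # (2, 1) relevance logit
```

In an adversarial training loop, the generator would be updated to raise the scores all three discriminators assign to its captions, while each discriminator learns to separate ground-truth captions (and, for the diversity critic, image captions from the subsidiary corpus) from generated ones.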