{"title":"CMVCG: Non-autoregressive Conditional Masked Live Video Comments Generation Model","authors":"Zehua Zeng, Chenyang Tu, Neng Gao, Cong Xue, Cunqing Ma, Yiwei Shan","doi":"10.1109/IJCNN52387.2021.9533460","DOIUrl":null,"url":null,"abstract":"The blooming of live comment videos leads to the need of automatic live video comment generating task. Previous works focus on autoregressive live video comments generation and can only generate comments by giving the first word of the target comment. However, in some scenes, users need to generate comments by their given prompt keywords, which can't be solved by the traditional live video comment generation methods. In this paper, we propose a Transformer based non-autoregressive conditional masked live video comments generation model called CMVCG model. Our model considers not only the visual and textual context of the comments, but also time and color information. To predict the position of the given prompt keywords, we also introduce a keywords position predicting module. By leveraging the conditional masked language model, our model achieves non-autoregressive live video comment generation. Furthermore, we collect and introduce a large-scale real-world live video comment dataset called Bili-22 dataset. We evaluate our model in two live comment datasets and the experiment results present that our model outperforms the state-of-the-art models in most of the metrics.","PeriodicalId":396583,"journal":{"name":"2021 International Joint Conference on Neural Networks (IJCNN)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN52387.2021.9533460","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
The growing popularity of live-comment videos has created a need for automatic live video comment generation. Previous work focuses on autoregressive live video comment generation and can only produce a comment given the first word of the target comment. In some scenarios, however, users want to generate comments from prompt keywords they supply, which traditional live video comment generation methods cannot do. In this paper, we propose a Transformer-based non-autoregressive conditional masked live video comment generation model, called the CMVCG model. Our model considers not only the visual and textual context of the comments but also time and color information. To predict the positions of the given prompt keywords, we also introduce a keyword position prediction module. By leveraging a conditional masked language model, our model achieves non-autoregressive live video comment generation. Furthermore, we collect and introduce Bili-22, a large-scale real-world live video comment dataset. We evaluate our model on two live comment datasets, and the experimental results show that it outperforms state-of-the-art models on most metrics.
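As a rough illustration of the non-autoregressive decoding the abstract describes, the sketch below implements a generic mask-predict loop in the style of conditional masked language models, with the prompt keywords pinned at fixed positions. It is a minimal sketch, not the paper's implementation: the `model` interface, the `MASK_ID` value, the keyword positions (which in CMVCG would come from the keyword position prediction module), and the re-masking schedule are all assumptions.

```python
import torch

# Hypothetical mask-predict decoding with prompt keywords pinned in place.
# `model` is assumed to map a partially masked token sequence plus
# video/context features to per-position vocabulary logits.

MASK_ID = 0  # placeholder id for the [MASK] token (assumption)

@torch.no_grad()
def mask_predict(model, context, keyword_ids, keyword_pos, length, iters=4):
    """Non-autoregressive decoding: start fully masked, keep the given
    keywords fixed, and iteratively re-predict the least confident slots."""
    tokens = torch.full((length,), MASK_ID, dtype=torch.long)
    tokens[keyword_pos] = keyword_ids            # pin the prompt keywords
    fixed = torch.zeros(length, dtype=torch.bool)
    fixed[keyword_pos] = True

    for t in range(iters):
        logits = model(tokens.unsqueeze(0), context)  # (1, length, vocab)
        probs, preds = logits.softmax(-1).max(-1)
        probs, preds = probs.squeeze(0), preds.squeeze(0)
        tokens = torch.where(fixed, tokens, preds)    # never overwrite keywords

        # Re-mask the least confident non-keyword positions; the masked
        # fraction decays linearly over iterations (a common CMLM schedule,
        # assumed here rather than taken from the paper).
        n_mask = int(length * (iters - t - 1) / iters)
        if n_mask == 0:
            break
        conf = probs.masked_fill(fixed, float("inf"))
        remask = conf.topk(n_mask, largest=False).indices
        tokens[remask] = MASK_ID
    return tokens
```

Because every position is predicted in parallel on each pass, decoding cost grows with the number of refinement iterations rather than the comment length, which is what distinguishes this style of generation from the autoregressive, first-word-conditioned approaches the abstract contrasts against.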