ROC Comment: Automated Descriptive and Subjective Captioning of Behavioral Videos

M. R. Ali, Facundo Ciancio, Ru Zhao, Iftekhar Naim, Ehsan Hoque

Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2016), September 12, 2016. DOI: 10.1145/2971648.2971743
We present an automated interface, ROC Comment, for generating natural language comments on behavioral videos. We focus on the domain of public speaking, which many people consider their greatest fear. We collect a dataset of 196 public speaking videos from 49 individuals and gather 12,173 comments generated by more than 500 independent human judges. We then train a k-nearest-neighbor (k-NN) based model on extracted prosodic (e.g., volume) and facial (e.g., smiles) features. Given a new video, we extract its features and select the closest comments using the k-NN model. We further filter the comments by clustering them with DBSCAN and eliminating outliers. An evaluation of our system with 30 participants concludes that while the generated comments are helpful, there is room for improvement in further personalizing them. Our model has been deployed online, allowing individuals to upload their videos and receive open-ended, interpretative comments. Our system is available at http://tinyurl.com/roccomment.
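The retrieve-then-filter pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature vectors, example comments, toy comment embedding, and the `k`, `eps`, and `min_samples` values are all hypothetical stand-ins for the prosodic/facial features and tuning the authors actually used.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

# Hypothetical training data: one prosodic/facial feature vector per video,
# and the human-written comments collected for that video.
train_features = np.array([[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8]])
train_comments = [
    ["Great volume throughout.", "Confident delivery."],
    ["Nice energy and pacing."],
    ["Try smiling more often.", "You seemed tense."],
    ["More eye contact would help."],
]

def toy_embed(comments):
    # Stand-in comment representation (length only); the real system would
    # need a meaningful text embedding for DBSCAN to cluster on.
    return np.array([[len(c)] for c in comments])

def comments_for(new_video_features, embed, k=2):
    # Step 1: k-NN retrieval of the closest training videos by behavioral features.
    knn = NearestNeighbors(n_neighbors=k).fit(train_features)
    _, idx = knn.kneighbors([new_video_features])
    candidates = [c for i in idx[0] for c in train_comments[i]]

    # Step 2: cluster the candidate comments with DBSCAN and drop outliers
    # (DBSCAN labels noise points -1).
    labels = DBSCAN(eps=1.5, min_samples=2).fit_predict(embed(candidates))
    return [c for c, lab in zip(candidates, labels) if lab != -1]

print(comments_for([0.75, 0.25], toy_embed))
```

Retrieval pulls comments only from behaviorally similar videos, and the DBSCAN step keeps comments that agree with each other while discarding one-off judgments, which is the intuition behind the outlier-elimination step in the abstract.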