How to Evaluate Humorous Response Generation, Seriously?
Pavel Braslavski, Vladislav Blinov, Valeriia Bolotova-Baranova, Katya Pertsova
Proceedings of the 2018 Conference on Human Information Interaction & Retrieval, 2018-03-01
DOI: https://doi.org/10.1145/3176349.3176879
Citations: 24
Abstract
Nowadays, natural language user interfaces, such as chatbots and conversational agents, are very common. A desirable trait of such applications is a sense of humor. It is, therefore, important to be able to measure the quality of humorous responses. However, humor evaluation is hard because humor is highly subjective. To address this problem, we conducted an online evaluation of 30 dialog jokes from different sources by almost 300 participants (volunteers and Mechanical Turk workers). We collected joke ratings along with participants' age, gender, and language proficiency. Results show that demographics and joke topics can partly explain variation in humor judgments. We expect that these insights will aid humor evaluation and interpretation. The findings can also be of interest for humor generation methods in conversational systems.