Towards the automatic detection and identification of English puns

Tristan Miller, M. Turkovic

European Journal of Humour Research, vol. 4, no. 1, pp. 59–75
Published: 2016-01-26
DOI: 10.7592/EJHR2016.4.1.MILLER (https://doi.org/10.7592/EJHR2016.4.1.MILLER)
Journal quartile: Q2 (Social Sciences)
Citations: 29

Abstract

Lexical polysemy, a fundamental characteristic of all human languages, has long been regarded as a major challenge to machine translation, human–computer interaction, and other applications of computational natural language processing (NLP). Traditional approaches to automatic word sense disambiguation (WSD) rest on the assumption that there exists a single, unambiguous communicative intention underlying every word in a document. However, writers sometimes intend for a word to be interpreted as simultaneously carrying multiple distinct meanings. This deliberate use of lexical ambiguity — i.e. punning — is a particularly common source of humour, and therefore has important implications for how NLP systems process documents and interact with users. In this paper we make a case for research into computational methods for the detection of puns in running text and for the isolation of the intended meanings. We discuss the challenges involved in adapting principles and techniques from WSD to humorously ambiguous text, and outline our plans for evaluating WSD-inspired systems in a dedicated pun identification task. We describe the compilation of a large manually annotated corpus of puns and present an analysis of its properties. While our work is principally concerned with simple puns which are monolexemic and homographic (i.e. exploiting single words which have different meanings but are spelled identically), we touch on the challenges involved in processing other types.
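The WSD-inspired approach the abstract alludes to can be illustrated with a deliberately minimal sketch: a homographic pun candidate is a word for which at least two distinct senses are each supported by the surrounding context (a simplified, Lesk-style overlap check). The tiny sense inventory and example sentence below are invented for illustration only; they are not the authors' method or data, and a real system would draw senses from a lexical resource such as WordNet.

```python
# Toy Lesk-style check for homographic pun candidates: flag a word
# when two or more of its senses each overlap with the context.
# SENSE_INVENTORY is a hypothetical, hand-made stand-in for a real
# sense inventory; each sense maps to a small "signature" of words.
SENSE_INVENTORY = {
    "interest": {
        "curiosity": {"curiosity", "attention", "fascinating", "hobby"},
        "finance": {"bank", "loan", "rate", "money", "account"},
    },
}

def pun_candidate_senses(word, context_tokens, inventory=SENSE_INVENTORY):
    """Return the senses of `word` whose signatures overlap the context."""
    context = set(context_tokens)
    senses = inventory.get(word, {})
    return sorted(s for s, sig in senses.items() if context & sig)

# Contrived example: both the "curiosity" and "finance" senses of
# "interest" find support in the same sentence.
sentence = "the bank paid my curiosity and my account real interest".split()
supported = pun_candidate_senses("interest", sentence)
is_pun_candidate = len(supported) >= 2
```

Conventional WSD would pick exactly one of the supported senses; the point of the sketch is that pun identification instead treats multiple simultaneously supported senses as the signal of interest.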
Source journal

European Journal of Humour Research (Social Sciences – Cultural Studies)
CiteScore: 1.10
Self-citation rate: 0.00%
Articles per year: 34
Review time: 6 weeks
About the journal: The European Journal of Humour Research (EJHR) is a peer-reviewed quarterly journal with an international, multidisciplinary editorial board. Although geographically oriented towards the 'old continent', its European perspective is aimed at an international readership and contributors. EJHR covers the full range of work being done on all aspects of the humour phenomenon. The journal is designed to respond to the important changes that have affected the study of humour, with particular attention given to past events and current developments in Europe.