Automating the Detection of Poetic Features: The Limerick as Model Organism

Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. DOI: 10.18653/v1/2021.latechclfl-1.9

Almas Abdibayev, Yohei Igarashi, A. Riddell, D. Rockmore


Citations: 1

Abstract

In this paper we take up the problem of “limerick detection” and describe a system that classifies five-line poems as limericks or non-limericks. The task turns out to be surprisingly difficult, with many subtleties. Our algorithm focuses on the structural aspects of the limerick – rhyme scheme and rhythm (i.e., stress patterns) – and, when tested on a culled data set of 98,454 publicly available limericks, our “limerick filter” accepts 67% as limericks. The filter fails primarily on “non-standard” rhymes, whose detection we highlight as an outstanding challenge in computational poetics. Our accent detection algorithm proves to be very robust. Our main contributions are (1) a novel rhyme detection algorithm that works on English words including rare proper nouns and made-up words (and thus on words absent from the widely used CMUDict database); (2) a novel rhythm-identifying heuristic that is robust to moderate levels of language noise and comparable in accuracy to state-of-the-art scansion algorithms; and (3) a large, publicly available corpus of limericks tagged “limerick” or “not-limerick” by our identification software, which provides a benchmark for the community. The poetic tasks that we have identified as challenges for machines suggest that the limerick is a useful “model organism” for studying machine capabilities in poetry and, more broadly, in literature and language. We also include a list of open challenges. More generally, we anticipate that this work will provide useful material and benchmarks for future explorations in the field.
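To make the structural test concrete, the following is a minimal sketch of an AABBA rhyme-scheme check. It is not the authors' algorithm: it assumes a tiny hand-written lexicon (`TOY_LEXICON`, hypothetical) in the ARPAbet-with-stress-digits notation that CMUDict uses, treats two words as rhyming when their phonemes match from the last stressed vowel onward, and simply rejects any word missing from the lexicon. Handling out-of-lexicon and "non-standard" rhymes is the hard part the abstract describes, and it is not attempted here. All function names are illustrative.

```python
import re

# Hypothetical toy lexicon; a real system would need far broader coverage.
TOY_LEXICON = {
    "bright": ["B", "R", "AY1", "T"],
    "light": ["L", "AY1", "T"],
    "night": ["N", "AY1", "T"],
    "day": ["D", "EY1"],
    "way": ["W", "EY1"],
}


def last_word(line):
    """Return the final alphabetic word of a line, lowercased ('' if none)."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1] if words else ""


def rhyme_part(phones):
    """Phonemes from the last stressed vowel (stress 1 or 2) to the end,
    with stress digits stripped. Falls back to the whole word if no
    stressed vowel is found."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return tuple(p.rstrip("012") for p in phones[i:])
    return tuple(p.rstrip("012") for p in phones)


def rhymes(a, b, lexicon):
    """True if both words are in the lexicon and share a rhyme part."""
    if a not in lexicon or b not in lexicon:
        return False  # assumption: unknown words are never accepted
    return rhyme_part(lexicon[a]) == rhyme_part(lexicon[b])


def has_aabba_scheme(lines, lexicon=TOY_LEXICON):
    """Check that lines 1, 2, 5 rhyme, lines 3, 4 rhyme, and A differs from B."""
    if len(lines) != 5:
        return False
    w = [last_word(line) for line in lines]
    a_ok = rhymes(w[0], w[1], lexicon) and rhymes(w[1], w[4], lexicon)
    b_ok = rhymes(w[2], w[3], lexicon)
    a_differs_from_b = not rhymes(w[0], w[2], lexicon)
    return a_ok and b_ok and a_differs_from_b


POEM = [
    "There was a young lady named Bright",
    "Whose speed was far faster than light;",
    "She set out one day",
    "In a relative way",
    "And returned on the previous night.",
]

print(has_aabba_scheme(POEM))
```

A full filter in the spirit of the abstract would pair a check like this with a per-line stress-pattern test. This sketch covers only the rhyme-scheme half, and only for words its lexicon happens to contain.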



