ARC-NLP at PAN 2023: Hierarchical Long Text Classification for Trigger Detection

Conference and Labs of the Evaluation Forum | Pub Date: 2023-07-27 | DOI: 10.48550/arXiv.2307.14912

Umitcan Sahin, Izzet Emre Kucukkaya, Cagri Toraman


Citations: 1

Abstract


ARC-NLP at PAN 2023: Hierarchical Long Text Classification for Trigger Detection

Fanfiction, a popular form of creative writing set within established fictional universes, has gained a substantial online following. However, ensuring the well-being and safety of participants has become a critical concern in this community. Detecting triggering content, that is, material that may cause emotional distress or trauma to readers, poses a significant challenge. In this paper, we describe our approach to the Trigger Detection shared task at PAN CLEF 2023, where the goal is to detect multiple types of triggering content in a given fanfiction document. To this end, we build a hierarchical model that applies recurrence over Transformer-based language models. We first split long documents into smaller segments and use them to fine-tune a Transformer model. We then extract feature embeddings from the fine-tuned Transformer model and use them as input to train multiple LSTM models for trigger detection in a multi-label setting. Our model achieves an F1-macro score of 0.372 and an F1-micro score of 0.736 on the validation set, both higher than the baseline results shared at PAN CLEF 2023.
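
The first stage of the approach, splitting a long document into smaller segments, can be illustrated with a short standard-library sketch. This is not the authors' implementation: the function name `split_into_segments`, the whitespace tokenization, and the `segment_len` and `stride` values are all assumptions made for illustration, since the abstract does not state which tokenizer, segment size, or overlap was used.

```python
from typing import List


def split_into_segments(tokens: List[str], segment_len: int, stride: int) -> List[List[str]]:
    """Split a token sequence into windows of at most `segment_len` tokens.

    `stride` is how far the window start advances each step:
    stride == segment_len gives non-overlapping segments,
    stride < segment_len gives overlapping ones.
    """
    if segment_len <= 0 or stride <= 0:
        raise ValueError("segment_len and stride must be positive")
    if stride > segment_len:
        raise ValueError("stride larger than segment_len would skip tokens")

    segments: List[List[str]] = []
    start = 0
    while start < len(tokens):
        segments.append(tokens[start:start + segment_len])
        if start + segment_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return segments


# Hypothetical toy document; whitespace splitting stands in for a real
# subword tokenizer, and the tiny segment length is for readability only.
document = "one two three four five six seven eight nine ten"
tokens = document.split()
segments = split_into_segments(tokens, segment_len=4, stride=4)

for index, segment in enumerate(segments):
    print(index, segment)
```

In a pipeline of the kind the abstract describes, each segment would be passed through the fine-tuned Transformer to obtain one embedding per segment, and the ordered embeddings of a document would form the input sequence for the LSTM classifiers. Those stages are omitted here because the abstract gives no details about them.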

