使用主动学习辅助文本注释，以较少的努力实现高质量

2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-09-01 DOI:10.1109/JCDL52503.2021.00038

Franziska Weeber, Felix Hamborg, K. Donnay, Bela Gipp

{"title":"使用主动学习辅助文本注释，以较少的努力实现高质量","authors":"Franziska Weeber, Felix Hamborg, K. Donnay, Bela Gipp","doi":"10.1109/JCDL52503.2021.00038","DOIUrl":null,"url":null,"abstract":"Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques. However, manual annotations are costly. We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations, thus strongly reducing annotation cost and effort. For this purpose, we combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories in the given text documents. To highlight our research direction's potential, we evaluate the approach on the task of identifying frames in news articles. Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even these complex and subtle frames. On the framing dataset, the AL approach needs only 16.3% of the annotations to reach the same performance as a model trained on the full dataset.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort\",\"authors\":\"Franziska Weeber, Felix Hamborg, K. Donnay, Bela Gipp\",\"doi\":\"10.1109/JCDL52503.2021.00038\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques. However, manual annotations are costly. We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations, thus strongly reducing annotation cost and effort. For this purpose, we combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories in the given text documents. To highlight our research direction's potential, we evaluate the approach on the task of identifying frames in news articles. Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even these complex and subtle frames. On the framing dataset, the AL approach needs only 16.3% of the annotations to reach the same performance as a model trained on the full dataset.\",\"PeriodicalId\":112400,\"journal\":{\"name\":\"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/JCDL52503.2021.00038\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/JCDL52503.2021.00038","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

大量带注释的数据变得比以往任何时候都更加重要，尤其是在深度学习技术兴起之后。然而，手工注释的成本很高。我们提出了一个工具，使研究人员能够创建大型的，高质量的，带注释的数据集，只需要少量的手动注释，从而大大降低了注释的成本和工作量。为此，我们将主动学习(AL)方法与预训练的语言模型结合起来，以半自动地识别给定文本文档中的注释类别。为了突出我们研究方向的潜力，我们评估了在新闻文章中识别框架的方法。我们的初步结果表明，使用人工智能大大减少了对这些复杂和微妙的框架进行正确分类的注释数量。在分帧数据集上，人工智能方法只需要16.3%的注释就能达到与在完整数据集上训练的模型相同的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Assisted Text Annotation Using Active Learning to Achieve High Quality with Little Effort

Large amounts of annotated data have become more important than ever, especially since the rise of deep learning techniques. However, manual annotations are costly. We propose a tool that enables researchers to create large, high-quality, annotated datasets with only a few manual annotations, thus strongly reducing annotation cost and effort. For this purpose, we combine an active learning (AL) approach with a pre-trained language model to semi-automatically identify annotation categories in the given text documents. To highlight our research direction's potential, we evaluate the approach on the task of identifying frames in news articles. Our preliminary results show that employing AL strongly reduces the number of annotations for correct classification of even these complex and subtle frames. On the framing dataset, the AL approach needs only 16.3% of the annotations to reach the same performance as a model trained on the full dataset.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)

自引率

0.00%

发文量