Dataset Construction and Classification Based on Pre-trained Models for Opinion Holder Detection

Al-Mahmud, Kazutaka Shimada
DOI: 10.1109/IIAIAAI55812.2022.00023
Published in: 2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI)
Publication date: 2022-07-01
Citations: 1

Abstract

The number of internet users is growing rapidly. People express subjective thinking (i.e., opinions) implicitly and explicitly on online platforms such as Facebook, Twitter, and Amazon product reviews. Opinion holders are the people or entities who express these opinions. Given the growing volume of online opinion text, detecting opinion holders manually is impractical, so an automatic approach is essential. Opinion holder detection is useful for identifying a specific person's or entity's concerns about a particular topic, product, or problem. It consists of two steps: detecting the presence of opinion holders in text and identifying the opinion holders themselves. In this paper, we focus on the first step, the presence of opinion holders in text, and handle it as a binary classification problem: INSIDE or OUTSIDE. We first prepare a new English dataset for this task. We then apply two pre-trained models, BERT and DistilBERT, to the INSIDE/OUTSIDE classification task. BERT is a transformer-based pre-trained language model; DistilBERT is a smaller, faster, cheaper, and lighter model obtained by knowledge distillation from the BERT architecture. For the binary classification task, we employ a logistic regression model on the top layer of the pre-trained models, and we compare the models in terms of F1 score and accuracy. The experimental results show that DistilBERT achieved the best performance among the models: an F1 score of 0.901 and an accuracy of 0.924.
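The abstract describes a logistic regression classifier placed on the top layer of a pre-trained encoder and evaluated by F1 score and accuracy. The sketch below illustrates that classification head and the evaluation metrics only; it is not the authors' implementation. Synthetic Gaussian clusters stand in for real DistilBERT sentence embeddings (the encoder itself is omitted), and the cluster means, sample sizes, and train/test split are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)

# Stand-ins for sentence embeddings from a pre-trained encoder such as
# DistilBERT: two synthetic clusters play the role of INSIDE (label 1,
# an opinion holder is present) and OUTSIDE (label 0) sentences.
dim = 768  # hidden size of BERT-base / DistilBERT
X_inside = rng.normal(loc=0.5, scale=1.0, size=(200, dim))
X_outside = rng.normal(loc=-0.5, scale=1.0, size=(200, dim))
X = np.vstack([X_inside, X_outside])
y = np.array([1] * 200 + [0] * 200)

# Shuffle, then hold out part of the data for evaluation.
idx = rng.permutation(len(y))
X, y = X[idx], y[idx]
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Logistic regression head on top of the (frozen) embedding vectors.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

# Compare models with the same metrics used in the paper.
print(f"accuracy: {accuracy_score(y_test, pred):.3f}")
print(f"F1 score: {f1_score(y_test, pred):.3f}")
```

In a real pipeline, `X` would come from the encoder's output (e.g., the [CLS] token representation), and the same metric computation would be repeated per model to produce the comparison reported in the paper.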