{"title":"基于预训练模型的意见持有人检测数据集构建与分类","authors":"Al-Mahmud, Kazutaka Shimada","doi":"10.1109/IIAIAAI55812.2022.00023","DOIUrl":null,"url":null,"abstract":"Nowadays, it is getting increased in the massive amount of internet users. People express subjective thinking (i.e., opinion) implicitly and explicitly on online platforms such as Facebook, Twitter, Amazon product reviews, etc. Opinion holders are the people or entities who express opinions implicitly and explicitly on the online platform. With the increasing trends of online opinion mass information, it is impossible to detect them manually. For this reason, an automatic approach for opinion holder detection is essential. Opinion holder detection is useful to detect specific person’s/entity’s concerns about a particular topic, product, or problem. Opinion holder detection consists of two steps: the presence of opinion holders in text and identification of opinion holders. In this paper, we focus on the first step, namely the presence of opinion holders in text. We handle this task as a binary classification problem: INSIDE or OUTSIDE. At first, we prepare a new English dataset for this task. Then, we apply two types of pre-trained models, BERT and DistilBERT, to the INSIDE/OUTSIDE classification task. BERT is a transformer-based pre-trained language model. DistilBERT is a small, fast, cheap, and light model based on knowledge distillation from the BERT architecture. As to the binary classification task, we employ a logistic regression model on the top layer of the pre-trained models. We compare the language models employed in our experiment in terms of the F1 score and accuracy. 
The experimental result shows that DistilBERT obtained superior performance among the models: 0.901 on the F1 score and 0.924 on the accuracy.","PeriodicalId":156230,"journal":{"name":"2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Dataset Construction and Classification Based on Pre-trained Models for Opinion Holder Detection\",\"authors\":\"Al-Mahmud, Kazutaka Shimada\",\"doi\":\"10.1109/IIAIAAI55812.2022.00023\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Nowadays, it is getting increased in the massive amount of internet users. People express subjective thinking (i.e., opinion) implicitly and explicitly on online platforms such as Facebook, Twitter, Amazon product reviews, etc. Opinion holders are the people or entities who express opinions implicitly and explicitly on the online platform. With the increasing trends of online opinion mass information, it is impossible to detect them manually. For this reason, an automatic approach for opinion holder detection is essential. Opinion holder detection is useful to detect specific person’s/entity’s concerns about a particular topic, product, or problem. Opinion holder detection consists of two steps: the presence of opinion holders in text and identification of opinion holders. In this paper, we focus on the first step, namely the presence of opinion holders in text. We handle this task as a binary classification problem: INSIDE or OUTSIDE. At first, we prepare a new English dataset for this task. Then, we apply two types of pre-trained models, BERT and DistilBERT, to the INSIDE/OUTSIDE classification task. BERT is a transformer-based pre-trained language model. 
DistilBERT is a small, fast, cheap, and light model based on knowledge distillation from the BERT architecture. As to the binary classification task, we employ a logistic regression model on the top layer of the pre-trained models. We compare the language models employed in our experiment in terms of the F1 score and accuracy. The experimental result shows that DistilBERT obtained superior performance among the models: 0.901 on the F1 score and 0.924 on the accuracy.\",\"PeriodicalId\":156230,\"journal\":{\"name\":\"2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IIAIAAI55812.2022.00023\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IIAIAAI55812.2022.00023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dataset Construction and Classification Based on Pre-trained Models for Opinion Holder Detection
The number of internet users is increasing rapidly. People express subjective thoughts (i.e., opinions), implicitly and explicitly, on online platforms such as Facebook, Twitter, and Amazon product reviews. Opinion holders are the people or entities who express these opinions. With the growing volume of online opinions, detecting opinion holders manually is infeasible, so an automatic approach is essential. Opinion holder detection helps identify a specific person's or entity's concerns about a particular topic, product, or problem. It consists of two steps: determining whether an opinion holder is present in a text, and identifying who the opinion holder is. In this paper, we focus on the first step, namely the presence of opinion holders in text. We handle this task as a binary classification problem: INSIDE (the text contains an opinion holder) or OUTSIDE (it does not). First, we prepare a new English dataset for this task. Then, we apply two types of pre-trained models, BERT and DistilBERT, to the INSIDE/OUTSIDE classification task. BERT is a transformer-based pre-trained language model; DistilBERT is a smaller, faster, cheaper, and lighter model obtained by knowledge distillation from the BERT architecture. For the binary classification task, we place a logistic regression model on top of the pre-trained models. We compare the language models employed in our experiment in terms of F1 score and accuracy. The experimental results show that DistilBERT obtained the best performance among the models: an F1 score of 0.901 and an accuracy of 0.924.
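The classification head described in the abstract, a logistic regression model on top of a pre-trained encoder deciding INSIDE vs. OUTSIDE, can be sketched as follows. This is a minimal illustration, not the paper's implementation: to keep it runnable without downloading BERT or DistilBERT, the encoder's sentence representation is replaced by tiny hand-made feature vectors (a labeled assumption), and the F1/accuracy computation mirrors the evaluation metrics the paper reports.

```python
import math

# Toy stand-in features. In the actual paper, each text would instead be
# encoded by BERT/DistilBERT into a dense sentence representation; the
# hypothetical 3-dim vectors here (e.g. [has_person_mention,
# has_opinion_verb, has_first_person]) exist only to make the sketch run.
# Labels: 1 = INSIDE (an opinion holder appears in the text), 0 = OUTSIDE.
X = [[1, 1, 0], [1, 1, 1], [0, 1, 1], [0, 0, 0], [0, 0, 1], [1, 0, 0]]
y = [1, 1, 1, 0, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain logistic regression head trained with SGD on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return "INSIDE" if p >= 0.5 else "OUTSIDE"

def f1_and_accuracy(gold, pred):
    """F1 (positive class = INSIDE) and accuracy, as reported in the paper."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    acc = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return f1, acc

w, b = train_logreg(X, y)
print(predict(w, b, [1, 1, 1]))  # an INSIDE-like example
print(predict(w, b, [0, 0, 0]))  # an OUTSIDE-like example

preds = [1 if predict(w, b, x) == "INSIDE" else 0 for x in X]
f1, acc = f1_and_accuracy(y, preds)
print(f"F1={f1:.3f}, accuracy={acc:.3f}")
```

In the paper's setup the feature vectors would come from the frozen or fine-tuned pre-trained model, and the reported scores (F1 0.901, accuracy 0.924) are on their new English dataset; the toy data above is linearly separable, so this sketch fits its training set perfectly.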