Two Phase Extraction Method for Multi-label Classification of Real Life Tweets

International Conference on Information Integration and Web-based Applications & Services. Pub Date: 2013-12-02. DOI: 10.1145/2539150.2539197

Shuhei Yamamoto, T. Satoh


Citations: 2

Abstract

Many users now share daily events and opinions on Twitter. Some of these tweets are useful because they comment on aspects of a user's real life, e.g., eating, traffic, weather, and disasters. A post such as "The train is not coming!" belongs to the "Traffic" aspect and helps users who want to ride the train. A tweet such as "The train is not coming due to heavy rain" belongs to both the "Traffic" and "Weather" aspects. In this paper, we propose a multi-label method that estimates the appropriate aspects of unknown tweets by extending the two-phase extraction method. In this method, many topics are extracted from a large collection of tweets using Latent Dirichlet Allocation (LDA). Associations between the many topics and the fewer aspects are then built from a small set of labeled tweets. Aspect scores for an unknown tweet are calculated from its extracted terms using these topic-aspect associations, and the scores are averaged to label the tweet with appropriate aspects. When one aspect score is much larger than the others, only that aspect is assigned to the tweet. When several aspect scores are large and similar in value, all of those aspects are assigned. Experimental evaluations on a large number of actual tweets demonstrate the high efficiency of the proposed multi-label classification method, and our prototype system shows that the method can appropriately estimate the aspects of each unknown tweet.
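
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so everything concrete here is an assumption. In particular, `TERM_TOPICS` is a hand-written stand-in for the per-term topic distributions that LDA would produce, `LABELED` is a toy labeled set, and the `ratio` threshold in `select_aspects` is a hypothetical way to express "much larger than the others" versus "similar in value".

```python
from collections import defaultdict

# ASSUMPTION: hand-written P(topic | term), standing in for LDA output.
TERM_TOPICS = {
    "train": {0: 0.9, 1: 0.1},
    "delay": {0: 0.8, 1: 0.2},
    "rain": {0: 0.1, 1: 0.9},
    "storm": {0: 0.2, 1: 0.8},
}

# ASSUMPTION: a toy "small set of labeled tweets" as (terms, aspects) pairs.
LABELED = [
    (["train", "delay"], {"Traffic"}),
    (["rain", "storm"], {"Weather"}),
]


def tweet_topics(terms, term_topics):
    """Average the topic distributions of the known terms in one tweet."""
    known = [term_topics[t] for t in terms if t in term_topics]
    mix = defaultdict(float)
    for dist in known:
        for topic, p in dist.items():
            mix[topic] += p / len(known)
    return dict(mix)


def build_associations(labeled, term_topics):
    """Accumulate topic weight per aspect, then normalize within each topic."""
    assoc = defaultdict(lambda: defaultdict(float))
    for terms, aspects in labeled:
        for topic, weight in tweet_topics(terms, term_topics).items():
            for aspect in aspects:
                assoc[topic][aspect] += weight
    return {
        topic: {a: w / sum(row.values()) for a, w in row.items()}
        for topic, row in assoc.items()
    }


def aspect_scores(terms, term_topics, assoc):
    """Score each aspect via the tweet's averaged topic mix.

    Because the score is linear in the topic mix, this equals the average
    of the per-term aspect scores.
    """
    scores = defaultdict(float)
    for topic, weight in tweet_topics(terms, term_topics).items():
        for aspect, strength in assoc.get(topic, {}).items():
            scores[aspect] += weight * strength
    return dict(scores)


def select_aspects(scores, ratio=0.7):
    """Keep every aspect whose score is within `ratio` of the top score.

    ASSUMPTION: the relative threshold is hypothetical; the abstract does
    not state the actual decision rule.
    """
    if not scores:
        return set()
    top = max(scores.values())
    return {a for a, s in scores.items() if s >= ratio * top}


if __name__ == "__main__":
    assoc = build_associations(LABELED, TERM_TOPICS)
    for terms in (["train"], ["train", "rain"]):
        scores = aspect_scores(terms, TERM_TOPICS, assoc)
        print(terms, sorted(select_aspects(scores)))
```

With this toy data, a tweet containing only "train" gets a dominant "Traffic" score and receives a single label, while a tweet containing both "train" and "rain" gets two nearly equal scores and receives both "Traffic" and "Weather". In practice the topic distributions would come from a trained LDA model rather than a literal dictionary, and the threshold would need tuning on labeled data.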

