Automatic Mining of Human Activity Attributes from Weblogs

The-Minh Nguyen, Takahiro Kawamura, Hiroyuki Nakagawa, Yasuyuki Tahara, Akihiko Ohsuga
2010 IEEE/ACIS 9th International Conference on Computer and Information Science. Published 2010-08-18. DOI: 10.1109/ICIS.2010.44 (https://doi.org/10.1109/ICIS.2010.44)

Abstract

In this paper, we define an activity by five basic attributes: actor, action, object, time, and location. Our goal is to describe a method that automatically extracts all of these attributes from each sentence retrieved from Japanese weblogs. Previous work has several limitations: high setup cost, inability to extract all attributes, restrictions on the types of sentences that can be handled, and insufficient consideration of the interdependency among attributes. To resolve these problems, we propose a novel approach that uses conditional random fields and self-supervised learning. The approach treats activity extraction as a sequence labeling problem; it is domain-independent and scalable, and it requires no hand-tagged data. Because the positions and the number of attributes in an activity sentence need not be fixed, the approach can extract all attributes in a single pass over its corpus. Additionally, by converting complex sentences into simpler ones, it can handle complex sentences retrieved from Japanese weblogs. In an experiment, the approach achieves high precision (activity: 88.87%; attributes: over 90%).
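
To make the sequence labeling framing concrete, here is a minimal sketch of how a per-token label sequence could be turned into an activity record. It does not implement the conditional random field or the self-supervised training described above; it only illustrates the output side. The label names, the `group_attributes` helper, and the romanized example sentence are assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical label set: one label per activity attribute,
# plus "O" for tokens that belong to no attribute.
LABELS = {"ACTOR", "ACTION", "OBJECT", "TIME", "LOCATION"}


def group_attributes(tokens, labels):
    """Collect contiguous tokens that share a label into attribute spans."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    spans = defaultdict(list)
    previous = "O"
    for token, label in zip(tokens, labels):
        if label in LABELS:
            if label == previous:
                # Extend the current span for this attribute.
                spans[label][-1] += " " + token
            else:
                # Start a new span; an attribute may occur zero or more times.
                spans[label].append(token)
        previous = label
    return dict(spans)


# Illustrative, romanized stand-in for a Japanese weblog sentence
# ("Yesterday I ate ramen in Kyoto"); the labels are hand-assigned here,
# whereas in practice a trained sequence labeler would predict them.
tokens = ["kinou", "Kyoto", "de", "watashi", "wa", "ramen", "wo", "tabeta"]
labels = ["TIME", "LOCATION", "O", "ACTOR", "O", "OBJECT", "O", "ACTION"]

activity = group_attributes(tokens, labels)
print(activity)
```

Because every token carries its own label, the attributes can appear in any order and in any number, which is the property the abstract relies on when it says that positions and counts need not be fixed.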