Automatic Mining of Human Activity Attributes from Weblogs
The-Minh Nguyen, Takahiro Kawamura, Hiroyuki Nakagawa, Yasuyuki Tahara, Akihiko Ohsuga
2010 IEEE/ACIS 9th International Conference on Computer and Information Science. Published 2010-08-18. DOI: 10.1109/ICIS.2010.44
In this paper, we define an activity by five basic attributes: actor, action, object, time, and location. The goal of this paper is to describe a method that automatically extracts all of these attributes from each sentence retrieved from Japanese weblogs. Previous work had several limitations: high setup cost, inability to extract all attributes, restrictions on the types of sentences that can be handled, and insufficient consideration of the interdependency among attributes. To resolve these problems, this paper proposes a novel approach that uses conditional random fields and self-supervised learning. The approach treats activity extraction as a sequence labeling problem; it is domain-independent and scalable, and it does not require any hand-tagged data. Because it is unnecessary to fix the positions or the number of attributes in activity sentences, the approach can extract all attributes in a single pass over its corpus. Additionally, by converting complex sentences into simpler ones, the proposed approach can handle complex sentences retrieved from Japanese weblogs. In an experiment, this approach achieves high precision (activity: 88.87%, attributes: over 90%).
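
To make the sequence labeling framing concrete, the following minimal Python sketch shows how a per-token label sequence could be turned into the five activity attributes without fixing their positions or their number. It is an illustration only, not the paper's implementation: the label names, the function group_attributes, and the English example sentence are all assumptions, and the labels are written by hand here, whereas in the described approach a conditional random field would predict them for Japanese text.

```python
from collections import defaultdict

# Hypothetical label set: one label per attribute named in the abstract.
# Any other label (here "O") marks a token outside every attribute.
ATTRIBUTES = {"ACTOR", "ACTION", "OBJECT", "TIME", "LOCATION"}


def group_attributes(tokens, labels):
    """Collect contiguous runs of identically labeled tokens into attribute values.

    Attributes may appear in any order and any number of times, so nothing
    about their position or count is assumed.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    activity = defaultdict(list)
    span, current = [], None
    for token, label in zip(tokens, labels):
        if label != current:
            # A label change closes the previous run, if it was an attribute.
            if current in ATTRIBUTES:
                activity[current].append(" ".join(span))
            span, current = [], label
        span.append(token)
    if current in ATTRIBUTES:
        activity[current].append(" ".join(span))
    return dict(activity)


# Hand-written labels standing in for the output of a trained labeler.
tokens = ["Yesterday", "I", "bought", "a", "book", "in", "Akihabara"]
labels = ["TIME", "ACTOR", "ACTION", "OBJECT", "OBJECT", "O", "LOCATION"]
activity = group_attributes(tokens, labels)
print(activity)
```

Joining tokens with a space suits the English example; for Japanese text, which is not space-delimited, the tokens of a run would more plausibly be concatenated directly.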