Stijn De Saeger, Jun'ichi Kazama, Kentaro Torisawa, M. Murata, Ichiro Yamada, Kow Kuroda
Title: A web service for automatic word class acquisition
Published in: Proceedings of the 3rd International Universal Communication Symposium, 2009-12-03
DOI: 10.1145/1667780.1667806 (https://doi.org/10.1145/1667780.1667806)
Citations: 2
Abstract
In this paper we present a Web service for building NLP resources, specifically for constructing semantic word classes in Japanese. The system takes a few seed words belonging to the target class as input and uses automatic class expansion to suggest semantically similar training samples for the user to label. The system also generates random negative training samples automatically, and then trains a supervised classifier on this labeled data to generate the target word class from 10^7 candidate words extracted from a corpus of 10^8 Web documents. This system eliminates the need for expert machine learning knowledge in creating semantic word classes, and we experimentally show that it significantly reduces the human effort required to build them.
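The workflow described in the abstract (seed words, class expansion, user labeling, random negatives, supervised training, classification of all candidates) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the similarity measure or the classifier, so cosine similarity over context counts and a perceptron are stand-ins chosen for brevity. The CONTEXTS data, the function names, and the `accepted_by_user` set that simulates the user's labels are all hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy data: context-feature counts per candidate word.
# The real system draws its candidates from a large Web corpus.
CONTEXTS = {
    "tokyo":  Counter({"city_of": 3, "live_in": 2}),
    "osaka":  Counter({"city_of": 2, "live_in": 3}),
    "kyoto":  Counter({"city_of": 2, "visit": 2, "live_in": 1}),
    "nagoya": Counter({"live_in": 2, "city_of": 1}),
    "apple":  Counter({"eat": 3, "fresh": 2}),
    "banana": Counter({"eat": 2, "peel": 2}),
    "rice":   Counter({"eat": 3, "cook": 2}),
    "bread":  Counter({"eat": 1, "bake": 3}),
}

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def expand(seeds, k):
    """Class expansion: the k non-seed words closest to the summed seed vector."""
    centroid = Counter()
    for seed in seeds:
        centroid.update(CONTEXTS[seed])
    others = [w for w in CONTEXTS if w not in seeds]
    others.sort(key=lambda w: cosine(CONTEXTS[w], centroid), reverse=True)
    return others[:k]

def score(weights, bias, word):
    """Linear score of a word under the current model."""
    return bias + sum(weights[f] * v for f, v in CONTEXTS[word].items())

def train_perceptron(labeled, epochs=10):
    """Train a perceptron on (word, label) pairs with labels +1 or -1."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for word, label in labeled:
            if label * score(weights, bias, word) <= 0:  # misclassified
                for f, v in CONTEXTS[word].items():
                    weights[f] += label * v
                bias += label
    return weights, bias

def build_word_class(seeds, accepted_by_user, rng, k=2, n_negatives=2):
    """Run the whole loop once and return (suggestions, final word class)."""
    suggestions = expand(seeds, k)
    # Simulated labeling step: keep only the suggestions the user accepts.
    positives = list(seeds) + [w for w in suggestions if w in accepted_by_user]
    # Random negatives, drawn from words that were neither seeds nor suggestions.
    pool = sorted(w for w in CONTEXTS if w not in positives and w not in suggestions)
    negatives = rng.sample(pool, n_negatives)
    labeled = [(w, 1) for w in positives] + [(w, -1) for w in negatives]
    weights, bias = train_perceptron(labeled)
    # Apply the trained classifier to every candidate word.
    word_class = sorted(w for w in CONTEXTS if score(weights, bias, w) > 0)
    return suggestions, word_class
```

Calling `build_word_class(["tokyo", "osaka"], {"kyoto", "nagoya"}, random.Random(0))` on this toy data suggests "nagoya" and "kyoto" for labeling and returns the four city names as the final class. The random negatives rely on the assumption that an arbitrary candidate word is unlikely to belong to the target class, which is what lets the user skip labeling negative examples by hand.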