Hierarchical Persian Text Categorization in Absence of Labeled Data
S. Masoudian, V. Derhami, S. Zarifzadeh
2019 27th Iranian Conference on Electrical Engineering (ICEE), pp. 1951-1955, April 2019
DOI: 10.1109/IranianCEE.2019.8786690
Citations: 1
Abstract
Hierarchical text categorization is used in many real-world applications, such as webpage topic classification and product categorization. Building an accurate hierarchical classification model requires large quantities of labeled training data, but labeled samples are difficult and time-consuming to obtain. On the other hand, with the growth of the Internet, plenty of unlabeled documents are available. In this paper, we propose a top-down method that uses the “local classifier per parent node” approach to hierarchically categorize partially labeled documents, that is, documents that carry labels only at the first few levels of the hierarchy. For parent nodes with available training data, we train a classifier directly. For the remaining parent nodes, which have no labeled data, we apply a labeling strategy so that a classifier can still be trained. To our knowledge, this is the first study on hierarchical Persian text categorization, and our experiments show that the proposed approach achieves acceptable performance.
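The abstract does not name the classification algorithm or the labeling strategy, so the following is only a minimal sketch of the general top-down "local classifier per parent node" scheme under stated assumptions. It assumes a nearest-centroid classifier over bag-of-words vectors as the local classifier and, for parent nodes whose documents carry no deeper labels, a hypothetical keyword-matching pseudo-labeling step that stands in for the paper's labeling strategy. The function names (`train_lcpn`, `pseudo_label`, `predict_top_down`), the seed keyword lists, and the toy English corpus are illustrative inventions, not the authors' method or data; a real Persian system would also need proper tokenization.

```python
import math
from collections import Counter, defaultdict

def vectorize(text):
    # Bag-of-words term counts (whitespace split; an assumption, not Persian-aware)
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """Assumed local classifier: nearest centroid by cosine similarity."""
    def fit(self, docs, labels):
        sums = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            sums[label].update(vectorize(doc))  # summed vectors; cosine ignores scale
        self.centroids = dict(sums)
        return self

    def predict(self, doc):
        v = vectorize(doc)
        return max(self.centroids, key=lambda c: cosine(v, self.centroids[c]))

def pseudo_label(docs, seeds):
    """Hypothetical labeling strategy: match each unlabeled doc to child seed keywords."""
    seed_vecs = {child: Counter(words) for child, words in seeds.items()}
    out = []
    for doc in docs:
        v = vectorize(doc)
        scores = {c: cosine(v, s) for c, s in seed_vecs.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # skip docs sharing no keyword with any child
            out.append((doc, best))
    return out

def train_lcpn(hierarchy, corpus, seed_keywords):
    """Train one classifier per parent node; corpus holds (text, known label prefix)."""
    models = {}
    for parent in hierarchy:
        examples, pool = [], []
        for text, path in corpus:
            full = ["root"] + path
            if parent in full:
                i = full.index(parent)
                if i + 1 < len(full):
                    examples.append((text, full[i + 1]))  # labeled child of this parent
                else:
                    pool.append(text)  # reaches this parent but has no deeper label
        if not examples and parent in seed_keywords:
            examples = pseudo_label(pool, seed_keywords[parent])
        if examples:
            texts, labels = zip(*examples)
            models[parent] = CentroidClassifier().fit(texts, labels)
    return models

def predict_top_down(models, text, node="root"):
    # Descend from the root, letting each parent's classifier pick one child
    path = []
    while node in models:
        node = models[node].predict(text)
        path.append(node)
    return path

hierarchy = {"root": ["sports", "tech"],
             "sports": ["football", "tennis"],
             "tech": ["phones", "laptops"]}
# Partially labeled: every document is labeled at the first level only
corpus = [("striker scored a goal in the league match", ["sports"]),
          ("the racket and serve won the set", ["sports"]),
          ("new phone with a bigger battery and camera", ["tech"]),
          ("laptop keyboard and screen review", ["tech"])]
seed_keywords = {"sports": {"football": ["goal", "striker", "league"],
                            "tennis": ["racket", "serve", "set"]},
                 "tech": {"phones": ["phone", "camera", "battery"],
                          "laptops": ["laptop", "keyboard", "screen"]}}

models = train_lcpn(hierarchy, corpus, seed_keywords)
print(predict_top_down(models, "the striker missed a goal"))  # ['sports', 'football']
print(predict_top_down(models, "laptop screen is great"))     # ['tech', 'laptops']
```

Here the root classifier is trained on real first-level labels, while the "sports" and "tech" classifiers are trained only on pseudo-labeled documents. Because each local classifier chooses only among its own children, a mistake at an upper level cannot be corrected further down; this error propagation is the usual trade-off of top-down hierarchical classification.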