Using CoTraining and Semantic Feature Extraction for Positive and Unlabeled Text Classification
Na Luo, Fuyu Yuan, Wanli Zuo
2008 International Seminar on Future Information Technology and Management Engineering, published 2008-11-20
DOI: 10.1109/FITME.2008.81
This paper proposes a new three-step algorithm for learning from positive and unlabeled text. First, CoTraining is used to filter likely positive documents out of the unlabeled dataset U. Second, the documents in the positive set are represented as vectors by semantic-based feature extraction, and these vectors are used to identify strong positives among the likely positives produced in the first step; the selected documents are added to the positive dataset P. Finally, a linear one-class SVM is trained on the expanded P as the positive class and the purified U (U with the likely positives removed) as the negative class. Because the algorithm automatically expands the positive dataset, it performs especially well when the given positive dataset P is small. Comprehensive experiments show that our algorithm outperforms existing approaches.
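The abstract outlines the method but leaves out its details. To show roughly how the three steps fit together, here is a minimal Python sketch that uses only the standard library. Every concrete choice in it is an assumption and not the authors' method. The two co-training views are a hypothetical split of a toy vocabulary. Each view scores documents with a simple Rocchio (centroid) scorer. Plain term-frequency vectors with a hypothetical cosine threshold of 0.5 stand in for the paper's semantic-based features. The final learner is an ordinary two-class linear SVM trained by Pegasos-style subgradient descent, chosen because the last step is described as learning from both the expanded P (positive) and the purified U (negative). The helper names (`cotrain_likely_positives`, `strong_positives`, `train_linear_svm`, `pu_pipeline`) are illustrative only.

```python
import math
import random
from collections import Counter

def vectorize(doc, vocab):
    """L2-normalized term-frequency vector (assumed stand-in for semantic features)."""
    counts = Counter(doc.lower().split())
    vec = [float(counts[term]) for term in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def centroid(vectors, dim):
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def project(v, view):
    """Restrict a vector to the feature indices of one view."""
    return [v[i] for i in view]

def cotrain_likely_positives(P, U, views, rounds=3):
    """Step 1 (assumed design): co-training-style filtering of likely positives.

    Each round, every view scores the remaining U with a Rocchio scorer on its own
    features (positive centroid = shared pool, negative centroid = rest of U) and
    moves its most confident positive into the pool, so one view's picks become
    training data for the other.
    """
    pool, remaining, likely = list(P), list(range(len(U))), []
    for _ in range(rounds):
        for view in views:
            if not remaining:
                return likely
            pos_c = centroid([project(v, view) for v in pool], len(view))
            neg_c = centroid([project(U[j], view) for j in remaining], len(view))
            scores = {j: dot(project(U[j], view), pos_c) - dot(project(U[j], view), neg_c)
                      for j in remaining}
            best = max(remaining, key=scores.get)
            if scores[best] <= 0:  # nothing left looks positive to this view
                continue
            likely.append(best)
            remaining.remove(best)
            pool.append(U[best])
    return likely

def strong_positives(P, U, likely, threshold=0.5):
    """Step 2 (assumed design): keep likely positives whose cosine similarity to
    the centroid of P exceeds `threshold` (U vectors are already unit length)."""
    c = centroid(P, len(P[0]))
    c_norm = math.sqrt(dot(c, c)) or 1.0
    return [j for j in likely if dot(U[j], c) / c_norm > threshold]

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Step 3 (assumed design): two-class linear SVM trained by Pegasos-style
    subgradient descent on the hinge loss; a constant 1.0 feature acts as bias."""
    rng = random.Random(seed)
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w, t = [0.0] * len(data[0][0]), 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * dot(w, x) < 1  # margin test uses the pre-update w
            w = [(1 - eta * lam) * wi for wi in w]  # regularization shrink
            if violated:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if dot(w, x + [1.0]) > 0 else -1

def pu_pipeline(P_docs, U_docs, vocab, views):
    P = [vectorize(d, vocab) for d in P_docs]
    U = [vectorize(d, vocab) for d in U_docs]
    likely = cotrain_likely_positives(P, U, views)
    strong = strong_positives(P, U, likely)
    expanded_P = P + [U[j] for j in strong]
    purified_U = [u for j, u in enumerate(U) if j not in likely]  # U minus likely positives
    X = expanded_P + purified_U
    y = [1] * len(expanded_P) + [-1] * len(purified_U)
    return train_linear_svm(X, y), likely, strong

# Toy usage: "sports" documents form the positive class.
VOCAB = ["ball", "goal", "team", "stock", "market", "price"]
VIEWS = [[0, 1, 3], [2, 4, 5]]  # hypothetical two-view split of the vocabulary
P_DOCS = ["ball goal team", "goal team ball ball"]
U_DOCS = ["ball team goal", "stock market price", "team goal", "price stock market stock"]

w, likely, strong = pu_pipeline(P_DOCS, U_DOCS, VOCAB, VIEWS)
print("likely:", likely, "strong:", strong)
print(predict(w, vectorize("goal ball", VOCAB)), predict(w, vectorize("market price", VOCAB)))
```

In this sketch, likely positives that fail the strong-positive test are left out of both training sets rather than returned to U. That is one reading of "purified U", and the abstract does not settle it.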