Authors: Dai Li, Y. Murphey
Venue: 2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)
Publication date: 2014-12-01
DOI: https://doi.org/10.1109/CIDM.2014.7008692
Automatic text categorization using a system of high-precision and high-recall models
This paper presents HPHR, an automatic text document categorization system. HPHR contains high-precision, high-recall, and noise-filtered text categorization models. The models are generated by a suite of machine learning algorithms, a fast clustering algorithm that groups documents into subcategories efficiently and effectively, and a text category generation algorithm that automatically generates, from a given set of training documents, the text subcategories that represent the high-precision, high-recall, and noise-filtered models. HPHR was evaluated on documents drawn from two different applications: vehicle fault diagnostic documents, which take the form of unstructured, verbatim text descriptions, and the Reuters corpus. On both document collections, HPHR outperformed the systems commonly used in text document categorization.
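The abstract relies on the notions of precision and recall without defining them. As a reminder, the minimal sketch below computes both for a single category using the standard definitions. It is not the HPHR method: the function name `precision_recall` and the example labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def precision_recall(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    category: str,
) -> Tuple[float, float]:
    """Return (precision, recall) for one category.

    precision = TP / (TP + FP): of the documents assigned to the category,
    the fraction that truly belong to it.
    recall = TP / (TP + FN): of the documents that truly belong to the
    category, the fraction that were assigned to it.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    # Report 0.0 when a ratio is undefined (no predictions or no true members).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: the classifier assigns "brake" rarely but correctly.
true_labels = ["brake", "brake", "brake", "engine"]
predicted_labels = ["brake", "other", "other", "engine"]
precision, recall = precision_recall(true_labels, predicted_labels, "brake")
```

In this made-up example the model for "brake" is high precision (every document it assigns to the category is correct) but low recall (it misses two of the three true "brake" documents), which is the kind of trade-off that motivates keeping separate high-precision and high-recall models.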