Applying machine learning to subject classification and subject description for information retrieval
S. Cunningham, Brent Summers
Proceedings 1995 Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems, 1995-11-20
DOI: 10.1109/ANNES.1995.499481 (https://doi.org/10.1109/ANNES.1995.499481)
Citations: 7
Abstract
This paper describes an experiment in applying a standard supervised machine learning algorithm (C4.5) to the problem of developing subject classification rules for documents. This algorithm is found to produce surprisingly concise models of document classifications. While the models are highly accurate on the training sets, evaluation over test sets or through cross-validation shows a significant decrease in classification accuracy. Given the difficult nature of the experimental task, however, the results of this investigation are promising and merit further study. An additional algorithm, 1R, is shown to be highly effective in generating lists of candidate terms for subject descriptions.
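As a rough illustration of how 1R can yield candidate terms, the sketch below applies the general 1R idea (build a one-attribute rule per feature, have each attribute value predict its majority class, and score the rule by training error) to binary term-presence features. The abstract does not describe the paper's document representation or ranking procedure, so the function name `one_r_rank`, the set-of-terms encoding, and the toy documents and labels are all assumptions made for this example only.

```python
from collections import Counter

def one_r_rank(docs, labels):
    """Rank terms by the training error of a 1R rule on term presence.

    docs   -- list of sets of terms (assumed encoding, not from the paper)
    labels -- list of subject labels, one per document
    Returns a list of (error_rate, term) pairs, lowest error first.
    """
    vocab = sorted({term for doc in docs for term in doc})
    ranked = []
    for term in vocab:
        # Tally class labels separately for documents with and without the term.
        counts = {True: Counter(), False: Counter()}
        for doc, label in zip(docs, labels):
            counts[term in doc][label] += 1
        # Each attribute value predicts its majority class;
        # every other document with that value counts as an error.
        errors = sum(
            sum(c.values()) - max(c.values())
            for c in counts.values()
            if c
        )
        ranked.append((errors / len(docs), term))
    ranked.sort()
    return ranked

# Hypothetical toy corpus: two subjects, four documents.
docs = [
    {"neural", "network", "training"},
    {"neural", "learning"},
    {"retrieval", "index", "query"},
    {"query", "document", "retrieval"},
]
labels = ["ml", "ml", "ir", "ir"]

ranked = one_r_rank(docs, labels)
for error_rate, term in ranked:
    print(f"{term}: {error_rate:.2f}")
```

Terms whose presence or absence alone separates the subjects rise to the top of the list, which is the sense in which a 1R-style ranking can suggest candidate descriptor terms.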