Classification of RSS feed news items using ontology
Shikha Agarwal, A. Singhal, Punam Bedi
2012 12th International Conference on Intelligent Systems Design and Applications (ISDA), 2012-11-01
DOI: 10.1109/ISDA.2012.6416587 (https://doi.org/10.1109/ISDA.2012.6416587)
Citations: 21
Abstract
The explosive growth of data on the web demands techniques that enable users to access the information they want. Document classification is a prerequisite for information retrieval, and many classification techniques have been and remain in use. Term Frequency-Inverse Document Frequency (TF-IDF) represents documents by the frequency of the terms they contain. Its limitations are the high dimensionality of the resulting data and the fact that it ignores relations among terms, which makes the end result noisy and less precise. Our approach uses weighted Concept Frequency-Inverse Document Frequency (CF-IDF) with the background knowledge of a domain ontology to classify RSS feed news items. Metadata of the news items is used to assign weights to the identified concepts. No trained classifier is required, because the ontology itself acts as the classifier. We designed the ontology based on news industry standards. The approach considers relations among concepts and properties, which reduces noise in the final output, and it uses only the key concepts of a domain instead of all terms, which curbs the dimensionality problem. Evaluation of the experimental results shows that the proposed approach gives better classification results.
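The idea of weighted CF-IDF with an ontology acting as the classifier can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy ontology, the field weights, the smoothed IDF formula, and the names `ONTOLOGY`, `FIELD_WEIGHTS`, `concept_counts`, and `classify` are all assumptions made for illustration, and the sketch uses only category membership, not the richer relations and properties the abstract mentions.

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology: category -> concept -> surface terms.
# (Invented for illustration; the paper's ontology follows news industry standards.)
ONTOLOGY = {
    "sport": {"football": ["football", "soccer"], "tournament": ["tournament", "cup"]},
    "economy": {"stock_market": ["stock", "shares"], "inflation": ["inflation", "prices"]},
}

# Assumed metadata weights: a concept found in the title counts double.
FIELD_WEIGHTS = {"title": 2.0, "description": 1.0}


def concept_counts(item):
    """Return weighted concept frequencies for one news item (field -> text)."""
    counts = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        tokens = re.findall(r"[a-z]+", item.get(field, "").lower())
        for concepts in ONTOLOGY.values():
            for concept, terms in concepts.items():
                hits = sum(tokens.count(term) for term in terms)
                if hits:
                    counts[concept] += weight * hits
    return counts


def classify(items):
    """Label each item with the category whose concepts score highest, or None."""
    per_item = [concept_counts(item) for item in items]
    n_items = len(items)
    # Document frequency: number of items in which each concept occurs.
    doc_freq = Counter(c for counts in per_item for c in counts)
    labels = []
    for counts in per_item:
        total = sum(counts.values())
        if total == 0:
            labels.append(None)  # no ontology concept was found in this item
            continue
        scores = {}
        for category, concepts in ONTOLOGY.items():
            # Weighted CF times a smoothed IDF (the smoothing is an assumption).
            scores[category] = sum(
                (counts[c] / total) * math.log(1 + n_items / doc_freq[c])
                for c in concepts
                if c in counts
            )
        labels.append(max(scores, key=scores.get))
    return labels


items = [
    {"title": "Football cup final tonight",
     "description": "The tournament ends with a soccer classic"},
    {"title": "Stock prices fall",
     "description": "Inflation worries hit shares"},
    {"title": "Weather update", "description": "Rain expected"},
]
print(classify(items))
```

Because only ontology concepts are counted, the feature space is bounded by the size of the ontology rather than by the vocabulary, which is the dimensionality reduction the abstract describes. No model is trained: the category assignment follows directly from which branch of the ontology the weighted concepts fall under.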