{"title":"An Adaptive Ontology Based Hierarchical Browsing System for CiteSeerx","authors":"N. Ye, Susan Gauch, Qiang Wang, H. Luong","doi":"10.1109/KSE.2010.32","DOIUrl":null,"url":null,"abstract":"As an indispensable technique in addition to the field of \\emph{Information Retrieval}, \\emph{Ontology based Retrieval System} (or Browsing Hierarchy) has been well studied and developed both in academia and industry. However, most of current systems suffer the following problems: (1) Constructing the mappings between documents and concepts in ontology requires the training of robust hierarchical classifiers, it's difficult to build such classifiers for large-scale documents corpus due to the time-efficiency and precision issues. (2) The traditional Browsing Hierarchical System ignores the distribution of documents over concepts, which is not realistic when a large number of documents distributed biasly on certain concepts. Browsing documents such concepts becomes time-consuming and unpractical for users. Therefore, further splitting these concepts into sub-categories is necessary and critical for organizing documents in the browsing system. Aiming at building the Hierarchical Browsing System more realistically and accurately, we propose an adpative Hierarchical Browsing System framework in this paper, which is designed to build a Browsing Hierarchy for $CiteSeer^x$. In this framework, we first investigate the supervised learning approaches to classify documents into existing predefined concepts of ontology and compare their performance on different datasets of $CiteSeer^x$. Then, we give a empirical analysis of unsupervised learning methods for adding new clusters to the existing browsing hierarchy. Experimental analysis on $CiteSeer^x$ corpus shows the effectiveness and the efficiency of our method.","PeriodicalId":158823,"journal":{"name":"2010 Second International Conference on Knowledge and Systems Engineering","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Second International Conference on Knowledge and Systems Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/KSE.2010.32","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
As an indispensable technique in addition to the field of \emph{Information Retrieval}, \emph{Ontology based Retrieval System} (or Browsing Hierarchy) has been well studied and developed both in academia and industry. However, most of current systems suffer the following problems: (1) Constructing the mappings between documents and concepts in ontology requires the training of robust hierarchical classifiers, it's difficult to build such classifiers for large-scale documents corpus due to the time-efficiency and precision issues. (2) The traditional Browsing Hierarchical System ignores the distribution of documents over concepts, which is not realistic when a large number of documents distributed biasly on certain concepts. Browsing documents such concepts becomes time-consuming and unpractical for users. Therefore, further splitting these concepts into sub-categories is necessary and critical for organizing documents in the browsing system. Aiming at building the Hierarchical Browsing System more realistically and accurately, we propose an adpative Hierarchical Browsing System framework in this paper, which is designed to build a Browsing Hierarchy for $CiteSeer^x$. In this framework, we first investigate the supervised learning approaches to classify documents into existing predefined concepts of ontology and compare their performance on different datasets of $CiteSeer^x$. Then, we give a empirical analysis of unsupervised learning methods for adding new clusters to the existing browsing hierarchy. Experimental analysis on $CiteSeer^x$ corpus shows the effectiveness and the efficiency of our method.