A survey of document clustering using semantic approach
Authors: Nagma Y. Saiyad, H. Prajapati, V. Dabhi
Published in: 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), 2016-03-03
DOI: https://doi.org/10.1109/ICEEOT.2016.7755154
Citations: 21
Abstract
Document clustering is the application of cluster analysis to textual documents. It is a commonly used technique in data mining, information retrieval, knowledge discovery from data, and pattern recognition. Traditional document clustering treats a document as a bag of words and ignores the semantic meaning of each word. However, accurate document clustering depends on features such as word meanings. A semantic approach addresses this because it takes the semantic relationships among words into account. This paper highlights the problems in both the traditional approach and the semantic approach. It identifies four major areas under semantic clustering and surveys 23 papers covering the most significant work. It also surveys tools used for text processing, as well as clustering algorithms, that help in applying and evaluating document clustering. The survey informs the proposed work in the same direction, which uses word senses in a text clustering system. Lexical chains, built using the identity/synonymy relation from the WordNet ontology as background knowledge, will serve as features. Clustering will then be performed using these lexical chains.
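
To make the lexical-chain idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small hand-written sense table (`TOY_SENSES`, with made-up sense labels) in place of WordNet, groups words into chains by identity or synonymy, and compares documents with a weighted Jaccard similarity, which is a choice made here for illustration; the abstract does not name a similarity measure or clustering algorithm.

```python
from collections import defaultdict

# Hypothetical stand-in for WordNet: each word maps to a made-up sense label.
# A real system would look up senses in WordNet instead of this table.
TOY_SENSES = {
    "car": "SENSE_VEHICLE",
    "automobile": "SENSE_VEHICLE",
    "doctor": "SENSE_PHYSICIAN",
    "physician": "SENSE_PHYSICIAN",
}


def build_lexical_chains(tokens):
    """Group tokens into chains by identity (same word) or synonymy (same sense)."""
    chains = defaultdict(list)
    for token in tokens:
        word = token.lower()
        # Words missing from the table form a chain by identity alone.
        key = TOY_SENSES.get(word, word)
        chains[key].append(word)
    return dict(chains)


def chain_features(tokens):
    """Represent a document as a mapping from chain key to chain length."""
    chains = build_lexical_chains(tokens)
    return {key: len(members) for key, members in chains.items()}


def weighted_jaccard(features_a, features_b):
    """Weighted Jaccard similarity between two chain-feature mappings."""
    keys = set(features_a) | set(features_b)
    if not keys:
        return 0.0
    shared = sum(min(features_a.get(k, 0), features_b.get(k, 0)) for k in keys)
    total = sum(max(features_a.get(k, 0), features_b.get(k, 0)) for k in keys)
    return shared / total


doc_a = ["car", "doctor", "car"]
doc_b = ["automobile", "physician"]
doc_c = ["bank", "bank"]

features_a = chain_features(doc_a)
print(weighted_jaccard(features_a, chain_features(doc_b)))
print(weighted_jaccard(features_a, chain_features(doc_c)))
```

Here `doc_a` and `doc_b` share no surface words, so a plain bag-of-words comparison would find no overlap, while the chain features place them close together. A clustering algorithm could then operate on these chain-based similarities.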