A Novel Document and Query Similarity Indexing using VSM for Unstructured Documents
Reshma Pk, S. Rajagopal, L. V. L.
2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), published 2020-03-01
DOI: 10.1109/ICACCS48705.2020.9074255 (https://doi.org/10.1109/ICACCS48705.2020.9074255)
Citations: 3
Abstract
Most natural language applications involve the automatic detection of semantic text similarity between documents. This paper applies information retrieval to unstructured documents and queries: to find the document most similar to a given user query, documents are retrieved in order of similarity ranking. In the Vector Space Model (VSM), the text of each document and each query is converted into a numeric vector. The definition and the number of dimensions are the critical aspects of VSM. The objective of this paper is to find the most similar document in the collection for a given user query. Different representations of the Vector Space Model are described in detail, and various similarity measures are calculated.
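The retrieval pipeline summarized in the abstract (text to vectors, then ranking by similarity) can be illustrated with a minimal sketch. The abstract does not name the term weighting or the similarity measures the paper uses, so the smoothed TF-IDF weighting and cosine similarity below are assumptions, chosen because they are common in VSM retrieval. The function names and the whitespace tokenizer are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(docs_tokens):
    """Smoothed inverse document frequency for each term in the collection."""
    n_docs = len(docs_tokens)
    # Count in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms outside the vocabulary are dropped."""
    counts = Counter(tokens)
    return {term: n * idf[term] for term, n in counts.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (document index, score) pairs, most similar document first."""
    docs_tokens = [tokenize(d) for d in documents]
    idf = build_idf(docs_tokens)
    doc_vectors = [tfidf_vector(tokens, idf) for tokens in docs_tokens]
    query_vector = tfidf_vector(tokenize(query), idf)
    scores = [(i, cosine(query_vector, dv)) for i, dv in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy collection, invented purely for illustration.
    docs = [
        "the cat sat on the mat",
        "dogs chase cats in the park",
        "vector space model for information retrieval",
    ]
    for index, score in rank("information retrieval model", docs):
        print(index, round(score, 3))
```

In this sketch each distinct term in the collection is one dimension of the vector space, which is one way to read the abstract's remark that the definition and number of dimensions are critical. A different tokenizer or vocabulary changes the dimensions and, with them, the ranking.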