Concept Based Search Using LSI and Automatic Keyphrase Extraction
R. Rodrigues, K. Asnani
2010 3rd International Conference on Emerging Trends in Engineering and Technology, published 2010-11-19
DOI: 10.1109/ICETET.2010.100 (https://doi.org/10.1109/ICETET.2010.100)
Citations: 4
Abstract
Classic information retrieval models can retrieve poorly: the answer set may include unrelated documents, and relevant documents that do not contain at least one index term may be missed. Retrieval based on index terms is vague and noisy, because the user's information need relates more to concepts and ideas than to index terms. The Latent Semantic Indexing (LSI) model is a concept-based retrieval method that overcomes many of the problems evident in today's popular word-based retrieval systems. Most retrieval systems match words in the user's queries with words in the text of documents in the corpus, whereas the LSI model performs the match based on concepts. The concept mapping is performed using Singular Value Decomposition (SVD). Key phrases are also an important means of document summarization, clustering, and topic search. They give a high-level description of a document's contents, which makes it easy for prospective readers to decide whether the document is relevant to them. In this paper, we first develop an automatic key phrase extraction model for extracting key phrases from documents and then use these key phrases as a corpus on which conceptual search is performed using LSI.
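The SVD-based concept matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the key phrase "documents" in `KEYPHRASE_DOCS`, the raw term-count weighting, the choice of `k=2` concepts, and all function names are assumptions made for illustration. The sketch builds a term-document matrix from key phrases, truncates its SVD to `k` concepts, folds a query into the same concept space, and ranks documents by cosine similarity.

```python
import numpy as np

# Hypothetical corpus: each string stands in for the key phrases extracted
# from one document (illustrative data, not taken from the paper).
KEYPHRASE_DOCS = [
    "car engine repair",
    "automobile engine maintenance",
    "automobile car dealer",
    "fruit salad recipe",
    "apple banana fruit",
]


def build_term_doc_matrix(texts):
    """Return a terms x documents count matrix and a term -> row index map."""
    vocab = sorted({word for text in texts for word in text.split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            matrix[index[word], j] += 1.0
    return matrix, index


def lsi_fit(matrix, k):
    """Truncated SVD: keep the k largest singular values and vectors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Rows of V_k are the document coordinates in the k-dimensional concept space.
    return u[:, :k], s[:k], vt[:k, :].T


def fold_in_query(query, index, u_k, s_k):
    """Project a query into concept space: q_hat = q^T U_k S_k^{-1}."""
    q = np.zeros(u_k.shape[0])
    for word in query.split():
        if word in index:  # words outside the vocabulary are ignored
            q[index[word]] += 1.0
    return (q @ u_k) / s_k


def rank_documents(query, texts, k=2):
    """Return (document index, cosine similarity) pairs, best match first."""
    matrix, index = build_term_doc_matrix(texts)
    u_k, s_k, v_k = lsi_fit(matrix, k)
    q_hat = fold_in_query(query, index, u_k, s_k)
    norms = np.linalg.norm(v_k, axis=1) * np.linalg.norm(q_hat)
    scores = (v_k @ q_hat) / np.maximum(norms, 1e-12)  # guard against zero norms
    order = np.argsort(-scores)
    return [(int(j), float(scores[j])) for j in order]


if __name__ == "__main__":
    for doc_id, score in rank_documents("automobile", KEYPHRASE_DOCS):
        print(doc_id, round(score, 3), KEYPHRASE_DOCS[doc_id])
```

With this toy corpus, the query "automobile" should score document 0 ("car engine repair") close to the two documents that literally contain the word, because all three share the same dominant concept; the two fruit documents should score near zero. A production system would typically apply a weighting such as TF-IDF before the decomposition and choose `k` empirically.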