{"title":"Application Research on Latent Semantic Analysis for Information Retrieval","authors":"C. Wenli","doi":"10.1109/ICMTMA.2016.37","DOIUrl":null,"url":null,"abstract":"The basic principle of Classic traditional information retrieval model is the machine matching of the key word, namely retrieval based on keywords. This paper proposes a pre-clustering-based latent semantic analysis algorithm for document retrieval. The algorithm can solve the problem of time consuming computation of the similarity between the query vector and each text vector in the traditional latent semantic algorithm for document retrieval. It first clusters the documents using k-means clustering based on the latent semantic analysis, finds out the central point of each cluster, and then calculates the similarity between the query vector and each cluster's central points for retrieval. In view of the characteristics of document retrieval, it proposes a new method for calculating the feature weights and adopts the method of pre-clustering to preprocess document collection. The results of the experiment show that the new algorithm can reduce the search time, and improve the retrieval efficiency.","PeriodicalId":318523,"journal":{"name":"2016 Eighth International Conference on Measuring Technology and Mechatronics Automation (ICMTMA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 Eighth International Conference on Measuring Technology and Mechatronics Automation (ICMTMA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMTMA.2016.37","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8
Abstract
The basic principle of Classic traditional information retrieval model is the machine matching of the key word, namely retrieval based on keywords. This paper proposes a pre-clustering-based latent semantic analysis algorithm for document retrieval. The algorithm can solve the problem of time consuming computation of the similarity between the query vector and each text vector in the traditional latent semantic algorithm for document retrieval. It first clusters the documents using k-means clustering based on the latent semantic analysis, finds out the central point of each cluster, and then calculates the similarity between the query vector and each cluster's central points for retrieval. In view of the characteristics of document retrieval, it proposes a new method for calculating the feature weights and adopts the method of pre-clustering to preprocess document collection. The results of the experiment show that the new algorithm can reduce the search time, and improve the retrieval efficiency.