Authors: Indrani Bhattacharya, J. Sil
Venue: Proceedings of the 3rd IKDD Conference on Data Science, 2016
Publication date: 2016-03-13
DOI: https://doi.org/10.1145/2888451.2888474
Query Classification using LDA Topic Model and Sparse Representation Based Classifier
Users often seek information by submitting a query whose keywords may belong to multiple topics, representing overlapping concepts. The objective of this work is to assign the query a topic class label by considering how its keywords are distributed over the various topics. The approach reduces the search space so that information can be retrieved in a computationally efficient way. First, we apply Latent Dirichlet Allocation (LDA) to the entire corpus to group the documents into topics, each consisting of unique words. Next, a term vocabulary (TRV) is built from the unique words present in the topics. We then develop a Topic-Vocabulary Matrix (TVM) by encoding the TRV with respect to each topic. The TVM expresses the word distribution among the topics and serves as the training data set, which is sparse. The query is encoded in the same way and submitted as test data. We apply a sparse representation based classifier (SRC) to classify the query into a topic. The proposed approach shows satisfactory performance, with 93% accuracy in classifying queries.
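The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions: the LDA step is skipped and replaced by a hypothetical `TOPICS` dictionary of topic words, the TVM is assumed to be a binary word-presence encoding with one unit-norm column per topic, and the sparse coding step uses plain ISTA (iterative soft thresholding) for an L1-regularised least-squares problem, which is one common way to realise SRC. All function names and the `lam` parameter are illustrative.

```python
import math

# Hypothetical topic words, standing in for the output of LDA on a corpus.
TOPICS = {
    "sports": ["match", "team", "score", "league"],
    "finance": ["bank", "stock", "market", "score"],
    "health": ["doctor", "diet", "heart", "team"],
}

def build_vocabulary(topics):
    """Term vocabulary (TRV): sorted unique words across all topics."""
    return sorted({w for words in topics.values() for w in words})

def encode(words, vocab):
    """Binary presence vector over the vocabulary, scaled to unit L2 norm."""
    present = set(words)
    vec = [1.0 if w in present else 0.0 for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def ista(atoms, y, lam=0.05, steps=500):
    """Minimise 0.5*||y - sum_j x[j]*atoms[j]||^2 + lam*||x||_1 with ISTA."""
    k = len(atoms)
    # The squared Frobenius norm upper-bounds the Lipschitz constant
    # of the gradient, so 1/bound is a safe step size.
    step = 1.0 / sum(v * v for a in atoms for v in a)
    x = [0.0] * k
    for _ in range(steps):
        recon = [sum(x[j] * atoms[j][i] for j in range(k)) for i in range(len(y))]
        resid = [r - t for r, t in zip(recon, y)]
        grad = [sum(a * r for a, r in zip(atoms[j], resid)) for j in range(k)]
        z = [x[j] - step * grad[j] for j in range(k)]
        # Soft thresholding keeps the coefficient vector sparse.
        x = [math.copysign(max(abs(v) - step * lam, 0.0), v) for v in z]
    return x

def classify(query_words, topics, lam=0.05):
    """Return the topic with the smallest class-wise reconstruction residual."""
    vocab = build_vocabulary(topics)
    labels = list(topics)
    atoms = [encode(topics[t], vocab) for t in labels]  # columns of the assumed TVM
    y = encode(query_words, vocab)                       # query encoded the same way
    x = ista(atoms, y, lam)
    residuals = {
        label: math.sqrt(sum((yi - x[j] * ai) ** 2 for yi, ai in zip(y, atoms[j])))
        for j, label in enumerate(labels)
    }
    return min(residuals, key=residuals.get), residuals

if __name__ == "__main__":
    label, residuals = classify(["stock", "market", "score"], TOPICS)
    print(label, residuals)
```

In this toy setup the word "score" appears in two topics, so the query overlaps both; the class-wise residual is what resolves the overlap in favour of the topic that reconstructs the query best. With real data, the topic word lists would come from an LDA model fitted on the corpus, and each topic could contribute more than one training column.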