Effectiveness of Latent Dirichlet Allocation Model for Semantic Information Retrieval on Malay Document
Nurul Syeilla Syazhween Binti Zulkefli, N. A. Abdul Rahman, Mazidah Puteh, Zainab Binti Abu Bakar
2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP), 13 September 2018
DOI: 10.1109/INFRKM.2018.8464782
Citations: 5
Abstract
Current research usually adopts the standard Vector Space Model (VSM) to search and retrieve information in Malay documents. However, this technique is less effective for retrieving semantic information from a collection. The system retrieves only documents that contain the user's query terms, and it ignores the semantic relationships among those terms. As a result, documents with a similar context are missed when they do not share the query terms, while documents from different contexts that happen to share a single term are retrieved. To address this problem, the Latent Dirichlet Allocation (LDA) model is applied to semantic information retrieval on Malay documents. An experiment was conducted using 6 text queries and 50 Hadith documents from the Shahih Bukhari collection, translated into Malay. The experimental results show that the LDA model gives promising results in retrieving semantic information from Malay-translated Hadith documents compared to existing techniques. Some limitations of this study can be explored in future work to improve the effectiveness of the retrieval results. Overall, LDA is an effective method for semantic information retrieval on Malay documents and can help people easily search for and retrieve semantic information from Malay documents.
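The contrast between exact term matching and topic-based retrieval can be sketched in a few lines. The abstract does not say how the authors turn LDA output into a ranking, so this is not their implementation. It is a minimal illustration assuming one common approach: fit LDA, then score each document by the likelihood of the query under that document's topic mixture. The functions `vsm_cosine`, `train_lda` and `lda_score` are hypothetical names. The baseline uses raw term frequencies (TF-IDF weighting omitted), and LDA is fitted by plain collapsed Gibbs sampling. The six tiny "documents" of romanized Malay words are invented for illustration and are not the Hadith data. Malay preprocessing such as stemming and stopword removal is also omitted.

```python
import math
import random
from collections import Counter


def vsm_cosine(query, doc):
    """Cosine similarity of raw term-frequency vectors (the exact-match VSM baseline)."""
    q, d = Counter(query), Counter(doc)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(c * c for c in q.values())) * math.sqrt(sum(c * c for c in d.values()))
    return dot / norm if norm else 0.0


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=500, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, phi, word_index)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens of document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # tokens of word w assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial topic assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # take this token out of the counts before resampling it
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # p(z_i = j | all other assignments), up to a normalizing constant
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][v] + beta) / (n_k[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # point estimates of the document-topic and topic-word distributions
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in range(K)]
    return theta, phi, index


def lda_score(query, d, theta, phi, index):
    """Query log-likelihood under document d: sum_w log sum_k theta[d][k] * phi[k][w]."""
    score = 0.0
    for w in query:
        if w in index:  # query words outside the training vocabulary are ignored
            v = index[w]
            score += math.log(sum(theta[d][k] * phi[k][v] for k in range(len(phi))))
    return score


# Hypothetical toy corpus: three prayer-themed and three fasting-themed "documents".
docs = [
    "solat wuduk masjid imam".split(),
    "solat wuduk jemaah masjid".split(),
    "imam jemaah masjid kiblat".split(),  # prayer-related, but contains no query term
    "puasa ramadan sahur berbuka".split(),
    "puasa ramadan zakat fitrah".split(),
    "sahur berbuka puasa zakat".split(),
]
query = ["solat", "wuduk"]
theta, phi, index = train_lda(docs, num_topics=2)
for d, doc in enumerate(docs):
    print(d, round(vsm_cosine(query, doc), 3), round(lda_score(query, d, theta, phi, index), 2))
```

Document 2 shares no term with the query, so its VSM cosine is 0, just like the fasting-themed documents. Its words do co-occur with "solat" and "wuduk" elsewhere in the corpus, so LDA should place it mostly in the same topic as those words. It should then receive a much higher query likelihood than documents 3 to 5. This mirrors the "similar context is ignored" problem the abstract attributes to VSM.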