R. Ramya, T. Ganeshsingh, D. Sejal, K. Venugopal, S. S. Iyengar, L. Patnaik
Proceedings of the 2018 VII International Conference on Network, Communication and Computing, 14 December 2018. DOI: 10.1145/3301326.3301342
DRDLC: Discovering Relevant Documents Using Latent Dirichlet Allocation and Cosine Similarity
In recent years, the number of digital documents available on the web has increased drastically, creating a need for effective methods to retrieve and organize them. Because this data is dispersed globally and unorganized, developing effective methods that can generate high-quality features from these documents is challenging. It is also necessary to reduce the gap between users' search intentions and the retrieved results, known as the semantic gap. In this paper, Discovering Relevant Documents using Latent Dirichlet Allocation and Cosine Similarity (DRDLC) is proposed. Word similarity within the search-result documents is computed using cosine similarity (CS). Latent Dirichlet Allocation (LDA) is applied to the extracted patterns and documents. Hashing is used to extract highly relevant documents efficiently. Further, term synonyms are identified using WordNet, and the documents are re-ranked. Experiments using the Relevance Feature Discovery (RFD) model on Reuters Corpus Volume 1 (RCV1) show that the proposed DRDLC framework improves performance by returning more relevant documents for the user's query.
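The abstract does not say how the word vectors for the cosine-similarity step are built. As a minimal sketch, assuming each word is represented by its raw counts across the search-result documents (one vector slot per document), word similarity could be computed as follows; `word_document_vectors` and `cosine_similarity` are hypothetical helper names, not part of DRDLC.

```python
import math
from collections import defaultdict


def word_document_vectors(docs):
    """Map each word to a vector of its counts in each document (one slot per doc).

    Assumption: lowercased whitespace tokenization, raw counts, no weighting.
    """
    vectors = defaultdict(lambda: [0] * len(docs))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            vectors[word][j] += 1
    return dict(vectors)


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = ["oil prices rise", "crude oil prices fall", "stock markets fall"]
vecs = word_document_vectors(docs)
print(cosine_similarity(vecs["oil"], vecs["prices"]))   # always co-occur
print(cosine_similarity(vecs["oil"], vecs["fall"]))     # co-occur in one document
```

Here "oil" and "prices" appear in exactly the same documents and so score 1.0, while "oil" and "markets" never co-occur and score 0.0. A real system would more likely use TF-IDF weighting and a proper tokenizer; raw counts keep the sketch short.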
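The abstract applies LDA to extracted patterns and documents but does not name an inference procedure. The following is a minimal sketch using collapsed Gibbs sampling, one common way to fit LDA, on already-tokenized documents. The hyperparameters `alpha`, `beta`, and `iterations` are illustrative assumptions, not values from the paper.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized docs (lists of words).

    Returns (theta, phi, vocab): theta[d][k] estimates P(topic k | doc d),
    phi[k][w] estimates P(word w | topic k), and vocab maps word -> index.
    """
    rng = random.Random(seed)
    vocab = {}
    for doc in docs:
        for word in doc:
            vocab.setdefault(word, len(vocab))
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # tokens assigned to topic k in doc d
    nkw = [[0] * V for _ in range(K)]    # times word w is assigned to topic k
    nk = [0] * K                         # total tokens assigned to topic k
    z = []                               # current topic of every token
    for d, doc in enumerate(docs):       # random initial assignments
        z_d = []
        for word in doc:
            k = rng.randrange(K)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][vocab[word]] += 1
            nk[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = vocab[word], z[d][i]
                # Remove this token from the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)] for k in range(K)]
    return theta, phi, vocab


corpus = [
    ["oil", "crude", "barrel", "oil"],
    ["crude", "barrel", "opec", "oil"],
    ["goal", "match", "team", "goal"],
    ["team", "match", "league", "goal"],
]
theta, phi, vocab = lda_gibbs(corpus, num_topics=2)
```

Each row of `theta` is a document's topic mixture, which could serve as a topic-level feature vector for comparing documents; the sampler is seeded so repeated runs give the same result. For corpora the size of RCV1, an optimized library implementation would be the practical choice.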
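The abstract states that hashing speeds up extraction of relevant documents and that WordNet synonyms drive re-ranking, without giving either scheme. A minimal sketch, assuming the hashing step is a hash-table inverted index (Python's `dict`) and that re-ranking adds a down-weighted score for synonym matches: `SYNONYMS` is a hypothetical hand-written table standing in for WordNet lookups, and `synonym_weight=0.5` is an arbitrary illustrative value.

```python
from collections import defaultdict

# Hypothetical synonym table standing in for WordNet lookups.
SYNONYMS = {
    "car": {"automobile", "auto"},
    "buy": {"purchase", "acquire"},
}


def build_hash_index(docs):
    """Inverted index stored in a hash table: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve_and_rerank(query, index, synonym_weight=0.5):
    """Score documents by exact query-term hits plus down-weighted synonym hits."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):        # expected O(1) hash lookup
            scores[doc_id] += 1.0
        for syn in SYNONYMS.get(term, ()):
            for doc_id in index.get(syn, ()):
                scores[doc_id] += synonym_weight
    # Highest score first; ties broken by document id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = ["buy a car today", "purchase an automobile", "weather report"]
ranking = retrieve_and_rerank("buy car", build_hash_index(docs))
```

For the query "buy car", document 0 matches both terms exactly, while document 1 is reached only through the synonyms "purchase" and "automobile", so it ranks second; document 2 shares no terms and is never retrieved. Replacing `SYNONYMS` with real WordNet lookups (for example through NLTK's WordNet corpus reader) would be the natural next step.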