Semantically Enriched Keyword Prefetching Based on Usage and Domain Knowledge

Impact factor: 1.0 · Zone 4 (Computer Science) · JCR Q4: Computer Science, Software Engineering

Journal of Web Engineering · Pub Date: 2024-03-01 · DOI: 10.13052/jwe1540-9589.2332

Sonia Setia;Jyoti;Neelam Duhan;Aman Anand;Nikita Verma

Volume 23, issue 3, pages 341-375. Article page: https://ieeexplore.ieee.org/document/10547277/

Citations: 0

Abstract


Semantically Enriched Keyword Prefetching Based on Usage and Domain Knowledge

Web prefetching [27] plays a crucial role in intelligent web systems [2]. Accurate prefetching predictions depend on uncovering valuable information from web usage statistics, which is important but challenging [16]. Drawing on usage statistics and domain knowledge, this study presents a new approach to efficient prefetching called SPUDK. The paper shows how web access logs can be used efficiently for browsing prediction, focusing on the technique needed to handle the queries found in those logs so that valuable information can be extracted. The access logs are further processed with a taxonomy and a thesaurus, WordNet, to determine the semantics of the queries. SPUDK is a system that organises the resulting usage data into semantic clusters. The contributions of this paper are: (1) a technique to exploit query keywords from access logs; (2) an approach to enrich queries with semantic information; (3) a new similarity measure for URLs present in access logs; (4) a novel clustering technique to find semantic clusters of URLs; and (5) an experimental evaluation of the proposed system. Evaluated on America Online (AOL) logs, SPUDK yields, on average, a 39% improvement in prediction precision, a 35% improvement in hit ratio, and a 50.6% reduction in latency compared with other prediction techniques in the literature.
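As a rough illustration of the pipeline the abstract outlines (query keywords taken from access logs, enriched with synonym information, then compared across URLs), the following minimal sketch may help. It is not the SPUDK method: the abstract does not define the paper's similarity measure or clustering algorithm, so plain Jaccard similarity is swapped in, a small hand-written synonym table stands in for WordNet, and the sample records, URLs, and function names are all hypothetical. The only detail taken from the public AOL query log is its tab-separated layout (AnonID, Query, QueryTime, ItemRank, ClickURL).

```python
from collections import defaultdict

# Hypothetical sample records in the AOL log layout:
# AnonID, Query, QueryTime, ItemRank, ClickURL (tab-separated).
SAMPLE_LOG = [
    "1\tcheap car insurance\t2006-03-01 10:00:00\t1\thttp://www.example-insure.com",
    "2\tauto insurance quotes\t2006-03-01 10:05:00\t2\thttp://www.example-insure.com",
    "3\tautomobile insurance\t2006-03-02 09:00:00\t1\thttp://www.example-quotes.com",
    "4\tchocolate cake recipe\t2006-03-02 11:00:00\t1\thttp://www.example-recipes.com",
]

# Assumption: a toy synonym table standing in for a thesaurus such as WordNet.
SYNONYMS = {"auto": "car", "automobile": "car"}
STOPWORDS = {"the", "a", "of", "for"}


def keywords(query):
    """Split a query into keywords and map each synonym to one canonical term."""
    return {SYNONYMS.get(t, t) for t in query.lower().split() if t not in STOPWORDS}


def url_keywords(lines):
    """Collect, for every clicked URL, the enriched keywords of its queries."""
    by_url = defaultdict(set)
    for line in lines:
        fields = line.split("\t")
        # Skip malformed records and queries that led to no click.
        if len(fields) != 5 or not fields[4]:
            continue
        by_url[fields[4]] |= keywords(fields[1])
    return dict(by_url)


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (a stand-in measure, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


profiles = url_keywords(SAMPLE_LOG)
insure = profiles["http://www.example-insure.com"]
quotes = profiles["http://www.example-quotes.com"]
recipes = profiles["http://www.example-recipes.com"]

print(jaccard(insure, quotes))   # the two insurance URLs share keywords
print(jaccard(insure, recipes))  # unrelated URLs share none
```

In this sketch the synonym mapping is what lets "auto" and "automobile" queries count as evidence for the same URL group; a clustering step could then group URLs whose pairwise similarity exceeds a chosen threshold.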


Source journal

Journal of Web Engineering (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 1.80

Self-citation rate: 12.50%

Review time: 9 months

Journal description: The World Wide Web and its associated technologies have become a major implementation and delivery platform for a large variety of applications, ranging from simple institutional information Web sites to sophisticated supply-chain management systems, financial applications, e-government, distance learning, and entertainment, among others. Such applications, in addition to their intrinsic functionality, also exhibit the more complex behavior of distributed applications.