Embedding a Microblog Context in Ephemeral Queries for Document Retrieval

Impact Factor 0.7 · CAS Zone 4 (Computer Science) · JCR Q4: COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Web Engineering · Pub Date: 2023-06-01 · DOI: 10.13052/jwe1540-9589.2245

Shilpa Sethi

Volume 22, Issue 4, pp. 679-700. Source: https://ieeexplore.ieee.org/document/10301468/

Citations: 0

Abstract

With the global proliferation of information, the search engine has become an indispensable tool that helps users find information simply and quickly. Search engines employ sophisticated document ranking algorithms based on query context, link structure and user behavior characterization. In practice, however, all of these features keep changing, so ranking algorithms should ideally be robust to time-sensitive queries. Microblog content is typically short-lived, as it is often intended to provide quick updates or share brief information concisely. The proposed technique first determines whether a query is currently in high demand, then automatically appends a time-sensitive context to the query by mining those microblogs whose torrent (surge in volume) matches that of the query in demand. The extracted contextual terms are then used to re-rank the search results. The experimental results reveal a strong correlation between ephemeral search queries and microblog volumes. These volumes are analyzed to identify the temporal proximity of their torrents. Approximately 70% of search torrents occurred within one day before or after a blog torrent at lower threshold values; when the threshold is increased, the torrent match ratio rises to ~90%. In addition, the performance of the proposed model is analyzed under two combining principles, namely aggregate relevance (AR) and disjunctive relevance (DR). The DR variant of the proposed model outperforms the AR variant in terms of relevance and interest scores. Further, the proposed model is compared with three categories of retrieval models: the log-logistic model, the sequential dependence model (SDM) and the embedding-based query expansion model (EQE1). The experimental results show the effectiveness of the proposed technique in terms of result relevance and user satisfaction: compared with the underlying retrieval models, the result relevance score improves by ~25% and the user satisfaction score by ~35%. The work can be extended in several directions, as these strategies could be combined to build a recommendation system, an automatic query reformulation system, a chatbot, or a professional NLP toolkit.
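The torrent-matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a torrent is detected, so the rule used here (a day whose volume exceeds the mean plus `k` standard deviations) is an assumption, as are the function names `find_torrents` and `match_ratio` and the toy volumes; none of these come from the paper.

```python
from statistics import mean, pstdev


def find_torrents(daily_counts, k=1.0):
    """Return indices of days whose volume exceeds mean + k * std.

    ASSUMPTION: this threshold rule is illustrative only; the paper's
    actual torrent definition is not given in the abstract.
    """
    cutoff = mean(daily_counts) + k * pstdev(daily_counts)
    return [day for day, count in enumerate(daily_counts) if count > cutoff]


def match_ratio(search_torrents, blog_torrents, window=1):
    """Fraction of search torrents with a blog torrent within `window` days."""
    if not search_torrents:
        return 0.0
    matched = sum(
        1
        for s in search_torrents
        if any(abs(s - b) <= window for b in blog_torrents)
    )
    return matched / len(search_torrents)


# Toy daily volumes, made up for illustration (not data from the paper).
search_volume = [10, 12, 11, 90, 13, 12, 11, 10, 85, 12]
blog_volume = [5, 6, 70, 8, 6, 5, 6, 5, 6, 75]

search_torrents = find_torrents(search_volume)
blog_torrents = find_torrents(blog_volume)
ratio = match_ratio(search_torrents, blog_torrents)
print(search_torrents, blog_torrents, ratio)
```

Raising `k` keeps only the strongest surges. The abstract reports that the match ratio rises as the threshold increases, and a sketch like this lets one check that effect on one's own volume series.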

Source journal

Journal of Web Engineering (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore

1.80

Self-citation rate

12.50%

Articles published

Review time

9 months

Journal description: The World Wide Web and its associated technologies have become a major implementation and delivery platform for a large variety of applications, ranging from simple institutional information Web sites to sophisticated supply-chain management systems, financial applications, e-government, distance learning, and entertainment, among others. Such applications, in addition to their intrinsic functionality, also exhibit the more complex behavior of distributed applications.