Boolean interpretation, matching, and ranking of natural language queries in product selection systems

IF 1.9, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Retrieval Journal, Pub Date: 2024-04-03, DOI: 10.1007/s10791-024-09432-x

Matthew Moulton, Yiu-Kai Ng


Citations: 0

Abstract

E-commerce is a massive sector of the US economy, generating $767.7 billion in revenue in 2021. E-commerce sites maximize their revenue by helping customers find, examine, and purchase products. To help users easily find the products in the database most relevant to their individual needs, e-commerce sites are equipped with a product retrieval system. Many modern retrieval systems parse user-specified constraints or keywords embedded in a simple natural language query. For the customer, writing such a query is generally easier and faster than navigating a product specification form, and it does not require the seller to design or develop such a form. These natural language product retrieval systems, however, suffer from low relevance in the products they retrieve, especially when complex constraints are specified. The reduced accuracy is due in part to under-utilizing the rich semantics of natural language, specifically queries that include Boolean operators, and to the lack of ranking for partially matched results that could still interest customers. This shortcoming causes e-commerce vendors to lose sales. To solve this problem, we propose a novel product retrieval system, called QuePR, that parses natural language queries of arbitrary complexity, with or without Boolean operators, uses combinatorial numeric and content-based matching to extract relevant products from a database, and ranks the retrieved products by relevance before presenting them to the end-user. The advantages of QuePR are its ability to process explicit and implicit Boolean operators in queries, handle natural language queries using similarity measures on partially matched records, and make a best guess or best match on ambiguous or incomplete queries. QuePR is unique, easy to use, and scalable to all product categories. To verify the accuracy of QuePR in retrieving relevant products across different product domains, we conducted several performance analyses and compared QuePR with other ranking and retrieval systems. The empirical results show that QuePR outperforms the other systems while maintaining an optimal runtime speed.
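The abstract names three stages: interpreting explicit and implicit Boolean operators, combining numeric and content-based matching, and ranking partially matched products rather than discarding them. The toy sketch below illustrates those ideas only, under loudly stated assumptions: the vocabulary (`under`, `over`, `or`), the hypothetical `Product` record, the helper names `parse_query`, `satisfies`, and `rank`, and the scoring rule (fraction of satisfied clauses) are all illustrative inventions, not QuePR's actual grammar or similarity measures, which the abstract does not specify.

```python
import re
from dataclasses import dataclass

# Hypothetical toy vocabulary (an assumption, not QuePR's grammar).
BELOW_WORDS = {"under", "below"}
ABOVE_WORDS = {"over", "above"}
# "and" is skipped like filler: explicit AND and implicit AND mean the same here.
STOPWORDS = {"a", "an", "the", "for", "with", "and"}
RESERVED = BELOW_WORDS | ABOVE_WORDS | STOPWORDS | {"or"}
NUMBER = re.compile(r"\d+(?:\.\d+)?")


@dataclass
class Product:  # hypothetical product record
    name: str
    keywords: set[str]  # content attributes, e.g. {"red", "shirt"}
    price: float


def parse_query(query: str) -> list[tuple]:
    """Return clauses that are implicitly ANDed together.

    ("terms", {...}) is a disjunction of keywords joined by an explicit "or";
    ("price_lt", x) and ("price_gt", x) are numeric constraints on price.
    """
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?", query.lower())
    clauses: list[tuple] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok in BELOW_WORDS and NUMBER.fullmatch(nxt):
            clauses.append(("price_lt", float(nxt)))
            i += 2
        elif tok in ABOVE_WORDS and NUMBER.fullmatch(nxt):
            clauses.append(("price_gt", float(nxt)))
            i += 2
        elif (tok == "or" and clauses and clauses[-1][0] == "terms"
              and nxt.isalpha() and nxt not in RESERVED):
            clauses[-1][1].add(nxt)  # explicit OR widens the previous clause
            i += 2
        elif tok in RESERVED:
            i += 1  # skip filler words and dangling operators
        else:
            clauses.append(("terms", {tok}))  # new keyword: implicit AND
            i += 1
    return clauses


def satisfies(clause: tuple, product: Product) -> bool:
    kind, value = clause
    if kind == "terms":
        return bool(value & product.keywords)  # any alternative matches
    if kind == "price_lt":
        return product.price < value
    return product.price > value  # "price_gt"


def rank(query: str, products: list[Product]) -> list[tuple[float, str]]:
    """Score products by the fraction of clauses they satisfy (partial match)."""
    clauses = parse_query(query)
    if not clauses:
        return []
    results = []
    for p in products:
        score = sum(satisfies(c, p) for c in clauses) / len(clauses)
        if score > 0:  # drop products that match no clause at all
            results.append((score, p.name))
    results.sort(key=lambda r: (-r[0], r[1]))  # best first, ties by name
    return results


catalog = [
    Product("Red cotton shirt", {"red", "cotton", "shirt"}, 25.0),
    Product("Blue linen shirt", {"blue", "linen", "shirt"}, 45.0),
    Product("Green wool shirt", {"green", "wool", "shirt"}, 20.0),
    Product("Red rain jacket", {"red", "rain", "jacket"}, 60.0),
]
for score, name in rank("red or blue shirt under $30", catalog):
    print(f"{score:.2f}  {name}")
# 1.00  Red cotton shirt
# 0.67  Blue linen shirt
# 0.67  Green wool shirt
# 0.33  Red rain jacket
```

Scoring by the fraction of satisfied clauses is the simplest way to keep partially matched products in the result list instead of filtering them out, as a strict Boolean match would. A production system would probably weight clauses differently and use graded similarity rather than exact keyword overlap.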


Source journal: Information Retrieval Journal (Engineering & Technology, Computer Science: Information Systems)
CiteScore: 6.20
Self-citation rate: 0.00%
Review time: 13.5 months

Journal description: The journal provides an international forum for the publication of theory, algorithms, analysis and experiments across the broad area of information retrieval. Topics of interest include search, indexing, analysis, and evaluation for applications such as the web, social and streaming media, recommender systems, and text archives. This includes research on human factors in search, bridging artificial intelligence and information retrieval, and domain-specific search applications.