Boolean interpretation, matching, and ranking of natural language queries in product selection systems
Matthew Moulton, Yiu-Kai Ng
Information Retrieval Journal, 2024-04-03 (journal article)
DOI: 10.1007/s10791-024-09432-x
Citations: 0
Abstract
E-commerce is a massive sector of the US economy, generating $767.7 billion in revenue in 2021. E-commerce sites maximize their revenue by helping customers find, examine, and purchase products. To help users find the products in the database most relevant to their individual needs, e-commerce sites provide a product retrieval system. Many modern retrieval systems parse user-specified constraints or keywords embedded in a simple natural language query. Describing needs this way is generally easier and faster for customers than navigating a product specification form, and it spares the seller from designing and developing such a form. These natural language product retrieval systems, however, suffer from low relevance in the products they retrieve, especially when the query specifies complex constraints. The reduced accuracy stems in part from under-utilizing the rich semantics of natural language, specifically in queries that include Boolean operators, and from not ranking partially matched results that could still interest customers. This shortcoming costs e-commerce vendors sales. To solve this problem, we propose a novel product retrieval system, called QuePR, that parses natural language queries of arbitrary complexity, with or without Boolean operators, combines numeric and content-based matching to extract relevant products from a database, and ranks the retrieved products by relevance before presenting them to the end user. The advantages of QuePR are its ability to process explicit and implicit Boolean operators in queries, handle natural language queries by applying similarity measures to partially matched records, and make a best-guess match for ambiguous or incomplete queries. QuePR is unique, easy to use, and scalable to all product categories.
To verify the accuracy of QuePR in retrieving relevant products across different product domains, we conducted several performance analyses and compared QuePR with other ranking and retrieval systems. The empirical results show that QuePR outperforms these systems while maintaining optimal runtime speed.
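The abstract names three stages (parsing explicit and implicit Boolean operators, numeric and content-based matching, and ranking partially matched products) but does not give their algorithms. The toy Python sketch below illustrates those ideas under our own assumptions; it is not QuePR's method. Assumptions: only `and`, `or`, `not`, parentheses, and `under`/`over` followed by a number are recognized; adjacent terms are joined by an implicit AND; numeric constraints always target a hypothetical `price` field; and partial matches are scored fuzzily (AND as the mean of child scores, OR as the maximum, NOT as one minus the child score), which is one simple choice rather than the paper's similarity measure. All names (`tokenize`, `Parser`, `score`, `rank`) are hypothetical.

```python
import re

# Hypothetical AST nodes: ("term", word), ("num", op, bound),
# ("and", [children]), ("or", [children]), ("not", child).
TOKEN_RE = re.compile(r"\(|\)|[\w.$]+")
NUMBER_RE = re.compile(r"\$?\d+(\.\d+)?")


def tokenize(query: str) -> list:
    return [t.lower() for t in TOKEN_RE.findall(query)]


class Parser:
    """Recursive descent with precedence OR < AND < NOT (an assumption)."""

    def __init__(self, tokens: list):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def next(self):
        tok = self.peek()
        self.i += 1
        return tok

    def parse_or(self):
        kids = [self.parse_and()]
        while self.peek() == "or":
            self.next()
            kids.append(self.parse_and())
        return kids[0] if len(kids) == 1 else ("or", kids)

    def parse_and(self):
        kids = [self.parse_not()]
        # Explicit "and", or an implicit AND between adjacent operands.
        while self.peek() not in (None, "or", ")"):
            if self.peek() == "and":
                self.next()
            kids.append(self.parse_not())
        return kids[0] if len(kids) == 1 else ("and", kids)

    def parse_not(self):
        if self.peek() == "not":
            self.next()
            return ("not", self.parse_not())
        return self.parse_atom()

    def parse_atom(self):
        tok = self.next()
        if tok is None or tok == ")":
            raise ValueError("unexpected end of query or ')'")
        if tok == "(":
            node = self.parse_or()
            if self.peek() == ")":
                self.next()  # consume the matching ")"
            return node
        nxt = self.peek()
        if tok in ("under", "over") and nxt is not None and NUMBER_RE.fullmatch(nxt):
            self.next()
            op = "<=" if tok == "under" else ">="
            return ("num", op, float(nxt.lstrip("$")))
        return ("term", tok)


def score(node, product: dict) -> float:
    kind = node[0]
    if kind == "term":
        # Content match: the keyword appears as a word in any field.
        words = " ".join(str(v).lower() for v in product.values()).split()
        return 1.0 if node[1] in words else 0.0
    if kind == "num":
        _, op, bound = node
        price = float(product["price"])  # hypothetical numeric field
        gap = price - bound if op == "<=" else bound - price
        # Full credit if satisfied, else credit decays with relative miss.
        return 1.0 if gap <= 0 else max(0.0, 1.0 - gap / max(bound, 1e-9))
    if kind == "not":
        return 1.0 - score(node[1], product)
    kids = [score(k, product) for k in node[1]]
    return sum(kids) / len(kids) if kind == "and" else max(kids)


def rank(query: str, products: list) -> list:
    tree = Parser(tokenize(query)).parse_or()
    scored = [(score(tree, p), p) for p in products]
    # Keep any partial match; sorted() is stable, so ties keep input order.
    return sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
```

With the products `acer laptop` (silver, $450), `dell laptop` (black, $650), and `hp desktop` (black, $400), the query `laptop black under $500` ranks the Dell first (score 0.9, since it misses the price bound by 30%) and the other two at 2/3 each. No product is discarded merely for failing one constraint, which is the kind of partial-match ranking the abstract motivates.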
About the journal:
The journal provides an international forum for the publication of theory, algorithms, analysis and experiments across the broad area of information retrieval. Topics of interest include search, indexing, analysis, and evaluation for applications such as the web, social and streaming media, recommender systems, and text archives. This includes research on human factors in search, bridging artificial intelligence and information retrieval, and domain-specific search applications.