Approximate search engine optimization for directory service

Proceedings International Parallel and Distributed Processing Symposium. Pub Date: 2003-04-22. DOI: 10.1109/IPDPS.2003.1213439

Kai-Hsiang Yang, Chi-Chien Pan, Tzao-Lin Lee


Citations: 5

Abstract

Today, in many practical e-commerce systems, the stored data are usually short strings, such as names, addresses, or other information. Searching within such short strings is not the same as searching within longer strings. General-purpose search engines try to scan long strings (or articles) quickly and locate the places that match the search conditions. Several well-known online search algorithms (such as "agrep", used inside glimpse; "cgrep", used with compressed indices; and "NR-grep") have been proposed for searching without any index in sublinear time. However, for short strings (small n), linear-time and sublinear-time algorithms perform much the same in practice. Therefore, suitable indices are necessary to optimize the performance of the search engine. Directory services, meanwhile, are increasingly important because they are optimized for searching data. The data stored in directory servers are almost all short strings, so an approximate search engine for a directory service must take the properties of short strings into consideration. In our previous research, we designed an approximate search engine specifically for short strings, which uses filters to select candidate short strings and then verifies the candidates to find the answers. However, the performance of that search engine needs to be improved. In this paper, we propose a new architecture and algorithm to optimize search performance for directory services.
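
The filter-then-verify idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which filters the authors use, so the length filter and q-gram count filter below are assumptions chosen for illustration, and every name here (`ShortStringIndex`, `qgrams`, `edit_distance`) is hypothetical rather than taken from the paper. The sketch precomputes a q-gram profile for each stored string, discards strings that cannot be within edit distance `k` of the query, and runs the more expensive edit-distance check only on the survivors.

```python
from collections import Counter


def qgrams(s, q=2):
    """Return the multiset of overlapping q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a, b):
    """Levenshtein distance by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


class ShortStringIndex:
    """Hypothetical filter-and-verify index over short strings.

    Not the paper's architecture: an illustrative stand-in that uses
    two standard filters (length and q-gram count).
    """

    def __init__(self, strings, q=2):
        self.q = q
        self.strings = list(strings)
        # Precompute each string's q-gram profile once, at build time.
        self.profiles = [qgrams(s, q) for s in self.strings]

    def search(self, query, k=1):
        """Return stored strings within edit distance k of query."""
        query_profile = qgrams(query, self.q)
        hits = []
        for s, profile in zip(self.strings, self.profiles):
            # Length filter: k edits change the length by at most k.
            if abs(len(s) - len(query)) > k:
                continue
            # Count filter: strings within distance k share at least
            # max(len) - q + 1 - k*q q-grams (multiset intersection).
            need = max(len(s), len(query)) - self.q + 1 - k * self.q
            shared = sum((query_profile & profile).values())
            if shared < need:
                continue
            # Verification: exact check on the surviving candidates.
            if edit_distance(query, s) <= k:
                hits.append(s)
        return hits


index = ShortStringIndex(["smith", "smyth", "smithe", "jones", "johnson"])
print(index.search("smith", k=1))  # ['smith', 'smyth', 'smithe']
```

In this example "johnson" is rejected by the length filter and "jones" by the count filter, so only three strings reach the edit-distance check. Both filters are safe in the sense that they never discard a true match; they only reduce how many candidates need verification.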


