Approximate search engine optimization for directory service

Kai-Hsiang Yang, Chi-Chien Pan, Tzao-Lin Lee
Proceedings International Parallel and Distributed Processing Symposium, 2003-04-22
DOI: 10.1109/IPDPS.2003.1213439 (https://doi.org/10.1109/IPDPS.2003.1213439)
Citations: 5

Abstract

In many practical e-commerce systems today, the stored data are usually short strings such as names and addresses. Searching within short strings is not the same as searching within longer ones. General search engines try to scan all long strings (or articles) quickly and find the places that match the search conditions. Several well-known online search algorithms (such as "agrep", used inside Glimpse; "cgrep", used with compressed indices; and "NR-grep") have been proposed for searching without any index in sublinear time. However, for short strings (small n), algorithms of different asymptotic complexity perform much the same in practice. Suitable indices are therefore necessary to optimize the performance of the search engine. At the same time, directory services are becoming more and more important because they are optimized for searching data. Almost all of the data stored in directory servers are short strings, so an approximate search engine for a directory service must take the properties of short strings into consideration. In our previous research, we designed an approximate search engine specifically for short strings; it uses filters to select the candidate short strings and then checks those candidates for the answers. However, the performance of that search engine needs to be improved. In this paper, we propose a new architecture and algorithm to optimize search performance for directory services.
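The filter-then-check idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' engine or algorithm, which the abstract does not describe in detail. It swaps in a textbook technique: a q-gram inverted index with a length filter and a count filter, followed by edit-distance verification. All names here (`ShortStringIndex`, `qgram_bag`, `edit_distance`, the parameters `q` and `k`) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def qgram_bag(s, q=2):
    """Count the q-grams of s, padded so that even very short strings yield grams."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


class ShortStringIndex:
    """Hypothetical q-gram index over short strings (an assumption, not the paper's design)."""

    def __init__(self, records, q=2):
        self.records = list(records)
        self.keys = [r.lower() for r in self.records]  # case-insensitive matching
        self.q = q
        self.postings = defaultdict(dict)  # gram -> {record id: occurrences}
        for rid, key in enumerate(self.keys):
            for gram, count in qgram_bag(key, q).items():
                self.postings[gram][rid] = count

    def search(self, query, k=1):
        """Return the records within edit distance k of query."""
        query = query.lower()
        shared = Counter()  # record id -> number of q-grams shared with the query
        for gram, count in qgram_bag(query, self.q).items():
            for rid, rec_count in self.postings.get(gram, {}).items():
                shared[rid] += min(count, rec_count)
        hits = []
        for rid, key in enumerate(self.keys):
            # Length filter: k edits change the length by at most k.
            if abs(len(key) - len(query)) > k:
                continue
            # Count filter: each edit destroys at most q of the padded q-grams.
            needed = max(len(key), len(query)) + self.q - 1 - k * self.q
            if shared[rid] < needed:
                continue
            # Check step: confirm the surviving candidate with the exact distance.
            if edit_distance(query, key) <= k:
                hits.append(self.records[rid])
        return hits


if __name__ == "__main__":
    index = ShortStringIndex(["John Smith", "Jon Smith", "Jane Smyth", "12 Main Street"])
    print(index.search("Jhon Smith", k=2))
```

In this sketch the two filters are cheap integer comparisons, and the quadratic edit-distance check runs only on the candidates that survive them. On short strings, that is where an index is most likely to pay off.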