Statistical properties of a class of randomized binary search algorithms
Ye Xia
Performance Evaluation, Volume 168, Article 102478. Published 2025-03-05.
DOI: 10.1016/j.peva.2025.102478
https://www.sciencedirect.com/science/article/pii/S0166531625000124
Citations: 0
Abstract
In this paper, we analyze the statistical properties of a randomized binary search algorithm and its variants. These algorithms have applications in caching and load balancing in distributed environments such as peer-to-peer networks, cloud storage, data centers, and content distribution networks. The basic discrete version of the problem is as follows. Suppose there are $m$ servers, numbered 1, 2, …, $m$, out of which the first $k$ servers are marked as special, where $k$ is unknown. These $k$ servers may contain a particular file or service that clients want. The objective is to select one of the marked servers uniformly at random. Considering the intended applications, we impose the constraint that there is no central controller to facilitate the selection process. We start with a basic algorithm: In each step, the client requesting the service chooses a number $y$ uniformly at random from $1, 2, \ldots, x$, where $x$ is the number chosen in the previous step, initially set to $m$ in the first step. A query is then sent to server $y$ asking whether $y$ is marked. If the answer is yes, the algorithm returns $y$; otherwise, the process is repeated with $x \leftarrow y$.

In this paper, we primarily consider two batch versions of this algorithm in which multiple numbers are chosen in each step and multiple queries are made in parallel. We derive the mean and variance (exact and/or asymptotic) for the number of search steps in each version of the algorithm, and when possible, we give its distribution. Additionally, we analyze the access pattern of queries across the entire search space.
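The basic (non-batch) algorithm described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the names `randomized_search`, `is_marked`, and `expected_steps` are hypothetical, the query to server $y$ is modeled as a local callback, and the recurrence used for the mean step count is our own reading of the algorithm rather than a formula quoted from the paper. The batch versions are not sketched because the abstract does not specify them.

```python
import random
from fractions import Fraction
from typing import Callable, Tuple


def randomized_search(m: int, is_marked: Callable[[int], bool],
                      rng: random.Random) -> Tuple[int, int]:
    """Return (selected server, number of steps) for the basic algorithm."""
    x = m          # upper end of the current sampling range, initially m
    steps = 0
    while True:
        steps += 1
        y = rng.randint(1, x)   # uniform on {1, ..., x}, both ends inclusive
        if is_marked(y):        # stands in for the query sent to server y
            return y, steps
        x = y                   # y is unmarked: shrink the range and repeat


def expected_steps(m: int, k: int) -> Fraction:
    """Exact mean step count, assuming k >= 1 servers are marked.

    Uses the recurrence T(x) = 1 + (1/x) * sum_{y=k+1..x} T(y) for x > k,
    with T(x) = 1 for x <= k (any draw from 1..x is then marked).
    """
    total = Fraction(0)          # running sum of T(y) for y = k+1 .. x-1
    t = Fraction(1)              # value returned when m <= k
    for x in range(k + 1, m + 1):
        # Rearranged from x*T(x) = x + total + T(x).
        t = (x + total) / (x - 1)
        total += t
    return t


if __name__ == "__main__":
    rng = random.Random(0)       # fixed seed only to make runs repeatable
    k = 5                        # unknown to the client; used here by the oracle
    server, steps = randomized_search(100, lambda y: y <= k, rng)
    print(server, steps, float(expected_steps(100, k)))
```

Under this recurrence the mean simplifies to $1 + \sum_{j=k}^{m-1} 1/j$, which grows roughly like $1 + \ln(m/k)$; exact rational arithmetic is used so the value can be compared against a closed form without rounding error.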
About the journal:
Performance Evaluation functions as a leading journal in the area of modeling, measurement, and evaluation of performance aspects of computing and communication systems. As such, it aims to present a balanced and complete view of the entire Performance Evaluation profession. Hence, the journal is interested in papers that focus on one or more of the following dimensions:
-Define new performance evaluation tools, including measurement and monitoring tools as well as modeling and analytic techniques
-Provide new insights into the performance of computing and communication systems
-Introduce new application areas where performance evaluation tools can play an important role and creative new uses for performance evaluation tools.
More specifically, common application areas of interest include the performance of:
-Resource allocation and control methods and algorithms (e.g. routing and flow control in networks, bandwidth allocation, processor scheduling, memory management)
-System architecture, design and implementation
-Cognitive radio
-VANETs
-Social networks and media
-Energy-efficient ICT
-Energy harvesting
-Data centers
-Data-centric networks
-System reliability
-System tuning and capacity planning
-Wireless and sensor networks
-Autonomic and self-organizing systems
-Embedded systems
-Network science