Efficient Vector Store System for Python using Shared Memory

Proceedings of the Second International Conference on AI-ML Systems. Pub Date: 2022-10-12. DOI: 10.1145/3564121.3564799

Dhruv Patel, S. Pandey, Abhishek Sharma



Abstract

Many e-commerce companies use machine learning to improve the customer experience. Even within a single company, many independent services typically run side by side, each specializing in some aspect of that experience. Because machine learning models operate on abstract vectors representing users and/or items, each such service needs a way to store these vectors. A common approach today is to keep them in an in-memory cache such as Memcached. Because these caches run in their own processes, while machine learning services generally run as Python services, every request the ML service handles incurs communication overhead. One can reduce this overhead by storing the vectors directly in a Python dictionary within the service. However, to support concurrency and scale, a single node runs multiple instances of the same service, so a per-process dictionary would duplicate the vectors across processes, which we also want to avoid. In this paper, we propose a system that stores vectors in shared memory and efficiently serves all concurrent instances of the service without replicating the vectors themselves. We achieve up to a 4.5x improvement in latency compared to Memcached. Additionally, because more memory remains available, we can increase the number of server processes running on each node, which translates into greater throughput. We also discuss the impact of the proposed method on throughput in a live production scenario.
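The general idea of keeping one copy of the vectors in shared memory can be illustrated with a minimal sketch. This is not the system proposed in the paper, whose design is not described in the abstract. It assumes fixed-dimension float32 vectors and uses only the standard-library `multiprocessing.shared_memory` module. The names `DIM`, `ROW`, `build_store`, and `lookup` are hypothetical, and the key-to-slot index is assumed to be a small dictionary that each process holds on its own.

```python
import struct
from multiprocessing import shared_memory

DIM = 4                          # hypothetical vector dimensionality
ROW = struct.Struct(f"{DIM}f")   # one float32 vector per fixed-size row


def build_store(vectors):
    """Pack {key: vector} into a single shared-memory block.

    Returns the block and a key -> slot index. Only the small index
    would need to exist per process; the vectors are stored once.
    """
    shm = shared_memory.SharedMemory(create=True, size=ROW.size * len(vectors))
    index = {}
    for slot, (key, vec) in enumerate(vectors.items()):
        ROW.pack_into(shm.buf, slot * ROW.size, *vec)
        index[key] = slot
    return shm, index


def lookup(shm_name, index, key):
    """Attach to an existing block by name and read a single vector."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        return ROW.unpack_from(shm.buf, index[key] * ROW.size)
    finally:
        shm.close()  # detach this handle; the block itself stays alive
```

In this sketch a worker process would receive only the block name and the index, then call `lookup` for each request. The creating process remains responsible for calling `close()` and `unlink()` on the block when the service shuts down.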

