{"title":"Comment on paper: Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems","authors":"Yimeng Min","doi":"arxiv-2406.09441","DOIUrl":"https://doi.org/arxiv-2406.09441","url":null,"abstract":"We identify two major issues in the SoftDist paper (Xia et al.): (1) the\u0000failure to run all steps of different baselines on the same hardware\u0000environment, and (2) the use of inconsistent time measurements when comparing\u0000to other baselines. These issues lead to flawed conclusions. When all steps are\u0000executed in the same hardware environment, the primary claim made in SoftDist\u0000is no longer supported.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141516735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comments on \"Federated Learning with Differential Privacy: Algorithms and Performance Analysis\"","authors":"Mahtab Talaei, Iman Izadi","doi":"arxiv-2406.05858","DOIUrl":"https://doi.org/arxiv-2406.05858","url":null,"abstract":"In the paper by Wei et al. (\"Federated Learning with Differential Privacy:\u0000Algorithms and Performance Analysis\"), the convergence performance of the\u0000proposed differential privacy algorithm in federated learning (FL), known as\u0000Noising before Model Aggregation FL (NbAFL), was studied. However, the\u0000presented convergence upper bound of NbAFL (Theorem 2) is incorrect. This\u0000comment aims to present the correct form of the convergence upper bound for\u0000NbAFL.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"72 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead","authors":"Amir Zandieh, Majid Daliri, Insu Han","doi":"arxiv-2406.03482","DOIUrl":"https://doi.org/arxiv-2406.03482","url":null,"abstract":"Serving LLMs requires substantial memory due to the storage requirements of\u0000Key-Value (KV) embeddings in the KV cache, which grows with sequence length. An\u0000effective approach to compress KV cache is quantization. However, traditional\u0000quantization methods face significant memory overhead due to the need to store\u0000quantization constants (at least a zero point and a scale) in full precision\u0000per data block. Depending on the block size, this overhead can add 1 or 2 bits\u0000per quantized number. We introduce QJL, a new quantization approach that\u0000consists of a Johnson-Lindenstrauss (JL) transform followed by sign-bit\u0000quantization. In contrast to existing methods, QJL eliminates memory overheads\u0000by removing the need for storing quantization constants. We propose an\u0000asymmetric estimator for the inner product of two vectors and demonstrate that\u0000applying QJL to one vector and a standard JL transform without quantization to\u0000the other provides an unbiased estimator with minimal distortion. We have\u0000developed an efficient implementation of the QJL sketch and its corresponding\u0000inner product estimator, incorporating a lightweight CUDA kernel for optimized\u0000computation. When applied across various LLMs and NLP tasks to quantize the KV\u0000cache to only 3 bits, QJL demonstrates a more than fivefold reduction in KV\u0000cache memory usage without compromising accuracy, all while achieving faster\u0000runtime. Codes are available at url{https://github.com/amirzandieh/QJL}.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141549938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Generative AI (Large Language Models) on the PRA model construction and maintenance, observations","authors":"Valentin RychkovEDF R&D, Claudia PicocoEDF R&D, Emilie CalecaEDF R&D","doi":"arxiv-2406.01133","DOIUrl":"https://doi.org/arxiv-2406.01133","url":null,"abstract":"The rapid development of Large Language Models (LLMs) and Generative\u0000Pre-Trained Transformers(GPTs) in the field of Generative Artificial\u0000Intelligence (AI) can significantly impact task automation in themodern\u0000economy. We anticipate that the PRA field will inevitably be affected by this\u0000technology1. Thus, themain goal of this paper is to engage the risk assessment\u0000community into a discussion of benefits anddrawbacks of this technology for\u0000PRA. We make a preliminary analysis of possible application of LLM\u0000inProbabilistic Risk Assessment (PRA) modeling context referring to the ongoing\u0000experience in softwareengineering field. We explore potential application\u0000scenarios and the necessary conditions for controlledLLM usage in PRA modeling\u0000(whether static or dynamic). Additionally, we consider the potential impact\u0000ofthis technology on PRA modeling tools.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141258839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ranking with Ties based on Noisy Performance Data","authors":"Aravind Sankaran, Lars Karlsson, Paolo Bientinesi","doi":"arxiv-2405.18259","DOIUrl":"https://doi.org/arxiv-2405.18259","url":null,"abstract":"We consider the problem of ranking a set of objects based on their\u0000performance when the measurement of said performance is subject to noise. In\u0000this scenario, the performance is measured repeatedly, resulting in a range of\u0000measurements for each object. If the ranges of two objects do not overlap, then\u0000we consider one object as 'better' than the other, and we expect it to receive\u0000a higher rank; if, however, the ranges overlap, then the objects are\u0000incomparable, and we wish them to be assigned the same rank. Unfortunately, the\u0000incomparability relation of ranges is in general not transitive; as a\u0000consequence, in general the two requirements cannot be satisfied\u0000simultaneously, i.e., it is not possible to guarantee both distinct ranks for\u0000objects with separated ranges, and same rank for objects with overlapping\u0000ranges. This conflict leads to more than one reasonable way to rank a set of\u0000objects. In this paper, we explore the ambiguities that arise when ranking with\u0000ties, and define a set of reasonable rankings, which we call partial rankings.\u0000We develop and analyse three different methodologies to compute a partial\u0000ranking. Finally, we show how performance differences among objects can be\u0000investigated with the help of partial ranking.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141165951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Analysis of Performance Bottlenecks in MRI Pre-Processing","authors":"Mathieu Dugré, Yohan Chatelain, Tristan Glatard","doi":"arxiv-2405.17650","DOIUrl":"https://doi.org/arxiv-2405.17650","url":null,"abstract":"Magnetic Resonance Image (MRI) pre-processing is a critical step for\u0000neuroimaging analysis. However, the computational cost of MRI pre-processing\u0000pipelines is a major bottleneck for large cohort studies and some clinical\u0000applications. While High-Performance Computing (HPC) and, more recently, Deep\u0000Learning have been adopted to accelerate the computations, these techniques\u0000require costly hardware and are not accessible to all researchers. Therefore,\u0000it is important to understand the performance bottlenecks of MRI pre-processing\u0000pipelines to improve their performance. Using Intel VTune profiler, we\u0000characterized the bottlenecks of several commonly used MRI-preprocessing\u0000pipelines from the ANTs, FSL, and FreeSurfer toolboxes. We found that few\u0000functions contributed to most of the CPU time, and that linear interpolation\u0000was the largest contributor. Data access was also a substantial bottleneck. We\u0000identified a bug in the ITK library that impacts the performance of ANTs\u0000pipeline in single-precision and a potential issue with the OpenMP scaling in\u0000FreeSurfer recon-all. Our results provide a reference for future efforts to\u0000optimize MRI pre-processing pipelines.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"129 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141166526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Resource-Efficient Crater Detectors on Embedded Systems","authors":"Simon Vellas, Bill Psomas, Kalliopi Karadima, Dimitrios Danopoulos, Alexandros Paterakis, George Lentaris, Dimitrios Soudris, Konstantinos Karantzalos","doi":"arxiv-2405.16953","DOIUrl":"https://doi.org/arxiv-2405.16953","url":null,"abstract":"Real-time analysis of Martian craters is crucial for mission-critical\u0000operations, including safe landings and geological exploration. This work\u0000leverages the latest breakthroughs for on-the-edge crater detection aboard\u0000spacecraft. We rigorously benchmark several YOLO networks using a Mars craters\u0000dataset, analyzing their performance on embedded systems with a focus on\u0000optimization for low-power devices. We optimize this process for a new wave of\u0000cost-effective, commercial-off-the-shelf-based smaller satellites.\u0000Implementations on diverse platforms, including Google Coral Edge TPU, AMD\u0000Versal SoC VCK190, Nvidia Jetson Nano and Jetson AGX Orin, undergo a detailed\u0000trade-off analysis. Our findings identify optimal network-device pairings,\u0000enhancing the feasibility of crater detection on resource-constrained hardware\u0000and setting a new precedent for efficient and resilient extraterrestrial\u0000imaging. Code at: https://github.com/billpsomas/mars_crater_detection.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141166235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AmBC-NOMA-Aided Short-Packet Communication for High Mobility V2X Transmissions","authors":"Xinyue Pei, Xingwei Wang, Yingyang Chen, Tingrui Pei, Miaowen Wen","doi":"arxiv-2405.16502","DOIUrl":"https://doi.org/arxiv-2405.16502","url":null,"abstract":"In this paper, we investigate the performance of ambient backscatter\u0000communication non-orthogonal multiple access (AmBC-NOMA)-assisted short packet\u0000communication for high-mobility vehicle-to-everything transmissions. In the\u0000proposed system, a roadside unit (RSU) transmits a superimposed signal to a\u0000typical NOMA user pair. Simultaneously, the backscatter device (BD) transmits\u0000its own signal towards the user pair by reflecting and modulating the RSU's\u0000superimposed signals. Due to vehicles' mobility, we consider realistic\u0000assumptions of time-selective fading and channel estimation errors. Theoretical\u0000expressions for the average block error rates (BLERs) of both users are\u0000derived. Furthermore, analysis and insights on transmit signal-to-noise ratio,\u0000vehicles' mobility, imperfect channel estimation, the reflection efficiency at\u0000the BD, and blocklength are provided. Numerical results validate the\u0000theoretical findings and reveal that the AmBC-NOMA system outperforms its\u0000orthogonal multiple access counterpart in terms of BLER performance.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141166027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph neural networks with configuration cross-attention for tensor compilers","authors":"Dmitrii Khizbullin, Eduardo Rocha de Andrade, Thanh Hau Nguyen, Matheus Pedroza Ferreira, David R. Pugh","doi":"arxiv-2405.16623","DOIUrl":"https://doi.org/arxiv-2405.16623","url":null,"abstract":"With the recent popularity of neural networks comes the need for efficient\u0000serving of inference workloads. A neural network inference workload can be\u0000represented as a computational graph with nodes as operators transforming\u0000multidimensional tensors. The tensors can be transposed and/or tiled in a\u0000combinatorially large number of ways, some configurations leading to\u0000accelerated inference. We propose TGraph, a neural graph architecture that\u0000allows screening for fast configurations of the target computational graph,\u0000thus representing an artificial intelligence (AI) tensor compiler in contrast\u0000to the traditional heuristics-based compilers. The proposed solution improves\u0000mean Kendall's $tau$ across layout collections of TpuGraphs from 29.8% of the\u0000reliable baseline to 67.4% of TGraph. We estimate the potential CO$_2$ emission\u0000reduction associated with our work to be equivalent to over 50% of the total\u0000household emissions in the areas hosting AI-oriented data centers.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"66 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141165943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Experimental Study of Different Aggregation Schemes in Semi-Asynchronous Federated Learning","authors":"Yunbo Li, Jiaping Gui, Yue Wu","doi":"arxiv-2405.16086","DOIUrl":"https://doi.org/arxiv-2405.16086","url":null,"abstract":"Federated learning is highly valued due to its high-performance computing in\u0000distributed environments while safeguarding data privacy. To address resource\u0000heterogeneity, researchers have proposed a semi-asynchronous federated learning\u0000(SAFL) architecture. However, the performance gap between different aggregation\u0000targets in SAFL remain unexplored. In this paper, we systematically compare the performance between two\u0000algorithm modes, FedSGD and FedAvg that correspond to aggregating gradients and\u0000models, respectively. Our results across various task scenarios indicate these\u0000two modes exhibit a substantial performance gap. Specifically, FedSGD achieves\u0000higher accuracy and faster convergence but experiences more severe fluctuates\u0000in accuracy, whereas FedAvg excels in handling straggler issues but converges\u0000slower with reduced accuracy.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"2016 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141166243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}