Confidential Computing on NVIDIA H100 GPU: A Performance Benchmark Study
Jianwei Zhu, Hang Yin, Shunfan Zhou
arXiv:2409.03992 (arXiv - CS - Performance), published 2024-09-06
This report evaluates the performance impact of enabling Trusted Execution
Environments (TEE) on NVIDIA H100 GPUs for large language model (LLM) inference
tasks. We benchmark the overhead introduced by TEE mode across various models
and token lengths, focusing on the bottleneck caused by CPU-GPU data transfers
via PCIe. Our results show that while there is minimal computational overhead
within the GPU, the overall performance penalty is primarily due to data
transfer. For most typical LLM queries, the overhead remains below 5%, with
larger models and longer sequences experiencing near-zero overhead.
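
As a rough illustration of why the overhead shrinks for longer-running queries, the sketch below uses a toy model that is an assumption of this example, not the report's methodology: GPU compute time is taken as unchanged by TEE mode, and only the PCIe transfer time is scaled by a hypothetical slowdown factor. The function names and all numeric inputs are illustrative and are not measurements from the report.

```python
def overhead_pct(time_off_s: float, time_on_s: float) -> float:
    """Relative latency increase, in percent, of TEE-on over TEE-off."""
    if time_off_s <= 0:
        raise ValueError("baseline time must be positive")
    return (time_on_s - time_off_s) / time_off_s * 100.0


def toy_tee_overhead_pct(compute_s: float, transfer_s: float,
                         transfer_slowdown: float) -> float:
    """Toy model (assumption): TEE mode leaves GPU compute time unchanged
    and multiplies only the CPU-GPU transfer time by transfer_slowdown."""
    time_off = compute_s + transfer_s
    time_on = compute_s + transfer_s * transfer_slowdown
    return overhead_pct(time_off, time_on)


if __name__ == "__main__":
    # Hypothetical inputs: the same transfer cost, increasing compute time
    # (standing in for larger models or longer sequences).
    for compute_s in (0.1, 1.0, 10.0):
        pct = toy_tee_overhead_pct(compute_s, transfer_s=0.01,
                                   transfer_slowdown=2.0)
        print(f"compute={compute_s:>5.1f}s  overhead={pct:.3f}%")
```

Under these assumptions the transfer penalty is amortized over a growing compute time, so the relative overhead falls toward zero as the query gets heavier, which is consistent with the trend the abstract reports.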