arXiv - CS - Performance: Latest Papers

Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels
arXiv - CS - Performance | Pub Date: 2024-07-08 | DOI: arxiv-2407.06134
Oluwaseun Adewunmi Alo, Sairam Sri Vatsavai, Ishan Thakkar
{"title":"Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels","authors":"Oluwaseun Adewunmi Alo, Sairam Sri Vatsavai, Ishan Thakkar","doi":"arxiv-2407.06134","DOIUrl":"https://doi.org/arxiv-2407.06134","url":null,"abstract":"Deep Neural Networks (DNNs) predominantly rely on General Matrix Multiply\u0000(GEMM) kernels, which are often accelerated using specialized hardware\u0000architectures. Recently, analog photonic GEMM accelerators have emerged as a\u0000promising alternative, offering vastly superior speed and energy efficiency\u0000compared to traditional electronic accelerators. However, these photonic cannot\u0000support wider than 4-bit integer operands due to their inherent trade-offs\u0000between analog dynamic range and parallelism. This is often inadequate for DNN\u0000training as at least 8-bit wide operands are deemed necessary to prevent\u0000significant accuracy drops. To address these limitations, we introduce a\u0000scalable photonic GEMM accelerator named SPOGA. SPOGA utilizes enhanced\u0000features such as analog summation of homodyne optical signals and\u0000in-transduction positional weighting of operands. By employing an extended\u0000optical-analog dataflow that minimizes overheads associated with bit-sliced\u0000integer arithmetic, SPOGA supports byte-size integer GEMM kernels, achieving\u0000significant improvements in throughput, latency, and energy efficiency.\u0000Specifically, SPOGA demonstrates up to 14.4$times$, 2$times$, and\u000028.5$times$ improvements in frames-per-second (FPS), FPS/Watt, and\u0000FPS/Watt/mm$^2$ respectively, compared to existing state-of-the-art photonic\u0000solutions.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141573207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
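The bit-slicing idea behind this abstract (byte-size integer GEMM built from narrower slices recombined with positional weights) can be illustrated in software. The sketch below is a purely digital, conceptual analogy, not the SPOGA hardware or its optical-analog dataflow: 8-bit operands are split into 4-bit slices, partial GEMMs are formed slice-by-slice, and positional weights (powers of 16) recombine them into the exact full-precision result.

```python
import numpy as np

def bit_sliced_gemm(A, B, slice_bits=4):
    """Conceptual bit-sliced GEMM: 8-bit unsigned operands are split into
    4-bit slices, multiplied slice-by-slice, and recombined with positional
    weights (powers of 2**slice_bits). A digital stand-in for what an analog
    accelerator must do when its per-operation dynamic range is limited."""
    base = 1 << slice_bits                      # 16 for 4-bit slices
    A_lo, A_hi = A % base, A // base            # low/high 4-bit slices of A
    B_lo, B_hi = B % base, B // base            # low/high 4-bit slices of B
    acc = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    # Each partial GEMM only ever multiplies 4-bit values; the positional
    # weight base**(i+j) restores each partial product's significance.
    for i, A_s in enumerate((A_lo, A_hi)):
        for j, B_s in enumerate((B_lo, B_hi)):
            acc += (base ** (i + j)) * (A_s.astype(np.int64) @ B_s.astype(np.int64))
    return acc

rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(4, 8))           # byte-size (8-bit) operands
B = rng.integers(0, 256, size=(8, 3))
assert np.array_equal(bit_sliced_gemm(A, B), A @ B)  # exact recombination
```

The assertion shows that positional weighting makes the sliced computation exact; the paper's contribution is performing the summation and weighting in the optical/analog domain rather than digitally.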
NCPP: Nova CPU Performance Predictor on a Novel Dataset
arXiv - CS - Performance | Pub Date: 2024-07-03 | DOI: arxiv-2407.03385
Xiaoman Liu
{"title":"NCPP: Nova CPU Performance Predictor on a Novel Dataset","authors":"Xiaoman Liu","doi":"arxiv-2407.03385","DOIUrl":"https://doi.org/arxiv-2407.03385","url":null,"abstract":"CPU performance prediction, which involves forecasting the performance scores\u0000of a CPU based on its hardware characteristics during its operation, is a\u0000critical technology for computational system design and resource management in\u0000the big data era. However, this research field currently faces two significant\u0000challenges. First, collecting real-world data is challenging due to the wide\u0000variety of CPU products on the market and the highly specialized nature of\u0000relevant hardware characteristics. In the research process, this field lacks a\u0000standard dataset with unified hardware characteristics, wide data coverage, and\u0000comprehensive benchmarks. Second, existing methods based on hardware simulation\u0000models or machine learning exhibit notable shortcomings, such as lengthy\u0000simulation test cycles and low prediction accuracy. To bridge these gaps, we\u0000first collect, preprocess, and standardize historical data from the 4th\u0000Generation Intel Xeon Scalable Processors across multiple benchmark suites to\u0000create a new dataset, named PerfCastDB. Subsequently, we design a deep learning\u0000based model called Nova CPU Performance Predictor (NCPP) as the baseline for\u0000this new dataset. The NCPP network is designed based on group attention\u0000mechanism. It effectively quantifies the implicit relationships between\u0000hardware characteristics within and across groups and comprehensively models\u0000the impact of various hardware characteristics on CPU performance prediction.\u0000We conduct comparative experiments using the proposed PerfCastDB dataset.\u0000Compared to existing approaches, NCPP achieves superior evaluation results,\u0000demonstrating its effectiveness. Furthermore, we have open-sourced part of the\u0000dataset and the NCPP network code to facilitate subsequent research. The\u0000resources can be accessed at https://github.com/xiaoman-liu/NCPP.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141577704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
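The abstract names a group attention mechanism over hardware characteristics but does not specify the architecture. As a minimal sketch only, assuming features are pre-assigned to groups (e.g., frequency, cache, and memory characteristics) and that attention mixes information across per-group embeddings, a plausible PyTorch layout could look like this; it is not the published NCPP network.

```python
import torch
import torch.nn as nn

class GroupAttentionRegressor(nn.Module):
    """Illustrative group-attention regressor: each group of hardware
    features is embedded separately, self-attention mixes information
    across groups, and a head predicts a scalar performance score."""
    def __init__(self, group_sizes, d_model=64, n_heads=4):
        super().__init__()
        # One small MLP per feature group (within-group modeling).
        self.group_encoders = nn.ModuleList([
            nn.Sequential(nn.Linear(g, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
            for g in group_sizes
        ])
        # Multi-head self-attention over the sequence of group embeddings
        # (across-group modeling).
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, 1))
        self.group_sizes = list(group_sizes)

    def forward(self, x):
        # x: (batch, total_features), features ordered group by group.
        chunks = torch.split(x, self.group_sizes, dim=1)
        tokens = torch.stack([enc(c) for enc, c in zip(self.group_encoders, chunks)], dim=1)
        mixed, _ = self.attn(tokens, tokens, tokens)      # (batch, n_groups, d_model)
        return self.head(mixed.mean(dim=1)).squeeze(-1)   # pooled score per sample

# Example: three hypothetical feature groups of sizes 5, 3, and 4.
model = GroupAttentionRegressor(group_sizes=[5, 3, 4])
scores = model(torch.randn(8, 12))   # 8 CPUs, 12 hardware features in total
print(scores.shape)                  # torch.Size([8])
```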
MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations
arXiv - CS - Performance | Pub Date: 2024-07-02 | DOI: arxiv-2407.02238
Akash Dutta, Ali Jannesari
{"title":"MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations","authors":"Akash Dutta, Ali Jannesari","doi":"arxiv-2407.02238","DOIUrl":"https://doi.org/arxiv-2407.02238","url":null,"abstract":"One of the primary areas of interest in High Performance Computing is the\u0000improvement of performance of parallel workloads. Nowadays, compilable source\u0000code-based optimization tasks that employ deep learning often exploit LLVM\u0000Intermediate Representations (IRs) for extracting features from source code.\u0000Most such works target specific tasks, or are designed with a pre-defined set\u0000of heuristics. So far, pre-trained models are rare in this domain, but the\u0000possibilities have been widely discussed. Especially approaches mimicking\u0000large-language models (LLMs) have been proposed. But these have prohibitively\u0000large training costs. In this paper, we propose MIREncoder, a M}ulti-modal\u0000IR-based Auto-Encoder that can be pre-trained to generate a learned embedding\u0000space to be used for downstream tasks by machine learning-based approaches. A\u0000multi-modal approach enables us to better extract features from compilable\u0000programs. It allows us to better model code syntax, semantics and structure.\u0000For code-based performance optimizations, these features are very important\u0000while making optimization decisions. A pre-trained model/embedding implicitly\u0000enables the usage of transfer learning, and helps move away from task-specific\u0000trained models. Additionally, a pre-trained model used for downstream\u0000performance optimization should itself have reduced overhead, and be easily\u0000usable. These considerations have led us to propose a modeling approach that i)\u0000understands code semantics and structure, ii) enables use of transfer learning,\u0000and iii) is small and simple enough to be easily re-purposed or reused even\u0000with low resource availability. Our evaluations will show that our proposed\u0000approach can outperform the state of the art while reducing overhead.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141516732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
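The abstract does not detail MIREncoder's modalities or tokenization, so no attempt is made to reproduce them here. The snippet below only shows the common first step such IR-based pipelines share: lowering compilable source to textual LLVM IR (here via clang) so it can be tokenized and fed to an encoder. The input file name and the crude tokenizer are placeholders.

```python
import re
import subprocess

def source_to_llvm_ir(c_file: str, opt_level: str = "-O1") -> str:
    """Lower a C file to textual LLVM IR with clang (clang must be installed).
    This IR text is the kind of input IR-based embedding models consume."""
    result = subprocess.run(
        ["clang", opt_level, "-S", "-emit-llvm", c_file, "-o", "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def crude_ir_tokens(ir_text: str) -> list:
    """Placeholder tokenizer: split IR into values, globals, identifiers, and
    punctuation. A real pipeline would use a learned or grammar-aware tokenizer."""
    return re.findall(r"%[\w.]+|@[\w.]+|[A-Za-z_][\w.]*|[^\s]", ir_text)

if __name__ == "__main__":
    ir = source_to_llvm_ir("kernel.c")   # hypothetical input file
    print(crude_ir_tokens(ir)[:20])      # first few IR tokens
```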
LLload: Simplifying Real-Time Job Monitoring for HPC Users
arXiv - CS - Performance | Pub Date: 2024-07-01 | DOI: arxiv-2407.01481
Chansup Byun, Julia Mullen, Albert Reuther, William Arcand, William Bergeron, David Bestor, Daniel Burrill, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Peter Michaleas, Guillermo Morales, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner, Lauren Milechin
{"title":"LLload: Simplifying Real-Time Job Monitoring for HPC Users","authors":"Chansup Byun, Julia Mullen, Albert Reuther, William Arcand, William Bergeron, David Bestor, Daniel Burrill, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Peter Michaleas, Guillermo Morales, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner, Lauren Milechin","doi":"arxiv-2407.01481","DOIUrl":"https://doi.org/arxiv-2407.01481","url":null,"abstract":"One of the more complex tasks for researchers using HPC systems is\u0000performance monitoring and tuning of their applications. Developing a practice\u0000of continuous performance improvement, both for speed-up and efficient use of\u0000resources is essential to the long term success of both the HPC practitioner\u0000and the research project. Profiling tools provide a nice view of the\u0000performance of an application but often have a steep learning curve and rarely\u0000provide an easy to interpret view of resource utilization. Lower level tools\u0000such as top and htop provide a view of resource utilization for those familiar\u0000and comfortable with Linux but a barrier for newer HPC practitioners. To expand\u0000the existing profiling and job monitoring options, the MIT Lincoln Laboratory\u0000Supercomputing Center created LLoad, a tool that captures a snapshot of the\u0000resources being used by a job on a per user basis. LLload is a tool built from\u0000standard HPC tools that provides an easy way for a researcher to track resource\u0000usage of active jobs. We explain how the tool was designed and implemented and\u0000provide insight into how it is used to aid new researchers in developing their\u0000performance monitoring skills as well as guide researchers in their resource\u0000requests.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
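LLload's actual implementation is not shown in the abstract; the sketch below only illustrates the idea it describes, aggregating a one-shot, per-user resource snapshot from a standard Linux tool (ps) so that a newer HPC user gets a single readable summary instead of raw top/htop output.

```python
import getpass
import subprocess

def user_snapshot(user: str) -> dict:
    """Aggregate a per-user resource snapshot from `ps` (Linux).
    Returns total %CPU, total resident memory (MiB), and the process count."""
    out = subprocess.run(
        ["ps", "-u", user, "-o", "pcpu=,rss=,comm="],   # '=' suppresses headers
        capture_output=True, text=True, check=True,
    ).stdout
    cpu = mem_kib = 0.0
    nproc = 0
    for line in out.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        cpu += float(fields[0])       # %CPU of one process
        mem_kib += float(fields[1])   # resident set size in KiB
        nproc += 1
    return {"user": user, "cpu_percent": cpu,
            "rss_mib": mem_kib / 1024.0, "processes": nproc}

if __name__ == "__main__":
    print(user_snapshot(getpass.getuser()))
```

A cluster-side tool would additionally map processes to scheduler jobs and report against the job's requested resources, which is where the guidance on resource requests mentioned in the abstract comes from.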
A Power-Consumption Analysis for Different IPoWDM Network Architectures with ZR/ZR+ and Long-Haul Muxponders
arXiv - CS - Performance | Pub Date: 2024-06-30 | DOI: arxiv-2407.00643
Qiaolun Zhang, Annalisa Morea, Patricia Layec, Memedhe Ibrahimi, Francesco Musumeci, Massimo Tornatore
{"title":"A Power-Consumption Analysis for Different IPoWDM Network Architectures with ZR/ZR+ and Long-Haul Muxponders","authors":"Qiaolun Zhang, Annalisa Morea, Patricia Layec, Memedhe Ibrahimi, Francesco Musumeci, Massimo Tornatore","doi":"arxiv-2407.00643","DOIUrl":"https://doi.org/arxiv-2407.00643","url":null,"abstract":"Operators are constantly faced with the need to increase optical-network\u0000capacity to accommodate rapid traffic growth while minimizing the cost-per-bit\u0000and power-per-bit. The drastic reduction of power consumption of IP routers and\u0000ZR/ZR+ pluggable transponders seen in the last years has renewed the interest\u0000in \"opaque\" optical-network architectures, where no optical bypassing is\u0000allowed. In this work, we aim to quantify and compare the power consumption of\u0000four \"IP over Wavelength Division Multiplexing\" (IPoWDM) transport network\u0000architectures employing ZR/ZR+ modules vs. long-haul muxponders, considering\u0000different grooming, regeneration, and optical bypassing capabilities. We first\u0000propose a power consumption model for different IPoWDM node architectures with\u0000ZR/ZR+ modules and long-haul muxponders. Then, to obtain the power consumption\u0000of different architectures, we propose a compact auxiliary-graph-based\u0000network-design algorithm extensible to different network architectures.\u0000Moreover, we investigate how the continuous decrease in the power consumption\u0000of ZR/ZR+ and IP routers can impact the power consumption of different\u0000architectures through a sensitivity analysis. Illustrative numerical results on\u0000networks of different sizes show that, despite drastic reductions of power\u0000consumption at IP layer, optical bypassing is still the most power-efficient\u0000solution, reducing consumption by up to 48%.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141516733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
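The structural reason optical bypass saves power can be shown with a toy per-demand calculation. The per-device wattages below are hypothetical placeholders, not the paper's power model; the point is only that without bypass, every intermediate node adds router-port and transponder power for transit traffic.

```python
# Hypothetical device powers, for illustration only.
P_ROUTER_PORT_W = 75.0   # assumed 400G router port power
P_ZR_PLUG_W = 20.0       # assumed ZR/ZR+ pluggable power

def demand_power_w(hops: int, bypass: bool) -> float:
    """Power attributed to one 400G demand crossing `hops` fiber links."""
    if bypass:
        terminations = 2            # transparent: only the end nodes terminate
    else:
        terminations = 2 * hops     # opaque: every link is terminated at both ends
    return terminations * (P_ROUTER_PORT_W + P_ZR_PLUG_W)

for hops in (1, 3, 5):
    opaque = demand_power_w(hops, bypass=False)
    transparent = demand_power_w(hops, bypass=True)
    saving = 100 * (1 - transparent / opaque)
    print(f"{hops} hops: opaque {opaque:.0f} W, bypass {transparent:.0f} W, saving {saving:.0f}%")
```

The paper's contribution is a full node-level model plus a network-design algorithm over realistic topologies; this toy model only explains why the bypass advantage persists even as IP-layer power drops.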
AR-PPF: Advanced Resolution-Based Pixel Preemption Data Filtering for Efficient Time-Series Data Analysis
arXiv - CS - Performance | Pub Date: 2024-06-27 | DOI: arxiv-2406.19575
Taewoong Kim, Kukjin Choi, Sungjun Kim
{"title":"AR-PPF: Advanced Resolution-Based Pixel Preemption Data Filtering for Efficient Time-Series Data Analysis","authors":"Taewoong Kim, Kukjin Choi, Sungjun Kim","doi":"arxiv-2406.19575","DOIUrl":"https://doi.org/arxiv-2406.19575","url":null,"abstract":"With the advent of automation, many manufacturing industries have\u0000transitioned to data-centric methodologies, giving rise to an unprecedented\u0000influx of data during the manufacturing process. This data has become\u0000instrumental in analyzing the quality of manufacturing process and equipment.\u0000Engineers and data analysts, in particular, require extensive time-series data\u0000for seasonal cycle analysis. However, due to computational resource\u0000constraints, they are often limited to querying short-term data multiple times\u0000or resorting to the use of summarized data in which key patterns may be\u0000overlooked. This study proposes a novel solution to overcome these limitations;\u0000the advanced resolution-based pixel preemption data filtering (AR-PPF)\u0000algorithm. This technology allows for efficient visualization of time-series\u0000charts over long periods while significantly reducing the time required to\u0000retrieve data. We also demonstrates how this approach not only enhances the\u0000efficiency of data analysis but also ensures that key feature is not lost,\u0000thereby providing a more accurate and comprehensive understanding of the data.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"152 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141530846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
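The abstract does not spell out the AR-PPF algorithm, so the sketch below shows only the generic idea its name suggests: filter a long series down to the chart's pixel resolution by keeping just the extremes of each pixel-wide bucket, so spikes survive that a mean-based summary would smooth away. It is a generic pixel-resolution downsampler, not the published algorithm.

```python
import numpy as np

def pixel_filter(t, y, width_px=1000):
    """Keep only each pixel column's min and max sample (t assumed sorted),
    reducing the series to at most 2 * width_px points while preserving spikes."""
    t, y = np.asarray(t), np.asarray(y)
    edges = np.linspace(t[0], t[-1], width_px + 1)
    bounds = np.searchsorted(t, edges)   # each pixel column is a contiguous slice
    bounds[-1] = t.size                  # make the last column include the last sample
    keep = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        if hi > lo:
            seg = y[lo:hi]
            keep.extend({lo + int(np.argmin(seg)), lo + int(np.argmax(seg))})
    keep = np.sort(np.array(keep))
    return t[keep], y[keep]

# 2 million samples reduced to <= 2000 points for a 1000-pixel-wide chart.
t = np.linspace(0.0, 43200.0, 2_000_000)             # hypothetical month of minutes
y = np.sin(t / 720.0) + 0.01 * np.random.randn(t.size)
y[1_000_000] = 8.0                                     # an anomaly spike to preserve
td, yd = pixel_filter(t, y)
print(td.size, float(yd.max()))                        # small output, spike retained
```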
An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors
arXiv - CS - Performance | Pub Date: 2024-06-26 | DOI: arxiv-2406.18445
Xingfu Wu, Tupendra Oli, Justin H. Qian, Valerie Taylor, Mark C. Hersam, Vinod K. Sangwan
{"title":"An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors","authors":"Xingfu Wu, Tupendra Oli, ustin H. Qian, Valerie Taylor, Mark C. Hersam, Vinod K. Sangwan","doi":"arxiv-2406.18445","DOIUrl":"https://doi.org/arxiv-2406.18445","url":null,"abstract":"Support Vector Machine (SVM) is a state-of-the-art classification method\u0000widely used in science and engineering due to its high accuracy, its ability to\u0000deal with high dimensional data, and its flexibility in modeling diverse\u0000sources of data. In this paper, we propose an autotuning-based optimization\u0000framework to quantify the ranges of hyperparameters in SVMs to identify their\u0000optimal choices, and apply the framework to two SVMs with the mixed-kernel\u0000between Sigmoid and Gaussian kernels for smart pixel datasets in high energy\u0000physics (HEP) and mixed-kernel heterojunction transistors (MKH). Our\u0000experimental results show that the optimal selection of hyperparameters in the\u0000SVMs and the kernels greatly varies for different applications and datasets,\u0000and choosing their optimal choices is critical for a high classification\u0000accuracy of the mixed kernel SVMs. Uninformed choices of hyperparameters C and\u0000coef0 in the mixed-kernel SVMs result in severely low accuracy, and the\u0000proposed framework effectively quantifies the proper ranges for the\u0000hyperparameters in the SVMs to identify their optimal choices to achieve the\u0000highest accuracy 94.6% for the HEP application and the highest average\u0000accuracy 97.2% with far less tuning time for the MKH application.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
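A mixed Sigmoid/Gaussian kernel and the hyperparameters named in the abstract (C, coef0, plus gamma) can be exercised with scikit-learn's support for callable kernels. The mixing form (a weighted sum with weight alpha) and the search ranges below are assumptions for illustration; the paper's framework tunes the ranges automatically rather than using a hand-picked grid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def mixed_kernel(gamma=0.1, coef0=0.0, alpha=0.5):
    """Weighted sum of a Gaussian (RBF) and a Sigmoid kernel; the mixing form
    and its weight `alpha` are assumptions made for this sketch."""
    def k(X, Y):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        rbf = np.exp(-gamma * d2)
        sig = np.tanh(gamma * X @ Y.T + coef0)
        return alpha * rbf + (1.0 - alpha) * sig
    return k

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Naive exhaustive search over hand-picked ranges for C, gamma, coef0.
best, best_acc = None, 0.0
for C in (0.1, 1.0, 10.0):
    for gamma in (0.01, 0.1, 1.0):
        for coef0 in (-1.0, 0.0, 1.0):
            clf = SVC(C=C, kernel=mixed_kernel(gamma, coef0)).fit(X_tr, y_tr)
            acc = clf.score(X_te, y_te)
            if acc > best_acc:
                best, best_acc = (C, gamma, coef0), acc
print("best (C, gamma, coef0):", best, "accuracy:", round(best_acc, 3))
```

Even this toy grid shows why uninformed choices hurt: poorly scaled gamma or coef0 values drive the sigmoid term into saturation and the accuracy collapses, which is the behavior the autotuning framework is designed to avoid.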
How Good Is It? Evaluating the Efficacy of Common versus Domain-Specific Prompts on Foundational Large Language Models
arXiv - CS - Performance | Pub Date: 2024-06-25 | DOI: arxiv-2407.11006
Oluyemi Enoch Amujo, Shanchieh Jay Yang
{"title":"How Good Is It? Evaluating the Efficacy of Common versus Domain-Specific Prompts on Foundational Large Language Models","authors":"Oluyemi Enoch Amujo, Shanchieh Jay Yang","doi":"arxiv-2407.11006","DOIUrl":"https://doi.org/arxiv-2407.11006","url":null,"abstract":"Recently, large language models (LLMs) have expanded into various domains.\u0000However, there remains a need to evaluate how these models perform when\u0000prompted with commonplace queries compared to domain-specific queries, which\u0000may be useful for benchmarking prior to fine-tuning domain-specific downstream\u0000tasks. This study evaluates LLMs, specifically Gemma-2B and Gemma-7B, across\u0000diverse domains, including cybersecurity, medicine, and finance, compared to\u0000common knowledge queries. This study employs a comprehensive methodology to\u0000evaluate foundational models, encompassing problem formulation, data analysis,\u0000and the development of novel outlier detection techniques. This methodological\u0000rigor enhances the credibility of the presented evaluation frameworks. This\u0000study focused on assessing inference time, response length, throughput,\u0000quality, and resource utilization and investigated the correlations between\u0000these factors. The results indicate that model size and types of prompts used\u0000for inference significantly influenced response length and quality. In\u0000addition, common prompts, which include various types of queries, generate\u0000diverse and inconsistent responses at irregular intervals. In contrast,\u0000domain-specific prompts consistently generate concise responses within a\u0000reasonable time. Overall, this study underscores the need for comprehensive\u0000evaluation frameworks to enhance the reliability of benchmarking procedures in\u0000multidomain AI research.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141718839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
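A minimal per-prompt measurement harness in the spirit of this study (latency, response length, throughput) might look like the sketch below. It assumes access to the gated Gemma weights on Hugging Face and enough memory to load them; the prompts, decoding settings, and metric set are illustrative, not the paper's protocol.

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"                     # one of the models evaluated
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # move to GPU / quantize as needed

def measure(prompt: str, max_new_tokens: int = 128) -> dict:
    """Per-prompt metrics: end-to-end latency, response length in tokens,
    and throughput as generated tokens per second."""
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    elapsed = time.perf_counter() - start
    new_tokens = int(out.shape[1] - inputs["input_ids"].shape[1])
    return {"latency_s": round(elapsed, 2), "response_tokens": new_tokens,
            "tokens_per_s": round(new_tokens / elapsed, 1)}

common = "Why is the sky blue?"                                         # common-knowledge query
domain = "Explain how a buffer overflow can lead to remote code execution."  # cybersecurity query
for label, prompt in (("common", common), ("domain-specific", domain)):
    print(label, measure(prompt))
```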
Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments
arXiv - CS - Performance | Pub Date: 2024-06-24 | DOI: arxiv-2406.16791
Grigori Fursin
{"title":"Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments","authors":"Grigori Fursin","doi":"arxiv-2406.16791","DOIUrl":"https://doi.org/arxiv-2406.16791","url":null,"abstract":"In this white paper, I present my community effort to automatically co-design\u0000cheaper, faster and more energy-efficient software and hardware for AI, ML and\u0000other popular workloads with the help of the Collective Mind framework (CM),\u0000virtualized MLOps, MLPerf benchmarks and reproducible optimization tournaments.\u0000I developed CM to modularize, automate and virtualize the tedious process of\u0000building, running, profiling and optimizing complex applications across rapidly\u0000evolving open-source and proprietary AI/ML models, datasets, software and\u0000hardware. I achieved that with the help of portable, reusable and\u0000technology-agnostic automation recipes (ResearchOps) for MLOps and DevOps\u0000(CM4MLOps) discovered in close collaboration with academia and industry when\u0000reproducing more than 150 research papers and organizing the 1st mass-scale\u0000community benchmarking of ML and AI systems using CM and MLPerf. I donated CM and CM4MLOps to MLCommons to help connect academia and industry\u0000to learn how to build and run AI and other emerging workloads in the most\u0000efficient and cost-effective way using a common and technology-agnostic\u0000automation, virtualization and reproducibility framework while unifying\u0000knowledge exchange, protecting everyone's intellectual property, enabling\u0000portable skills, and accelerating transfer of the state-of-the-art research to\u0000production. My long-term vision is to make AI accessible to everyone by making\u0000it a commodity automatically produced from the most suitable open-source and\u0000proprietary components from different vendors based on user demand,\u0000requirements and constraints such as cost, latency, throughput, accuracy,\u0000energy, size and other important characteristics.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study
arXiv - CS - Performance | Pub Date: 2024-06-23 | DOI: arxiv-2406.16068
Zhe Wang, Yifei Zhu
{"title":"Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study","authors":"Zhe Wang, Yifei Zhu","doi":"arxiv-2406.16068","DOIUrl":"https://doi.org/arxiv-2406.16068","url":null,"abstract":"Neural Radiance Fields (NeRF) is an emerging technique to synthesize 3D\u0000objects from 2D images with a wide range of potential applications. However,\u0000rendering existing NeRF models is extremely computation intensive, making it\u0000challenging to support real-time interaction on mobile devices. In this paper,\u0000we take the first initiative to examine the state-of-the-art real-time NeRF\u0000rendering technique from a system perspective. We first define the entire\u0000working pipeline of the NeRF serving system. We then identify possible control\u0000knobs that are critical to the system from the communication, computation, and\u0000visual performance perspective. Furthermore, an extensive measurement study is\u0000conducted to reveal the effects of these control knobs on system performance.\u0000Our measurement results reveal that different control knobs contribute\u0000differently towards improving the system performance, with the mesh granularity\u0000being the most effective knob and the quantization being the least effective\u0000knob. In addition, diverse hardware device settings and network conditions have\u0000to be considered to fully unleash the benefit of operating under the\u0000appropriate knobs","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141507193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
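The measurement methodology described here, sweeping control knobs and recording latency and quality for each setting, can be sketched as a simple harness. The renderer, knob values, and quality metric below are hypothetical stand-ins; only the sweep-and-measure structure reflects the study.

```python
import itertools
import statistics
import time

def render_frame(mesh_granularity: int, quant_bits: int) -> float:
    """Hypothetical stand-in for the mobile NeRF renderer under test; a real
    harness would invoke the actual pipeline and return a measured quality score."""
    time.sleep(0.001 * mesh_granularity / quant_bits)          # fake workload
    return 30.0 + 0.01 * mesh_granularity + 0.05 * quant_bits  # fake PSNR

# Sweep the two knobs named in the abstract (mesh granularity, quantization)
# and record average latency and quality per setting.
results = []
for mesh, bits in itertools.product((64, 128, 256), (8, 16, 32)):
    latencies = []
    for _ in range(5):                       # repeat to average out noise
        start = time.perf_counter()
        psnr = render_frame(mesh, bits)
        latencies.append(time.perf_counter() - start)
    results.append((mesh, bits, statistics.mean(latencies), psnr))

for mesh, bits, lat, psnr in sorted(results, key=lambda r: r[2]):
    print(f"mesh={mesh:4d} quant={bits:2d}b latency={lat * 1e3:6.1f} ms psnr={psnr:.1f}")
```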