arXiv - CS - Performance: Latest Papers

HRA: A Multi-Criteria Framework for Ranking Metaheuristic Optimization Algorithms
arXiv - CS - Performance | Pub Date: 2024-09-18 | DOI: arxiv-2409.11617
Evgenia-Maria K. Goula, Dimitris G. Sotiropoulos

Abstract: Metaheuristic algorithms are essential for solving complex optimization problems in different fields. However, comparing and rating these algorithms remains difficult because of the wide range of performance metrics and problem dimensions usually involved. On the other hand, nonparametric statistical methods and post hoc tests are time-consuming, especially when we only need to identify the top performers among many algorithms. The Hierarchical Rank Aggregation (HRA) algorithm aims to efficiently rank metaheuristic algorithms based on their performance across many criteria and dimensions. HRA employs a hierarchical framework that begins with collecting performance metrics on various benchmark functions and dimensions. Rank-based normalization is applied to each performance measure to ensure comparability, and the robust TOPSIS aggregation is applied to combine these rankings at several hierarchical levels, resulting in a comprehensive ranking of the algorithms. Our study uses data from the CEC 2017 competition to demonstrate the robustness and efficacy of the HRA framework. It examines 30 benchmark functions and evaluates the performance of 13 metaheuristic algorithms across five performance indicators in four distinct dimensions. This presentation highlights the potential of HRA to clarify the comparative advantages and disadvantages of various algorithms, simplifying practitioners' choice of the most appropriate algorithm for a given optimization problem.

Citations: 0
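The two core steps the abstract names, rank-based normalization followed by TOPSIS aggregation, can be sketched roughly as follows. This is a generic illustration, not the authors' code: the paper uses a robust TOPSIS variant and a multi-level hierarchy, both omitted here, and all function names are hypothetical.

```python
def rank_normalize(scores, lower_is_better=True):
    """Convert raw scores on one criterion into ranks in (0, 1], 1.0 = best."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=not lower_is_better)
    n = len(scores)
    ranks = [0.0] * n
    for pos, i in enumerate(order):
        ranks[i] = (n - pos) / n  # best algorithm gets 1.0, worst gets 1/n
    return ranks

def topsis(matrix, weights=None):
    """Classical TOPSIS: rank alternatives (rows) over criteria (columns),
    all criteria benefit-type. Returns closeness coefficients in [0, 1]."""
    m, n = len(matrix), len(matrix[0])
    weights = weights or [1.0 / n] * n
    # vector-normalize each column, then apply criterion weights
    norms = [sum(matrix[i][j] ** 2 for i in range(m)) ** 0.5 or 1.0
             for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)]
         for i in range(m)]
    ideal = [max(v[i][j] for i in range(m)) for j in range(n)]
    anti = [min(v[i][j] for i in range(m)) for j in range(n)]
    cc = []
    for i in range(m):
        d_plus = sum((v[i][j] - ideal[j]) ** 2 for j in range(n)) ** 0.5
        d_minus = sum((v[i][j] - anti[j]) ** 2 for j in range(n)) ** 0.5
        cc.append(d_minus / (d_plus + d_minus) if d_plus + d_minus else 0.0)
    return cc
```

Feeding rank-normalized columns into `topsis` yields one aggregate score per algorithm at each level of the hierarchy.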
Can Graph Reordering Speed Up Graph Neural Network Training? An Experimental Study
arXiv - CS - Performance | Pub Date: 2024-09-17 | DOI: arxiv-2409.11129
Nikolai Merkel, Pierre Toussing, Ruben Mayer, Hans-Arno Jacobsen

Abstract: Graph neural networks (GNNs) are a type of neural network capable of learning on graph-structured data. However, training GNNs on large-scale graphs is challenging due to iterative aggregations of high-dimensional features from neighboring vertices within sparse graph structures, combined with neural network operations. The sparsity of graphs frequently results in suboptimal memory access patterns and longer training time. Graph reordering is an optimization strategy aiming to improve the graph data layout. It has been shown to be effective at speeding up graph analytics workloads, but its effect on the performance of GNN training has not been investigated yet. Generalizing reordering results to GNN performance is nontrivial, as multiple aspects must be considered: GNN hyperparameters such as the number of layers, the number of hidden dimensions, and the feature size used in the GNN model; neural network operations; large intermediate vertex states; and GPU acceleration. In our work, we close this gap by performing an empirical evaluation of 12 reordering strategies in two state-of-the-art GNN systems, PyTorch Geometric and Deep Graph Library. Our results show that graph reordering is effective in reducing training time for both CPU- and GPU-based training. Further, we find that GNN hyperparameters influence the effectiveness of reordering, that reordering metrics play an important role in selecting a reordering strategy, that lightweight reordering performs better for GPU-based than for CPU-based training, and that the invested reordering time can in many cases be amortized.

Citations: 0
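To make the idea of reordering concrete: a lightweight strategy of the general kind benchmarked here can be as simple as a degree sort, relabeling vertices so that high-degree hubs receive small, adjacent IDs and their neighbor lists land closer together in memory. This toy sketch is only an illustration of the concept; the paper evaluates 12 real reordering strategies, not this one.

```python
def degree_reorder(adj):
    """Relabel vertices so that high-degree vertices get the smallest IDs.
    adj: dict mapping vertex -> list of neighbour vertices.
    Returns the adjacency with all IDs remapped to the new labels."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    new_id = {v: i for i, v in enumerate(order)}  # old label -> new label
    return {new_id[v]: sorted(new_id[u] for u in adj[v]) for v in adj}
```

On a star graph, for instance, the hub is moved to ID 0 so that every neighbor list references a compact ID range.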
Temporal Load Imbalance on Ondes3D Seismic Simulator for Different Multicore Architectures
arXiv - CS - Performance | Pub Date: 2024-09-17 | DOI: arxiv-2409.11392
Ana Luisa Veroneze Solórzano, Philippe Olivier Alexandre Navaux, Lucas Mello Schnorr

Abstract: The variety of today's multicore architectures motivates researchers to explore parallel scientific applications on different platforms. Load imbalance is one performance issue that can prevent parallel applications from exploiting the computational power of these platforms. Ondes3D is a scientific application for seismic wave simulation used to assess the geological impact of earthquakes. Its parallelism relies on applying a regular domain decomposition to the given geological domain and distributing each sub-domain to an MPI rank. Previous works investigated the significant spatial and temporal imbalance in Ondes3D and suggested new parallelization and load balancing techniques to minimize it, but none explored its execution on different architectures. Our paper evaluates the performance of Ondes3D for two earthquake scenarios on eight different multicore architectures, including Intel, AMD, and ARM processors. We measure the load distribution per MPI rank, evaluate the temporal load imbalance, and compare the execution of the application's kernels. Our results show that the temporal load imbalance in Ondes3D depends on the architecture chosen, with some platforms minimizing such imbalance more effectively.

Citations: 0
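Temporal load imbalance of this kind is commonly quantified with the max/mean metric, evaluated per timestep rather than over whole-run totals, since totals can hide imbalance that shifts between ranks over time. A minimal sketch with hypothetical function names (the paper's exact metric may differ):

```python
def load_imbalance(times):
    """Percent imbalance for one timestep: how much longer the slowest
    rank runs than the average rank (0.0 = perfectly balanced)."""
    mean = sum(times) / len(times)
    return (max(times) / mean) - 1.0

def temporal_imbalance(per_step_times):
    """per_step_times: list of timesteps, each a list of per-rank durations.
    Averaging the per-step imbalance exposes imbalance that whole-run
    totals can mask."""
    return sum(load_imbalance(step) for step in per_step_times) / len(per_step_times)
```

Note how two ranks that alternate between fast and slow steps total the same work overall, yet every individual step is imbalanced.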
The Landscape of GPU-Centric Communication
arXiv - CS - Performance | Pub Date: 2024-09-15 | DOI: arxiv-2409.09874
Didem Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov

Abstract: In recent years, GPUs have become the preferred accelerators for HPC and ML applications due to their parallelism and fast memory bandwidth. While GPUs boost computation, inter-GPU communication can create scalability bottlenecks, especially as the number of GPUs per node and per cluster grows. Traditionally, the CPU managed multi-GPU communication, but advancements in GPU-centric communication now challenge this CPU dominance by reducing its involvement, granting GPUs more autonomy in communication tasks, and addressing mismatches between multi-GPU communication and computation. This paper provides a landscape of GPU-centric communication, focusing on vendor mechanisms and user-level library support. It aims to clarify the complexities and diverse options in this field, define the terminology, and categorize existing approaches within and across nodes. The paper discusses vendor-provided mechanisms for communication and memory management in multi-GPU execution and reviews major communication libraries, their benefits, challenges, and performance insights. It then explores key research paradigms, future outlooks, and open research questions. By extensively describing GPU-centric communication techniques across the software and hardware stacks, we provide researchers, programmers, engineers, and library designers with insights on how best to exploit multi-GPU systems.

Citations: 0
A Global Perspective on the Past, Present, and Future of Video Streaming over Starlink
arXiv - CS - Performance | Pub Date: 2024-09-15 | DOI: arxiv-2409.09846
Liz Izhikevich, Reese Enghardt, Te-Yuan Huang, Renata Teixeira

Abstract: This study presents the first global analysis of on-demand video streaming over Low Earth Orbit (LEO) satellite networks, using data from over one million households across 85 countries. We highlight Starlink's role as a major LEO provider, enhancing connectivity in underserved regions. Our findings reveal that while overall video quality on Starlink matches that of traditional networks, the inherent variability in LEO conditions -- such as throughput fluctuations and packet loss -- leads to an increase in bitrate switches and rebuffers. To further improve the quality of experience for the LEO community, we manipulate existing congestion control and adaptive bitrate streaming algorithms using simulation and real A/B tests deployed on over one million households. Our results underscore the need for video streaming and congestion control algorithms to adapt to rapidly evolving network landscapes, ensuring high-quality service across diverse and dynamic network types.

Citations: 0
Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators
arXiv - CS - Performance | Pub Date: 2024-09-13 | DOI: arxiv-2409.08595
Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Felix Wedlich, Mika Markus Müller, Federico Nicolás Peccia, Felix Thömmes, Jannik Steinmetz, Valentin Biermaier, Adrian Frischknecht, Paul Palomero Bernardo, Oliver Bringmann

Abstract: Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their performance characteristics when executing the intended AI workload. To facilitate this, we present an automated generation approach for fast performance models to accurately estimate the latency of a DNN mapped onto systematically modeled and concisely described accelerator architectures. Using our accelerator architecture description method, we modeled representative DNN accelerators such as Gemmini, UltraTrail, Plasticine-derived, and a parameterizable systolic array. Together with DNN mappings for those modeled architectures, we perform a combined DNN/hardware dependency graph analysis, which enables us, in the best case, to evaluate only 154 loop kernel iterations to estimate the performance for 4.19 billion instructions, achieving a significant speedup. We outperform regression and analytical models in terms of mean absolute percentage error (MAPE) compared to simulation results, while being several orders of magnitude faster than RTL simulation.

Citations: 0
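The speedup argument, evaluating a handful of representative loop-kernel iterations instead of billions of instructions, reduces at its core to trip-count scaling. A deliberately simplified sketch (the actual models are generated from an accelerator description and a dependency-graph analysis, neither of which is reproduced here; names are hypothetical):

```python
def estimate_latency_s(kernels, freq_hz):
    """Estimate execution time from per-kernel measurements.
    kernels: list of (trip_count, cycles_per_iteration) pairs, one per
    unique loop kernel in the DNN/hardware dependency graph. Only one
    representative iteration per kernel needs to be timed or modeled;
    scaling to the full workload is just multiplication, which is where
    the speedup over cycle-accurate simulation comes from."""
    total_cycles = sum(trips * cyc for trips, cyc in kernels)
    return total_cycles / freq_hz
```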
Computational Algorithms for the Product Form Solution of Closed Queuing Networks with Finite Buffers and Skip-Over Policy
arXiv - CS - Performance | Pub Date: 2024-09-12 | DOI: arxiv-2409.08075
Gianfranco Balbo, Andrea Marin, Diletta Olliaro, Matteo Sereno

Abstract: Closed queuing networks with finite capacity buffers and skip-over policies are fundamental models in the performance evaluation of computer and communication systems. This technical report presents the details of computational algorithms to derive the key performance metrics for such networks. The primary focus is on the efficient computation of the normalization constant, which is critical for determining the steady-state probabilities of the network states under investigation. A convolution algorithm is proposed, which paves the way for the computation of key performance indices, such as queue length distribution and throughput, accommodating the intricacies introduced by finite capacity constraints and skip-over mechanisms. Finally, an extension of the traditional Mean Value Analysis algorithm addressing numerical stability is provided. The approaches discussed here make the investigation of large-scale networks feasible and enable the development of robust implementations of these techniques for practical use.

Citations: 0
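For the classic case without finite buffers, the normalization constant of a product-form closed network is computed with Buzen's convolution algorithm, which the report's algorithm extends to finite capacities and skip-over (that extension is not shown here). A minimal sketch for load-independent stations:

```python
def buzen_G(rhos, N):
    """Normalization constants G(0..N) for a closed product-form network
    of load-independent stations with N circulating customers.
    rhos[i] = visit ratio * mean service time of station i.
    Classic Buzen convolution: G_m(n) = G_{m-1}(n) + rho_m * G_m(n-1)."""
    g = [1.0] + [0.0] * N  # g[n] holds G(n) for the stations folded in so far
    for rho in rhos:
        for n in range(1, N + 1):
            g[n] += rho * g[n - 1]
    return g

def throughput(rhos, N):
    """System throughput X(N) = G(N-1) / G(N)."""
    g = buzen_G(rhos, N)
    return g[N - 1] / g[N]
```

With two identical stations (rho = 1) and N = 2, G enumerates the 3 ways to place 2 customers on 2 stations, so G(2) = 3 and X(2) = 2/3.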
Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa
arXiv - CS - Performance | Pub Date: 2024-09-12 | DOI: arxiv-2409.08108
Jan Laukemann, Georg Hager, Gerhard Wellein

Abstract: With Nvidia's release of the Grace Superchip, all three big semiconductor companies in HPC (AMD, Intel, Nvidia) are currently competing in the race for the best CPU. In this work we analyze the performance of these state-of-the-art CPUs and create an accurate in-core performance model for their microarchitectures Zen 4, Golden Cove, and Neoverse V2, extending the Open Source Architecture Code Analyzer (OSACA) tool and comparing it with LLVM-MCA. Starting from the peculiarities, strengths, and weaknesses of a single core, we extend our comparison with a variety of microbenchmarks and to the capabilities of a full node. The "write-allocate (WA) evasion" feature, which can automatically reduce the memory traffic caused by write misses, receives special attention; we show that the Grace Superchip has a next-to-optimal implementation of WA evasion, and that the only way to avoid write-allocates on Zen 4 is the explicit use of non-temporal stores.

Citations: 0
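The effect of write-allocate evasion is easy to quantify with a simple traffic model: on a STREAM-triad-like kernel, every store miss without WA evasion first reads the target cache line from memory, adding one extra data stream. A back-of-the-envelope sketch, assuming 8-byte doubles and no cache reuse (this is standard textbook reasoning, not a calculation from the paper):

```python
def triad_traffic_bytes(n, wa_evasion):
    """Main-memory traffic for a STREAM-triad kernel a[i] = b[i] + s * c[i]
    over n double-precision elements. Without write-allocate evasion each
    store miss first reads the target cache line, costing 8 B/element."""
    read = 16 * n                       # load streams b and c
    write = 8 * n                       # store stream a
    wa = 0 if wa_evasion else 8 * n     # write-allocate read of a
    return read + write + wa
```

The model predicts a 32:24 = 4:3 traffic ratio, i.e. up to one third more attainable triad bandwidth when write allocates are avoided.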
E-QUARTIC: Energy Efficient Edge Ensemble of Convolutional Neural Networks for Resource-Optimized Learning
arXiv - CS - Performance | Pub Date: 2024-09-12 | DOI: arxiv-2409.08369
Le Zhang, Onat Gungor, Flavio Ponzina, Tajana Rosing

Abstract: Ensemble learning is a meta-learning approach that combines the predictions of multiple learners, demonstrating improved accuracy and robustness. Nevertheless, ensembling models like Convolutional Neural Networks (CNNs) results in high memory and computing overhead, preventing their deployment in embedded systems. These devices are usually equipped with small batteries for power supply and may include energy-harvesting modules that extract energy from the environment. In this work, we propose E-QUARTIC, a novel Energy Efficient Edge Ensembling framework to build ensembles of CNNs targeting Artificial Intelligence (AI)-based embedded systems. Our design outperforms single-instance CNN baselines and state-of-the-art edge AI solutions, improving accuracy and adapting to varying energy conditions while maintaining similar memory requirements. We then leverage the multi-CNN structure of the designed ensemble to implement an energy-aware model selection policy for energy-harvesting AI systems. We show that our solution outperforms the state-of-the-art by reducing the system failure rate by up to 40% while ensuring higher average output quality. Finally, we show that the proposed design enables concurrent on-device training and high-quality inference execution at the edge, limiting the performance and energy overheads to less than 0.04%.

Citations: 0
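An energy-aware model selection policy of the general kind described can be sketched as a greedy knapsack over ensemble members: run as many sub-models as the harvested-energy budget allows, preferring those with the best accuracy per unit of energy. This is a hypothetical illustration, not the E-QUARTIC policy itself:

```python
def select_submodels(costs, accuracies, budget):
    """Pick ensemble members to run under an energy budget, greedily
    preferring the best accuracy-per-energy ratio.
    costs: per-member energy cost; accuracies: per-member standalone
    accuracy; budget: currently available energy.
    Returns the sorted indices of the selected members."""
    order = sorted(range(len(costs)),
                   key=lambda i: accuracies[i] / costs[i], reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
    return sorted(chosen)
```

When harvested energy is plentiful the whole ensemble runs; as the budget shrinks, the policy degrades gracefully to the most energy-efficient members instead of failing outright.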
Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU
arXiv - CS - Performance | Pub Date: 2024-09-11 | DOI: arxiv-2409.09086
Zhenyu Ning, Jieru Zhao, Qihao Jin, Wenchao Ding, Minyi Guo

Abstract: Multimodal Large Language Models (MLLMs) are distinguished by their comprehensive multimodal abilities and are widely used in many real-world applications, including GPT-4o, autonomous driving, and robotics. Despite their impressive performance, multimodal inputs always incur long contexts. Inference under long context requires caching massive Key and Value states (KV cache) of previous tokens, which introduces high latency and excessive memory consumption. For this reason, it is challenging to deploy streaming inference of MLLMs on edge devices, which largely constrains the power and usage of MLLMs in real-world applications. In this paper, we introduce Inf-MLLM, an efficient inference framework for MLLMs that enables streaming inference of MLLMs on a single GPU with infinite context. Inf-MLLM is based on our key observation of the attention pattern in both LLMs and MLLMs, which we call "attention saddles". Thanks to the newly discovered attention pattern, Inf-MLLM maintains a size-constrained KV cache by dynamically caching recent tokens and relevant tokens. Furthermore, Inf-MLLM proposes attention bias, a novel approach to enable MLLMs to capture long-term dependencies. We show that Inf-MLLM enables multiple LLMs and MLLMs to achieve stable performance over 4M-token-long texts and multi-round conversations with 1-hour-long videos on a single GPU. In addition, Inf-MLLM exhibits superior streaming reasoning quality compared to existing methods such as StreamingLLM, and a 2x speedup over H2O.

Citations: 0
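A size-constrained KV cache that keeps "recent tokens and relevant tokens" can be sketched as a recency window plus score-based retention of older entries. The paper's actual criterion builds on the "attention saddles" pattern and an attention bias, both omitted here; the function name and scoring are hypothetical:

```python
def evict_kv(scores, capacity, recent):
    """Choose which KV-cache entries to keep, oldest token first.
    Keeps the `recent` newest positions (recency window) plus the
    highest-scoring older positions ('relevant' tokens), up to
    `capacity` entries total. scores: per-token importance, e.g.
    accumulated attention weight. Returns sorted kept positions."""
    n = len(scores)
    keep = set(range(max(0, n - recent), n))  # recency window always stays
    older = sorted(range(max(0, n - recent)),
                   key=lambda i: scores[i], reverse=True)
    for i in older:
        if len(keep) >= capacity:
            break
        keep.add(i)
    return sorted(keep)
```

Because the cache size is bounded by `capacity` regardless of how long the stream runs, memory stays constant while the most attended older tokens survive eviction.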