Transcoding unicode characters with AVX‐512 instructions

Robert Clausecker, Daniel Lemire
Journal: Software: Practice and Experience
Published: 2023-09-12
DOI: 10.1002/spe.3261 (https://doi.org/10.1002/spe.3261)

Abstract

Intel includes in its recent processors a powerful set of instructions capable of processing 512‐bit registers with a single instruction (AVX‐512). Some of these instructions have no equivalent in earlier instruction sets. We leverage these instructions to efficiently transcode strings between the most common formats: UTF‐8 and UTF‐16. With our novel algorithms, we are often twice as fast as the previous best solutions. For example, we transcode Chinese text from UTF‐8 to UTF‐16 at more than 5 GiB/s using fewer than 2 CPU instructions per character. To ensure reproducibility, we make our software freely available as an open‐source library. Our library is part of the popular Node.js JavaScript runtime.
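
To make concrete what "transcoding from UTF‐8 to UTF‐16" involves, here is a minimal scalar sketch in Python. It is not the AVX‐512 algorithm from the paper, which this abstract does not describe; it only illustrates the byte-level work (decoding lead and continuation bytes, validating, emitting surrogate pairs) that a vectorized transcoder has to perform on many characters at once. The function name `utf8_to_utf16` is hypothetical and chosen for illustration.

```python
def utf8_to_utf16(data: bytes) -> list[int]:
    """Scalar reference sketch: decode UTF-8 bytes into UTF-16 code units.

    Hypothetical helper for illustration only; not the paper's method.
    Raises ValueError on malformed input.
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        # Classify the lead byte: sequence length and initial payload bits.
        if b0 < 0x80:
            cp, length = b0, 1
        elif 0xC2 <= b0 <= 0xDF:  # 0xC0 and 0xC1 would be overlong
            cp, length = b0 & 0x1F, 2
        elif 0xE0 <= b0 <= 0xEF:
            cp, length = b0 & 0x0F, 3
        elif 0xF0 <= b0 <= 0xF4:
            cp, length = b0 & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Fold in 6 payload bits from each continuation byte (10xxxxxx).
        for j in range(1, length):
            b = data[i + j]
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i + j}")
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong encodings, surrogates, and out-of-range values.
        if (length == 3 and cp < 0x800) or (length == 4 and cp < 0x10000):
            raise ValueError(f"overlong encoding at offset {i}")
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point at offset {i}")
        if cp < 0x10000:
            out.append(cp)  # one 16-bit code unit
        else:
            cp -= 0x10000   # split into a surrogate pair
            out.append(0xD800 | (cp >> 10))
            out.append(0xDC00 | (cp & 0x3FF))
        i += length
    return out


if __name__ == "__main__":
    units = utf8_to_utf16("aé中😀".encode("utf-8"))
    print([hex(u) for u in units])
```

A common Chinese character such as 中 occupies three bytes in UTF‐8 and one 16‐bit unit in UTF‐16, so this loop does several branches and shifts per character. That per-character cost is what the abstract's figure of fewer than 2 instructions per character should be read against.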