SIMD k-ary Search Based Chinese Word Segmentation

Yu Jia, Yong-mei Lei, Zhuo Zhang, Yun Fang
2011 International Conference on Information Management, Innovation Management and Industrial Engineering. Published 2011-11-26. DOI: 10.1109/ICIII.2011.375 (https://doi.org/10.1109/ICIII.2011.375)

Abstract

Chinese word segmentation is widely used in many fields, such as information management and search engines. Modern processors contain multiple cores that provide tremendous compute power, and each core has a SIMD processing unit. This paper reviews traditional dictionary-based Chinese word segmentation and the research on exploiting the SIMD capability of processors to improve the performance of linear search. Using SIMD has two performance benefits: it provides a high degree of parallelism, and it reduces branch mispredictions. We then propose a new method that uses a SIMD-based k-ary search algorithm to improve the performance of Chinese word segmentation. The SIMD-based approach accelerates the linear search process by performing 4 compare operations at a time. Compared with the traditional algorithm, the new algorithm improves average performance by 5.4% and 7.0% for two different dictionary sizes.
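As a rough illustration of the idea in the abstract, the sketch below shows a k-ary search that tests 4 separator keys per step and derives the next partition from a count of comparison results instead of a chain of branches. It is not the authors' implementation. The four comparisons are modelled in scalar Python, whereas a real version would issue them as one packed SIMD compare over fixed-width keys. The names `kary_search` and `segment`, the `max_len` parameter, and the use of forward maximum matching as the dictionary-query strategy are all assumptions made for this example; the abstract does not name the matching strategy.

```python
def kary_search(keys, target, lanes=4):
    """Return an index of `target` in the sorted list `keys`, or -1.

    Each step compares `target` against `lanes` separator keys, which
    splits the current window into lanes + 1 partitions. In a SIMD
    version the `lanes` comparisons would be one vector instruction.
    """
    lo, hi = 0, len(keys)  # current search window is [lo, hi)
    while hi - lo > lanes:
        step = (hi - lo) // (lanes + 1)
        # Separator positions, strictly inside the window and increasing.
        seps = [lo + step * (i + 1) for i in range(lanes)]
        # Models the packed compare: one boolean per lane.
        mask = [target >= keys[s] for s in seps]
        # Because keys are sorted, the True values form a prefix, so
        # counting them selects the partition without nested branches.
        n = sum(mask)
        new_lo = seps[n - 1] if n > 0 else lo
        new_hi = seps[n] if n < lanes else hi
        lo, hi = new_lo, new_hi
    # At most `lanes` keys remain: one final equality pass.
    for i in range(lo, hi):
        if keys[i] == target:
            return i
    return -1


def segment(text, sorted_words, max_len=4):
    """Forward maximum matching (assumed strategy): at each position,
    take the longest dictionary word; fall back to a single character."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or kary_search(sorted_words, cand) >= 0:
                out.append(cand)
                i += n
                break
    return out
```

Under these assumptions, a dictionary is simply a sorted list of words, and `segment("中文分词搜索引擎算法", sorted(["中文", "分词", "搜索", "搜索引擎", "引擎", "算法"]))` splits the text into 中文, 分词, 搜索引擎 and 算法. Comparing Python strings directly is a convenience of the sketch. A SIMD version would need keys encoded as fixed-width integers so that four of them fit in one vector register.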