SIMD k-ary Search Based Chinese Word Segmentation
Yu Jia, Yong-mei Lei, Zhuo Zhang, Yun Fang
2011 International Conference on Information Management, Innovation Management and Industrial Engineering, 26 November 2011
DOI: 10.1109/ICIII.2011.375
Chinese word segmentation is widely used in many fields, such as information management and search engines. Modern processors contain multiple cores that provide tremendous compute power, and each core has a SIMD processing unit. This paper reviews traditional dictionary-based Chinese word segmentation and prior research on exploiting the SIMD capabilities of processors to improve the performance of linear search. Using SIMD offers two performance benefits: it provides a high degree of parallelism, and it reduces branch mispredictions. We then propose a new method that uses a SIMD-based k-ary search algorithm to improve the performance of Chinese word segmentation. The SIMD-based approach accelerates the linear search process by performing four comparisons in a single operation. Compared with the traditional algorithm, the new algorithm improves performance by 5.4% and 7.0% on average for two different dictionary sizes.
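The idea of four comparisons per step can be illustrated with a minimal k-ary search sketch in C. It assumes an x86-64 target with SSE2, a GCC or Clang compiler (for `__builtin_popcount`), and dictionary entries already mapped to sorted 32-bit signed integer keys; how words are turned into such keys is not specified here. The function names `kary_lower_bound` and `scan_lower_bound` are hypothetical, and the evenly spaced separator choice follows the generic sorted-array k-ary search scheme, not a data layout confirmed by the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_set1_epi32, _mm_cmpgt_epi32, _mm_castsi128_ps */

/* Scalar tail: first index in [lo, hi) whose key is >= key (hi if none). */
static size_t scan_lower_bound(const int32_t *a, size_t lo, size_t hi,
                               int32_t key)
{
    while (lo < hi && a[lo] < key)
        lo++;
    return lo;
}

/*
 * Hypothetical k-ary search with k = 5 over a sorted array: each step
 * compares the key against 4 separators with one SSE2 compare and narrows
 * [lo, hi) to one of 5 partitions. Returns the first index whose element
 * is >= key (n if none). Invariant: a[i] < key for i < lo, and
 * a[i] >= key for i >= hi.
 */
static size_t kary_lower_bound(const int32_t *a, size_t n, int32_t key)
{
    assert(a != NULL || n == 0);
    const __m128i keyv = _mm_set1_epi32(key);  /* key broadcast to 4 lanes */
    size_t lo = 0, hi = n;

    while (hi - lo >= 5) {
        size_t len = hi - lo;
        size_t p[4];
        for (int i = 0; i < 4; i++)
            p[i] = lo + (size_t)(i + 1) * len / 5;  /* evenly spaced separators */

        /* Lane i holds a[p[i]]; _mm_set_epi32 takes the highest lane first. */
        __m128i sep = _mm_set_epi32(a[p[3]], a[p[2]], a[p[1]], a[p[0]]);
        /* Lane i becomes all ones where key > a[p[i]] (signed compare). */
        __m128i gt = _mm_cmpgt_epi32(keyv, sep);
        int mask = _mm_movemask_ps(_mm_castsi128_ps(gt));
        /* Separators are sorted, so set bits are contiguous from bit 0 and
         * their count c is the number of separators smaller than key. */
        int c = __builtin_popcount((unsigned)mask);

        size_t new_lo = (c == 0) ? lo : p[c - 1] + 1;
        size_t new_hi = (c == 4) ? hi : p[c];
        lo = new_lo;
        hi = new_hi;
    }
    return scan_lower_bound(a, lo, hi, key);
}
```

Each step shrinks the range to roughly one fifth, so about log_5 n steps are needed instead of the log_2 n of binary search, and the partition is selected from a single comparison mask rather than from four separate compare-and-branch steps. This sketch gathers the separators with scalar loads for simplicity; a tuned version would more likely store the keys in a layout that lets the four separators be loaded together.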