An Auxiliary Unicode Han Character Lookup Service Based on Glyph Shape Similarity

Jeng-Wei Lin, Feng-Sheng Lin
Published in: 2011 11th International Symposium on Communications & Information Technologies (ISCIT)
Publication date: 2011-12-01
DOI: 10.1109/ISCIT.2011.6092155 (https://doi.org/10.1109/ISCIT.2011.6092155)
Citations: 3

Abstract

Most legacy computer systems fully support input and display of only the 20,902 Han characters (Hanzi for short) encoded in Unicode 1.0. By 2010, Unicode 6.0 had encoded 75,616 Hanzi, yet the newly encoded characters remain hard to use even on the latest computers. Most of them are rarely used in daily life, and some appear only in ancient literature or in individual countries of the Sinosphere. Users may be unsure of their glyph shapes, pronunciations, meanings, and usage. Because most Chinese input method editors (IMEs) require good knowledge of a Hanzi before it can be typed, users often cannot input these characters. We present an auxiliary Unicode Hanzi lookup service based on glyph shape similarity. A user keys in a similar-looking Hanzi with any IME to look up the wanted Hanzi. Each Unicode Hanzi is decomposed into a glyph expression, and the glyph shape similarity of two Hanzi is calculated from a derived edit distance on their glyph expressions. The system thus gives users a convenient way to look up unfamiliar Hanzi.
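The lookup idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the format of a glyph expression or the costs of the "derived" edit distance, so the sketch assumes each glyph expression is a flat string of structure and component symbols (similar in spirit to a Unicode Ideographic Description Sequence) and substitutes the plain unit-cost Levenshtein distance. The names `GLYPH_EXPRESSIONS`, `edit_distance`, and `lookup_similar` are hypothetical, and the five-entry table is toy data.

```python
from typing import Dict, List, Sequence, Tuple

# Toy decomposition table (an assumption for illustration only).
# Each Hanzi maps to a flat "glyph expression": a layout symbol
# followed by its components.
GLYPH_EXPRESSIONS: Dict[str, str] = {
    "松": "⿰木公",
    "柏": "⿰木白",
    "枯": "⿰木古",
    "明": "⿰日月",
    "朋": "⿰月月",
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance with unit costs.

    Stands in for the paper's derived edit distance, whose
    cost model is not given in the abstract.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lookup_similar(
    query: str,
    table: Dict[str, str] = GLYPH_EXPRESSIONS,
    limit: int = 3,
) -> List[Tuple[str, int]]:
    """Rank the other Hanzi in `table` by distance to the query's expression."""
    query_expr = table[query]
    scored = [
        (hanzi, edit_distance(query_expr, expr))
        for hanzi, expr in table.items()
        if hanzi != query
    ]
    scored.sort(key=lambda pair: pair[1])  # stable: ties keep table order
    return scored[:limit]
```

With this toy table, `lookup_similar("松", limit=2)` ranks 柏 and 枯 first, each at distance 1, because they share the left-right layout and the 木 component with the query. A real service would presumably weight operations differently (for example, treating a changed layout symbol as more costly than a changed component), but the abstract gives no detail on this.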