Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages

Sawood Alam, Fateh ud din B. Mehmood, Michael L. Nelson
Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries
DOI: 10.1145/2756406.2756926 (https://doi.org/10.1145/2756406.2756926)
Publication date (as listed): 2014-09-03
Citations: 4

Abstract

We propose an approach for indexing raster images of dictionary pages that requires very little manual effort yet gives direct access to the appropriate pages of the dictionary for lookup. Accessibility is further improved through feedback and crowdsourcing, which enable highlighting of the specific location on the page where the lookup word is found, as well as annotation, digitization, and fielded searching. The approach applies equally to simple scripts and complex writing systems. Using it, we have built a Web application called "Dictionary Explorer" that supports word indexes in various languages, where each language can have multiple dictionaries associated with it. A word lookup gives direct access to the appropriate pages of all the dictionaries of that language simultaneously. The application offers exploration features such as searching, pagination, and navigating the word index through a tree-like interface, and it also supports feedback, annotation, and digitization. Apart from the scanned images, "Dictionary Explorer" aggregates results from various sources and user contributions in Unicode. We evaluated the time required to index Urdu dictionaries of different sizes and complexities and examined various trade-offs in our implementation. Using our approach, a single person can make a 1,000-page dictionary searchable in less than an hour.
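The abstract does not spell out the index structure, so the following is only a minimal sketch of one way a low-effort page index could work. It assumes the indexer records just the first headword of each scanned page (a sparse index) and that a lookup resolves to the last page whose first headword does not sort after the query. The function names, the `sort_key` hook, and the sample headwords are all hypothetical and are not taken from the paper. Under this assumption, the stated figure of 1,000 pages in under an hour corresponds to a budget of less than 3.6 seconds of manual work per page.

```python
from bisect import bisect_right


def build_page_index(first_headwords, sort_key=lambda w: w):
    """Build a sparse index from the first headword of each page.

    first_headwords[i] is assumed to be the first headword on page i + 1,
    listed in dictionary order. sort_key is a hypothetical hook that maps
    a word to a value comparable in the dictionary's own collation order
    (plain code-point order is used by default, which is an assumption
    that may not hold for scripts such as Urdu).
    """
    return [sort_key(word) for word in first_headwords]


def lookup_page(index_keys, word, sort_key=lambda w: w):
    """Return the 1-based page number on which `word` should appear."""
    # Number of pages whose first headword sorts at or before the query.
    position = bisect_right(index_keys, sort_key(word))
    # Words sorting before the first recorded headword fall back to page 1.
    return max(position, 1)


# Illustrative data only: four pages and their first headwords.
index_keys = build_page_index(["aardvark", "bat", "cat", "dog"])
page = lookup_page(index_keys, "bear")  # falls between "bat" and "cat"
```

Because only one entry per page is typed in, the lookup lands on a page rather than on an exact position; narrowing that down to a location on the page is the part the abstract assigns to feedback and crowdsourcing.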