Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages

Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries. Pub Date: 2014-09-03. DOI: 10.1145/2756406.2756926

Sawood Alam, Fateh ud din B. Mehmood, Michael L. Nelson


Citations: 4

Abstract

We propose an approach to indexing raster images of dictionary pages that requires very little manual effort and enables direct access to the appropriate pages of a dictionary for lookup. Accessibility is further improved through feedback and crowdsourcing, which enable highlighting of the specific location on the page where the lookup word is found, as well as annotation, digitization, and fielded searching. The approach applies equally to simple scripts and complex writing systems. Using it, we have built a Web application called "Dictionary Explorer" that supports word indexes in various languages, where each language can have multiple dictionaries associated with it. A word lookup gives direct access to the appropriate pages of all the dictionaries of that language simultaneously. The application offers exploration features such as searching, pagination, and navigation of the word index through a tree-like interface, and it also supports feedback, annotation, and digitization. Apart from the scanned images, "Dictionary Explorer" aggregates results from various sources and user contributions in Unicode. We have evaluated the time required to index Urdu dictionaries of different sizes and complexities and examined various trade-offs in our implementation. Using our approach, a single person can make a 1,000-page dictionary searchable in less than an hour.
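The abstract does not spell out how the page index is built. As a minimal sketch only, assume that the manual step is recording the first headword of each scanned page, and that a lookup is then a binary search over those words. The names `PageIndex`, `page_for`, and `lookup_all` are hypothetical and not taken from Dictionary Explorer, and the `key` parameter is a stand-in for a proper collation key (plain string comparison would not reproduce Urdu dictionary order).

```python
from bisect import bisect_right
from typing import Any, Callable, Dict, Optional, Sequence


class PageIndex:
    """Sparse index: one manually entered headword per scanned page.

    Assumption (not from the paper): first_words[i] is the first headword
    printed on page i + 1, and the list is in dictionary order under `key`.
    """

    def __init__(
        self,
        first_words: Sequence[str],
        key: Callable[[str], Any] = lambda w: w,
    ) -> None:
        self._key = key
        self._keys = [key(w) for w in first_words]
        # Reject input that is not sorted, since binary search relies on it.
        if any(a > b for a, b in zip(self._keys, self._keys[1:])):
            raise ValueError("first words must be in dictionary order")

    def page_for(self, word: str) -> Optional[int]:
        """Return the 1-based page whose word range covers `word`, or None
        if the word sorts before the first indexed headword."""
        pos = bisect_right(self._keys, self._key(word))
        return pos if pos > 0 else None


def lookup_all(indexes: Dict[str, PageIndex], word: str) -> Dict[str, Optional[int]]:
    """Look the word up in every dictionary of a language at once."""
    return {name: idx.page_for(word) for name, idx in indexes.items()}


# Two toy English "dictionaries" with three and two pages respectively.
indexes = {
    "dict_a": PageIndex(["apple", "cat", "dog"], key=str.casefold),
    "dict_b": PageIndex(["ant", "lion"], key=str.casefold),
}
pages = lookup_all(indexes, "Banana")
```

Under this assumption the index needs only one entry per page, so the abstract's figure of 1,000 pages in under an hour would correspond to less than 3.6 seconds of manual work per page (3,600 s / 1,000 pages).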

