Reducing Annotation Workload Using a Codebook Mapping and Its Evaluation in On-Line Handwriting

2012 International Conference on Frontiers in Handwriting Recognition. Pub Date: 2012-09-18. DOI: 10.1109/ICFHR.2012.259

Jinpeng Li, H. Mouchère, C. Viard-Gaudin

Citations: 4

Abstract

Training most existing recognition systems requires large datasets labeled at the symbol level. However, producing ground-truth datasets is tedious work in which two repetitive tasks must be chained: first, selecting a subset of strokes that belong to the same symbol; then, assigning a label to this stroke group. In this paper, we discuss a framework that reduces the human workload of symbol-level labeling for a large set of documents based on any graphical language. Hierarchical clustering produces a codebook with one or several strokes per symbol, which is then mapped onto the raw handwritten data. The framework is evaluated on two different datasets.
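The abstract gives no implementation details, so the following is only a minimal sketch of the general idea (cluster strokes into a small codebook, label each codebook entry once, then map every raw stroke to its nearest entry). It is not the authors' method. The 2-D feature vectors, the centroid-linkage merge rule, and the names `build_codebook`, `map_to_codebook`, and `propagate_labels` are all assumptions made for illustration, and the sketch handles single-stroke entries only, whereas the paper's codebook may hold several strokes per symbol.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(cluster):
    """Mean point of a list of equal-length feature tuples."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))


def build_codebook(features, n_codes):
    """Bottom-up (agglomerative) clustering with centroid linkage.

    Hypothetical helper: starts with one cluster per stroke and repeatedly
    merges the two clusters whose centroids are closest, until n_codes
    clusters remain. Each remaining centroid acts as a codebook entry.
    """
    clusters = [[f] for f in features]
    while len(clusters) > n_codes:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def map_to_codebook(features, codebook):
    """Assign each stroke to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: dist(f, codebook[k]))
        for f in features
    ]


def propagate_labels(assignments, code_labels):
    """Copy the label given to each codebook entry onto its strokes."""
    return [code_labels[a] for a in assignments]


# Toy stroke features (made-up values): two visually distinct groups.
strokes = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
codebook = build_codebook(strokes, n_codes=2)
assignments = map_to_codebook(strokes, codebook)
```

In this toy setting an annotator would label only the two codebook entries instead of all five strokes, and `propagate_labels` would spread those labels to the raw data. A real system would need richer stroke features and a way to group several strokes into one symbol.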
