Reducing Annotation Workload Using a Codebook Mapping and Its Evaluation in On-Line Handwriting

Jinpeng Li, H. Mouchère, C. Viard-Gaudin
{"title":"Reducing Annotation Workload Using a Codebook Mapping and Its Evaluation in On-Line Handwriting","authors":"Jinpeng Li, H. Mouchère, C. Viard-Gaudin","doi":"10.1109/ICFHR.2012.259","DOIUrl":null,"url":null,"abstract":"The training of most of the existing recognition systems requires availability of large datasets labeled at the symbol level. However, producing ground-truth datasets is a tedious work. Two repetitive tasks have to be chained. One is to select a subset of strokes that belong to the same symbol, a next step is to assign a label to this stroke group. In this paper, we discuss a framework to reduce the human workload for labeling at the symbol level a large set of documents based on any graphical language. A hierarchical clustering is used to produce a codebook with one or several strokes per symbol, which is used for a mapping on the raw handwritten data. Evaluation is proposed on two different datasets.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"79 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 International Conference on Frontiers in Handwriting Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFHR.2012.259","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

The training of most of the existing recognition systems requires availability of large datasets labeled at the symbol level. However, producing ground-truth datasets is a tedious work. Two repetitive tasks have to be chained. One is to select a subset of strokes that belong to the same symbol, a next step is to assign a label to this stroke group. In this paper, we discuss a framework to reduce the human workload for labeling at the symbol level a large set of documents based on any graphical language. A hierarchical clustering is used to produce a codebook with one or several strokes per symbol, which is used for a mapping on the raw handwritten data. Evaluation is proposed on two different datasets.
利用码本映射减少在线手写标注工作量及其评价
大多数现有识别系统的训练需要在符号级别标记的大型数据集的可用性。然而,生成真实数据集是一项繁琐的工作。两个重复的任务必须连接在一起。一是选择属于同一符号的笔画子集,下一步是为这个笔画组分配一个标签。在本文中,我们讨论了一个框架,以减少人类的工作量,在符号级别标记基于任何图形语言的大量文档。分层聚类用于生成每个符号具有一个或多个笔画的码本,用于在原始手写数据上进行映射。在两个不同的数据集上进行了评估。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信