Categorization of On-Line Handwritten Documents

2008 The Eighth IAPR International Workshop on Document Analysis Systems. Pub Date: 2008-09-16. DOI: 10.1109/DAS.2008.45

Sebastián Peña Saldarriaga, E. Morin, C. Viard-Gaudin


Citations: 8

Abstract


With the growth of on-line handwriting technologies, tools for managing handwritten documents, such as retrieval of documents by topic, are required. These documents can contain graphics, equations, or text. This work reports experiments on the categorization of on-line handwritten documents based on their textual content. We assume that handwritten text blocks have already been extracted from the documents and, as the first step of the proposed system, process them with an existing handwriting recognition engine. We analyse the effect of the word recognition rate on categorization performance and compare it with the performance obtained on the same texts available as ground truth. Two categorization algorithms, kNN and SVM, are compared. The handwritten texts, collected from more than 1500 writers, are a subset of the Reuters-21578 corpus. Results show no significant loss in categorization performance when the word error rate stays below 22%.
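The two quantities the abstract relates, word error rate and categorization outcome, can be illustrated with a minimal sketch. This is not the authors' system: the abstract names no features, similarity measure, or parameters, so the bag-of-words representation, the cosine similarity, the value `k=3`, the function names, and the toy training texts below are all assumptions made for illustration. Word error rate is computed in the usual way, as the word-level edit distance divided by the number of reference words.

```python
import math
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_categorize(text, training, k=3):
    """Majority vote among the k training texts most similar to `text`."""
    query = Counter(text.lower().split())
    ranked = sorted(
        training,
        key=lambda item: cosine(query, Counter(item[0].lower().split())),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, not taken from Reuters-21578.
training = [
    ("wheat harvest exports rise", "grain"),
    ("corn and wheat prices fall", "grain"),
    ("grain shipments wheat tonnes", "grain"),
    ("bank raises interest rates", "money"),
    ("central bank cuts interest rate", "money"),
    ("currency rates and bank lending", "money"),
]
truth = "wheat exports rise as prices fall"
recognized = "wheat exports nse as prices fall"  # one misrecognized word out of six

print(word_error_rate(truth, recognized))
print(knn_categorize(truth, training), knn_categorize(recognized, training))
```

In this toy case the single recognition error removes one matching term from the query, but enough topical words survive for the vote to stay the same, which is the kind of robustness the abstract reports for moderate error rates.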
