Best practices for convolutional neural networks applied to visual document analysis

Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. Pub Date: 2003-08-03. DOI: 10.1109/ICDAR.2003.1227801

P. Simard, David Steinkraus, John C. Platt


Citations: 2755

Abstract

Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry. This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks. The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data. The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple "do-it-yourself" implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even finely-tuning the architecture. The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis. We illustrate our claims on the MNIST set of English digit images.
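The abstract does not say how the distorted training data is produced. As a rough illustration only, the sketch below assumes an elastic-style distortion: a random displacement field is smoothed with a Gaussian, scaled, and used to resample the image with bilinear interpolation. The function names (`gaussian_kernel`, `smooth`, `bilinear`, `elastic_distort`) and the default values of `alpha` and `sigma` are hypothetical choices made for this example, not details taken from the text above.

```python
import math
import random


def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalised to sum to 1."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(field, kernel):
    """Separable Gaussian blur of a 2-D list; cells outside the grid count as 0."""
    h, w = len(field), len(field[0])
    r = len(kernel) // 2
    # Blur along rows first, then along columns.
    rows = [[sum(kernel[k + r] * field[y][x + k]
                 for k in range(-r, r + 1) if 0 <= x + k < w)
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[k + r] * rows[y + k][x]
                 for k in range(-r, r + 1) if 0 <= y + k < h)
             for x in range(w)] for y in range(h)]


def bilinear(image, y, x):
    """Sample the image at a real-valued (y, x); outside pixels are background (0)."""
    h, w = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    fy, fx = y - y0, x - x0

    def px(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    top = (1 - fx) * px(y0, x0) + fx * px(y0, x0 + 1)
    bottom = (1 - fx) * px(y0 + 1, x0) + fx * px(y0 + 1, x0 + 1)
    return (1 - fy) * top + fy * bottom


def elastic_distort(image, alpha=34.0, sigma=4.0, rng=None):
    """Return a distorted copy of a 2-D list image (assumed, illustrative defaults)."""
    if rng is None:
        rng = random.Random()
    h, w = len(image), len(image[0])
    kernel = gaussian_kernel(sigma)
    # Independent uniform noise in [-1, 1] per pixel, smoothed so that
    # neighbouring pixels move in similar directions.
    dx = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], kernel)
    dy = smooth([[rng.uniform(-1, 1) for _ in range(w)] for _ in range(h)], kernel)
    # alpha scales the displacement; each output pixel reads from a shifted location.
    return [[bilinear(image, y + alpha * dy[y][x], x + alpha * dx[y][x])
             for x in range(w)] for y in range(h)]
```

Calling `elastic_distort(image, rng=random.Random(seed))` with different seeds would yield several distorted variants of one training image, which is one way to enlarge a training set; setting `alpha=0` leaves the image unchanged.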


