Hindi Text Document Classification System Using SVM and Fuzzy: A Survey

Int. J. Rough Sets Data Anal. Pub Date: 2018-10-01. DOI: 10.4018/IJRSDA.2018100101

Shalini Puri, S. Singh

Citations: 14

Abstract

In recent years, many information retrieval, character recognition, and feature extraction methodologies have been proposed for Devanagari, and especially for Hindi, across different domain areas. Given the enormous volume of scanned data now available, and to advance existing automated Hindi systems beyond optical character recognition, this article introduces a new idea: a classification system for printed and handwritten Hindi documents that uses a support vector machine and fuzzy logic. The system first pre-processes textual document images and then classifies them into predefined categories. Building on this concept, the article presents a feasibility study of such systems and their relevance to Hindi, a survey of statistical measurements of Hindi keywords obtained from different sources, and the inherent challenges found in printed and handwritten documents. Technical reviews are provided and represented graphically to compare many parameters and to estimate the contents, forms, and classifiers used in various existing techniques.

