Comparative Study on Telugu Text Classification Using Machine Learning and Deep Learning Models

Veerraju Gampala, Jaideep Vallapuneni, Pavan Kumar Ande, Ravindra Kumar Indurthi, N. Rajesh
DOI: 10.1109/ICOEI51242.2021.9453040 (https://doi.org/10.1109/ICOEI51242.2021.9453040)
Published in: 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI)
Publication date: 2021-06-03
Citations: 4

Abstract

Many Telugu-language documents are now available in digital form. For easy retrieval, these electronic records should be grouped into classes based on their content. Because the volume of information held in digital form keeps growing, text categorization is a central problem for information systems that handle text records. Text categorization assigns a document to one or more categories from a predefined set, and it has been applied to Telugu in order to derive useful information and insights from unstructured text. Indian languages are difficult to categorize because they are morphologically rich, have many different word forms, and yield many different feature spaces. Telugu in particular requires special algorithms for morphological analysis, so little research has been done on it. To build an organized, reduced-feature lexicon, preprocessing methods designed specifically for Telugu are applied to the raw data; significant preprocessing is required to build an accurate classification model for Telugu text documents. In this paper, we compare the performance of machine learning and deep learning classifiers on Telugu text, namely Naïve Bayes, Support Vector Machine (SVM), and a neural network classifier.
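The abstract gives no implementation details, so the following is only a minimal sketch of one of the classifier families it names: a multinomial Naïve Bayes text categorizer with add-one smoothing, written with the Python standard library. The function names (`train_naive_bayes`, `predict`), the two category labels, and the four-document toy corpus are all assumptions made for illustration, not the authors' data or method. The sketch also tokenizes on whitespace only and skips the Telugu-specific morphological preprocessing that the abstract describes as essential.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents."""
    class_doc_counts = Counter(labels)   # label -> number of training documents
    word_counts = defaultdict(Counter)   # label -> token -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for `text`."""
    class_doc_counts, word_counts, vocab = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_doc_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_tokens = sum(word_counts[label].values())
        for token in text.split():
            if token not in vocab:
                continue  # ignore tokens never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus invented for illustration; not data from the paper.
train_docs = [
    "క్రికెట్ జట్టు ఆట",     # cricket team game
    "జట్టు ఆట గెలిచింది",    # team game won
    "సినిమా నటుడు పాట",     # movie actor song
    "సినిమా పాట విడుదల",    # movie song release
]
train_labels = ["sports", "sports", "cinema", "cinema"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "క్రికెట్ ఆట"))   # → sports
print(predict(model, "నటుడు పాట"))    # → cinema
```

Because Telugu words take many inflected forms, a whitespace tokenizer like this one treats each surface form as a separate feature. That is the feature-space growth the abstract attributes to rich morphology, and it is why a stemming or morphological-analysis step would normally precede the counting.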