Comparative Study on Telugu text Classification using Machine Learning and Deep Learning models
Authors: Veerraju Gampala, Jaideep Vallapuneni, Pavan Kumar Ande, Ravindra Kumar Indurthi, N. Rajesh
Published in: 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI)
Publication date: 2021-06-03
DOI: 10.1109/ICOEI51242.2021.9453040 (https://doi.org/10.1109/ICOEI51242.2021.9453040)
Citation count: 4
Abstract
Many Telugu-language documents are now available in digital form. Grouping these documents into classes based on their content makes the electronic records easier to retrieve. Owing to the growing volume of information held in digital form, text categorization is one of the crucial problems in information systems that handle text records. Text categorization is the task of assigning a document one or more categories from a predefined set. Applied to Telugu, it helps derive valuable information and insights from unstructured text. Indian languages are difficult to categorize because they are morphologically rich, with many word forms and large feature spaces. Telugu in particular requires special algorithms for morphological analysis, and comparatively little research has been done on it. Significant preprocessing is therefore required to build an accurate classification model for Telugu text documents: preprocessing methods designed specifically for the Telugu language are applied to the raw data to construct an organized, reduced-feature lexicon. In this paper, we compare the performance of several machine learning and deep learning classifiers on Telugu text: Naïve Bayes, Support Vector Machine (SVM), and a neural network classifier.
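To make the categorization task concrete, the following is a minimal sketch of one of the classifier families named above, a multinomial Naïve Bayes with add-one smoothing, written with the Python standard library only. It is not the authors' implementation: the four-document corpus, the `sports`/`cinema` labels, and the function names are invented for illustration, and whitespace tokenization stands in for the Telugu-specific preprocessing the abstract calls for.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace split; no Telugu morphological analysis.
    return text.split()


def train_naive_bayes(docs):
    """Count words per class from a list of (text, label) pairs."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return {
        "class_doc_counts": class_doc_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "n_docs": len(docs),
    }


def predict(model, text):
    """Return the label with the highest log posterior for the text."""
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_label_docs in model["class_doc_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        # Log prior: fraction of training documents with this label.
        score = math.log(n_label_docs / model["n_docs"])
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood.
            score += math.log((counts[tok] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, not data from the paper.
TOY_DOCS = [
    ("క్రికెట్ జట్టు ఆట", "sports"),
    ("జట్టు ఆట గెలుపు", "sports"),
    ("సినిమా నటుడు పాట", "cinema"),
    ("సినిమా పాట దర్శకుడు", "cinema"),
]

model = train_naive_bayes(TOY_DOCS)
print(predict(model, "క్రికెట్ ఆట"))
print(predict(model, "సినిమా నటుడు"))
```

In a real comparison, the same train and predict interface would be repeated for an SVM and a neural network, and all three would be scored on a held-out set of Telugu documents after language-specific preprocessing.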