{"title":"Text Categorization of Telugu News Headlines","authors":"Vukyam Sri Sravya, Sachin Kumar S, K. Soman","doi":"10.1109/CONIT55038.2022.9847875","DOIUrl":null,"url":null,"abstract":"The era of digitization has generated huge amounts of data in every field in the range of petabytes, and the news is one of them. To adopt a classification technique using only human intervention is impossible and also like many other Indian languages, the Telugu language is belonging to the Dravidian family which is rich in morphological content. While Natural Language Processing deals with the textual format of data, different types of word embedding features are considered and passed to the models. Existing work on this problem statement is accomplished only with count-based algorithm word embeddings. In this study, several methods were performed to obtain the best model for categorization of the newspaper articles. These methods include building custom-based Machine Learning and Deep Learning models with both count and prediction based word embeddings.","PeriodicalId":270445,"journal":{"name":"2022 2nd International Conference on Intelligent Technologies (CONIT)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 2nd International Conference on Intelligent Technologies (CONIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CONIT55038.2022.9847875","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
Digitization has generated data on the scale of petabytes in every field, and news is no exception. Categorizing this volume of text by human effort alone is impossible. Moreover, like many other Indian languages, Telugu belongs to the Dravidian family, which is morphologically rich. Because Natural Language Processing operates on textual data, different types of word embedding features are considered and passed to the models. Existing work on this problem uses only count-based word embeddings. In this study, several methods were evaluated to find the best model for categorizing newspaper articles. These methods include building custom Machine Learning and Deep Learning models with both count-based and prediction-based word embeddings.
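To make the idea of count-based features concrete, the following is a minimal sketch of a bag-of-words representation fed to a multinomial Naive Bayes classifier, written in plain Python. It is not the authors' pipeline: the abstract does not name the specific models or features used, and the function names, the whitespace tokenizer, and the toy English headlines are all assumptions made for illustration. A realistic Telugu setup would likely need morphology-aware or subword tokenization and a real labeled corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. Morphologically rich
    # languages such as Telugu would likely need something more careful.
    return text.lower().split()


def train_naive_bayes(headlines, labels):
    """Collect per-class word counts (the count-based features)."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # class -> token -> count
    vocab = set()
    for text, label in zip(headlines, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for `text`."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore tokens never seen in training
            # Laplace (add-one) smoothing over the vocabulary
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented purely for illustration.
TRAIN_HEADLINES = [
    "team wins cricket match",
    "striker scores late goal",
    "market shares rise sharply",
    "bank cuts interest rates",
]
TRAIN_LABELS = ["sports", "sports", "business", "business"]

model = train_naive_bayes(TRAIN_HEADLINES, TRAIN_LABELS)
```

Calling `predict(model, "bank shares rise")` scores each class from its word counts and returns the more likely label. Prediction-based embeddings differ in that each word is mapped to a dense vector learned from context rather than to a raw count.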