Topic classification in Romanian blogosphere
A. Vasile, Roxana Rădulescu, I. Pavaloiu
12th Symposium on Neural Network Applications in Electrical Engineering (NEUREL), published 2014-11-01
DOI: 10.1109/NEUREL.2014.7011480 (https://doi.org/10.1109/NEUREL.2014.7011480)
Citations: 8
Abstract
In this paper we analyze the performance of several classification methods applied to the Romanian blogosphere. Blogs are difficult to categorize, for humans and machines alike, because their writing style varies widely. In the early days of the web, human-maintained directories could not keep up with millions of websites; likewise, blog directories cannot keep up with the explosive growth of the blogosphere. This paper investigates the efficacy of machine learning for categorizing blogs from the Romanian blogosphere that are written in Romanian. We design a text classification experiment that assigns Romanian blogs to nine topics. The baseline features are unigrams weighted by TF-IDF. We analyze the corpus, the features, and the results.
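The TF-IDF unigram baseline mentioned above can be sketched in standard-library Python. The abstract does not name the tokenizer, the TF-IDF variant, or the classifiers that were compared, so everything here is an assumption: lowercase regex tokens, raw term counts multiplied by log(N/df), and a nearest-centroid (Rocchio-style) classifier picked only because it is short. The helper names, the training sentences, and the two topic labels are hypothetical toy data, not the paper's corpus or its nine topics.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: unigrams are lowercase runs of word characters. For str patterns,
# \w is Unicode-aware in Python 3, so Romanian letters (ă, â, î, ș, ț) stay in tokens.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase unigram tokens."""
    return TOKEN_RE.findall(text.lower())


def fit_idf(docs):
    """idf(t) = log(N / df(t)), where df(t) counts documents containing t."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log(len(docs) / n) for t, n in df.items()}


def tfidf(tokens, idf):
    """Raw term count times idf; terms never seen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def l2_normalize(vec):
    """Scale a sparse vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train_centroids(texts, labels):
    """Hypothetical nearest-centroid trainer: average normalized TF-IDF vectors per topic."""
    docs = [tokenize(t) for t in texts]
    idf = fit_idf(docs)
    sums = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        sums[label].update(l2_normalize(tfidf(tokens, idf)))
    sizes = Counter(labels)
    centroids = {
        label: {t: w / sizes[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }
    return idf, centroids


def predict(text, idf, centroids):
    """Return the topic whose centroid is most cosine-similar to the text."""
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy data; the paper's corpus and nine topics are not reproduced here.
TRAIN_TEXTS = [
    "Echipa a câștigat meciul de fotbal aseară",
    "Meciul de tenis a fost foarte strâns",
    "Noul telefon are un procesor rapid",
    "Laptopul nou are un ecran mare și un procesor bun",
]
TRAIN_LABELS = ["sport", "sport", "tehnologie", "tehnologie"]

idf, centroids = train_centroids(TRAIN_TEXTS, TRAIN_LABELS)
print(predict("Un procesor nou pentru telefon", idf, centroids))  # prints: tehnologie
```

On a real corpus one would more likely use a library implementation such as scikit-learn's `TfidfVectorizer` followed by a linear classifier; note that its default IDF is smoothed, so its weights differ from the plain log(N/df) used here. The hand-rolled version only makes the weighting explicit.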