Arsenii Rasov, I. Obabkov, E. Olbrich, Ivan P. Yamshchikov
International Conference on Complex Information Systems. DOI: https://doi.org/10.5220/0009792101490154
Text Classification for Monolingual Political Manifestos with Words Out of Vocabulary
In this position paper, we implement an automatic coding algorithm for electoral programs from the Manifesto Project Database. We propose a new approach to handling words that are outside the training vocabulary: each such word is replaced with the word from the training vocabulary that is its closest neighbor in the space of word embeddings. A set of simulations demonstrates that the proposed algorithm achieves classification accuracy comparable to state-of-the-art benchmarks for monolingual multi-label classification. The agreement level of the algorithm is comparable to that of manual labeling. We also compare results across a broad set of model hyperparameters.
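
The out-of-vocabulary replacement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the embedding model or the distance measure, so cosine similarity, the toy three-dimensional vectors, the `<unk>` fallback for words that have no embedding at all, and the function names `nearest_in_vocab` and `replace_oov` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_in_vocab(word, embeddings, train_vocab):
    """Return the training-vocabulary word closest to `word`, or None.

    `embeddings` maps words to vectors and may cover more words than
    `train_vocab`, as a pretrained embedding table typically would.
    """
    if word not in embeddings:
        return None
    target = embeddings[word]
    candidates = [w for w in train_vocab if w in embeddings]
    if not candidates:
        return None
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


def replace_oov(tokens, embeddings, train_vocab, unk="<unk>"):
    """Swap each out-of-vocabulary token for its nearest in-vocabulary neighbor."""
    out = []
    for tok in tokens:
        if tok in train_vocab:
            out.append(tok)
        else:
            # Hypothetical fallback: words without any embedding become `unk`.
            out.append(nearest_in_vocab(tok, embeddings, train_vocab) or unk)
    return out


# Toy data, invented for this example only.
embeddings = {
    "tax": [0.9, 0.1, 0.0],
    "school": [0.0, 0.9, 0.1],
    "army": [0.1, 0.0, 0.9],
    "levy": [0.8, 0.2, 0.1],
    "college": [0.1, 0.8, 0.2],
}
train_vocab = {"tax", "school", "army"}

replaced = replace_oov(["levy", "funds", "college"], embeddings, train_vocab)
print(replaced)  # ['tax', '<unk>', 'school']
```

In this sketch the classifier itself is untouched: replacement happens as a preprocessing step, so a model trained on the original vocabulary can be applied to new manifestos without retraining. The linear scan over the vocabulary is only suitable for small examples; a real vocabulary would likely call for a vectorized or approximate nearest-neighbor search.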