Creation of Necessary Technical and Expert-Analytical Conditions for Development of the Information System of Evaluating Open Text Information Sources’ Influence on Society
R. Mussabayev, Bek Kassymzhanov, Aidos Mukashev, Viktoriya Ibrayeva, Azat Merkebayev
2019 15th International Asian School-Seminar Optimization Problems of Complex Systems (OPCS), August 2019. DOI: 10.1109/OPCS.2019.8880193
Citation count: 1
Abstract
In this paper, we trained distributional models (patterns) with Word2Vec and GloVe, using three variants of text preprocessing. Using the trained Word2Vec model, we obtained vector representations for a test sample of 30 news items already separated into clusters. All variants of computing a text's vector representation as a weighted average of its word vectors were considered. Two-stage clustering was carried out. After training a Doc2Vec model on normalized documents, a vector representation was obtained for each document. For the test, news items covering the same event but taken from different sources were selected. A two-dimensional “factual cube” was analyzed.
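The abstract mentions obtaining a text's vector as a weighted average of its word vectors but does not name the weighting schemes. The sketch below is a minimal, dependency-free illustration assuming two common variants: a plain mean and an IDF-weighted mean (using a smoothed IDF). The function names, the IDF formula, and the toy 2-dimensional vectors are hypothetical stand-ins for a trained Word2Vec model, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def idf_weights(corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency (an assumed weighting) for every token."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def doc_vector(tokens: Sequence[str],
               word_vectors: Dict[str, Vector],
               weights: Optional[Dict[str, float]] = None) -> Vector:
    """Weighted average of the word vectors of a tokenized document.

    weights=None gives the plain (unweighted) mean. Tokens without a vector
    are skipped; word_vectors is assumed to be non-empty.
    """
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # out-of-vocabulary token
        w = 1.0 if weights is None else weights.get(tok, 0.0)
        for i, x in enumerate(vec):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # zero vector: no known tokens
    return [x / weight_sum for x in total]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy usage: hypothetical 2-D "word vectors" standing in for a trained model.
word_vectors = {
    "flood": [1.0, 0.0],
    "river": [0.8, 0.2],
    "election": [0.0, 1.0],
    "vote": [0.1, 0.9],
}
docs = [["flood", "river"], ["river", "flood", "flood"], ["election", "vote"]]
idf = idf_weights(docs)
vecs = [doc_vector(d, word_vectors, idf) for d in docs]
```

Documents about the same topic end up with a high cosine similarity, so vectors built this way can be passed to a clustering step; swapping the `weights` argument is enough to compare averaging variants side by side.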