R. Mussabayev, Bek Kassymzhanov, Aidos Mukashev, Viktoriya Ibrayeva, Azat Merkebayev
Published in: 2019 15th International Asian School-Seminar Optimization Problems of Complex Systems (OPCS), 2019-08-01
DOI: 10.1109/OPCS.2019.8880193 (https://doi.org/10.1109/OPCS.2019.8880193)
Creation of Necessary Technical and Expert-Analytical Conditions for Development of the Information System of Evaluating Open Text Information Sources’ Influence on Society
In this paper, we trained Word2Vec and GloVe distributional models on text prepared with three preprocessing variants. Using the trained Word2Vec model, we obtained vector representations for a test sample of 30 news items that had been divided into clusters. All variants of weighted averaging for computing the vector representation of a text were considered, and two-stage clustering was carried out. A Doc2Vec model was then trained on normalized documents, yielding a vector representation for each document. For this test, news items covering the same event but taken from different sources were selected. A 2-dimensional “factual cube” was analyzed.
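The abstract does not say which weights were used when averaging word vectors into a text vector. The sketch below illustrates one common choice, weighting by smoothed inverse document frequency (IDF), and should be read as an assumption rather than as the authors' method. The embeddings in `toy_vectors`, the three toy documents, and the helper names `idf_weights`, `weighted_average_vector`, and `cosine` are all hypothetical and stand in for vectors taken from a trained Word2Vec model.

```python
import math
from collections import Counter


def idf_weights(tokenized_docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(tokenized_docs)
    # Count each token once per document it appears in.
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def weighted_average_vector(tokens, word_vectors, weights, dim):
    """Weighted mean of the vectors of all tokens that have an embedding."""
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = word_vectors.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        w = weights.get(tok, 1.0)
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: return the zero vector
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical 2-dimensional embeddings standing in for trained Word2Vec vectors.
toy_vectors = {
    "election": [1.0, 0.0],
    "vote": [0.9, 0.1],
    "storm": [0.0, 1.0],
    "rain": [0.1, 0.9],
    "the": [0.5, 0.5],
}

# Hypothetical tokenized news items: two on one topic, one on another.
docs = [
    ["the", "election", "vote"],
    ["the", "vote", "election", "election"],
    ["the", "storm", "rain"],
]

weights = idf_weights(docs)
doc_vectors = [
    weighted_average_vector(doc, toy_vectors, weights, dim=2) for doc in docs
]

same_topic = cosine(doc_vectors[0], doc_vectors[1])
different_topic = cosine(doc_vectors[0], doc_vectors[2])
```

With these toy inputs, the two election items end up closer to each other than either is to the storm item, which is the property a later clustering step would rely on. Other weighting schemes (uniform, TF-IDF, frequency-based down-weighting) fit the same `weights` dictionary.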