M. Mansurova, V. Barakhnin, Marzhan Kyrgyzbayeva, N. Kadyrbek
Named Entity Extraction Model Based on the Random Walk Method
2021 IEEE International Conference on Smart Information Systems and Technologies (SIST), published 2021-04-28
DOI: 10.1109/SIST50301.2021.9465992 (https://doi.org/10.1109/SIST50301.2021.9465992)
Citations: 1
Abstract
The rapid development of Internet technologies has, in recent decades, produced an information explosion: the volume of information is growing exponentially, including low-quality information. This work aims to provide interested parties with intelligent decision-support tools that automatically extract knowledge from heterogeneous data sources, including the Internet. We examine the primary processing and morphological analysis of texts and implement a random walk method to extract semantically related words. The computation yields a matrix of word affinities and a dictionary that maps each word to its vector component. We also describe a neural network trained to retrieve linguistic constructions that contain the possible values of descriptors of named text entities.
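The abstract does not specify how the random walk is run, so the following is only a minimal sketch of one common variant: a random walk with restart over a word co-occurrence graph. It shows how such a walk could yield both outputs mentioned above, a word affinity matrix and a dictionary mapping each word to its vector component. The function names, the co-occurrence window, the restart probability, and the toy sentences are all assumptions made for illustration, not details taken from the paper; the input is assumed to be already tokenized and morphologically normalized.

```python
import numpy as np

def build_vocab(sentences):
    """Map each distinct word to an index (its row/column, i.e. vector component)."""
    vocab = {}
    for sent in sentences:
        for word in sent:
            vocab.setdefault(word, len(vocab))
    return vocab

def cooccurrence_matrix(sentences, vocab, window=2):
    """Symmetric counts of word pairs that appear within `window` tokens of each other."""
    n = len(vocab)
    A = np.zeros((n, n))
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(i + 1, min(i + 1 + window, len(sent))):
                u, v = vocab[word], vocab[sent[j]]
                if u != v:
                    A[u, v] += 1
                    A[v, u] += 1
    return A

def random_walk_affinity(A, restart=0.15, steps=50):
    """Random walk with restart (an assumed variant, not necessarily the paper's).

    Row i of the result is the visiting distribution of a walker that starts
    at word i and, at every step, jumps back to word i with probability `restart`.
    """
    n = A.shape[0]
    row_sums = A.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Row-stochastic transition matrix; isolated words keep the walker in place.
    P = np.where(row_sums > 0, A / safe_sums, np.eye(n))
    identity = np.eye(n)
    R = np.eye(n)
    for _ in range(steps):
        R = restart * identity + (1 - restart) * (R @ P)
    return R

# Toy, hypothetical input: already tokenized and normalized sentences.
sentences = [
    ["almaty", "is", "a", "city"],
    ["astana", "is", "a", "city"],
    ["random", "walk", "method"],
]
vocab = build_vocab(sentences)
A = cooccurrence_matrix(sentences, vocab)
R = random_walk_affinity(A)
```

In this sketch, `vocab` plays the role of the word-to-component dictionary and `R` the role of the affinity matrix. Words that share contexts ("almaty" and "astana") receive a positive affinity, while words in a disconnected part of the graph ("almaty" and "random") receive zero.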