Methods for Intelligent Data Analysis Based on Keywords and Implicit Relations: The Case of "ISTINA" Data Analysis System

2019 Actual Problems of Systems and Software Engineering (APSSE), Pub Date: 2019-11-01, DOI: 10.1109/APSSE47353.2019.00027

V. Vasenin, K. Lunev, S. Afonin, D. Shachnev

Citations: 5

Abstract

In information analysis systems that work with big data, there is often a need to classify objects and to calculate the degree of thematic proximity between two objects. A natural source of data for such problems is the set of keywords attributed to the objects of the system. This paper describes a model for calculating the degree of thematic proximity between two keywords, as well as between two sets of keywords. The model is based on the contextual proximity between keywords, defined as the number of sets in which the two keywords appear together. The final proximity coefficient also takes into account keyword properties such as the degree of abstractness and thematic belonging. Various ways of applying the model to practical tasks are described, using the example of the "ISTINA" scientometric data analysis system at Lomonosov Moscow State University.
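The contextual proximity described in the abstract (the number of keyword sets in which two keywords occur together) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything beyond the raw co-occurrence count is an assumption made for illustration: the function names, the toy data, and in particular the mean-of-pairs aggregation used for two keyword sets are hypothetical. The adjustments for abstractness degree and thematic belonging are not modeled here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_sets):
    """Count, for every unordered keyword pair, the number of sets containing both."""
    counts = Counter()
    for keywords in keyword_sets:
        # set() drops repeats within one object; sorted() gives each pair a canonical order.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


def contextual_proximity(counts, a, b):
    """Number of keyword sets in which keywords a and b appear together."""
    return counts[tuple(sorted((a, b)))]


def set_proximity(counts, first, second):
    """ASSUMED aggregation (not from the paper): mean pairwise contextual proximity."""
    pairs = [(a, b) for a in set(first) for b in set(second) if a != b]
    if not pairs:
        return 0.0
    return sum(contextual_proximity(counts, a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: each inner list is the keyword set of one object.
objects = [
    ["graph", "clustering", "keywords"],
    ["graph", "clustering"],
    ["keywords", "scientometrics"],
]
counts = cooccurrence_counts(objects)
print(contextual_proximity(counts, "graph", "clustering"))
print(set_proximity(counts, ["graph"], ["clustering", "keywords"]))
```

Storing each pair in sorted order keeps the count symmetric, so the proximity of (a, b) equals that of (b, a) without a second lookup table.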
