Analysis of standard clustering algorithms for grouping MEDLINE abstracts into evidence-based medicine intervention categories

2015 International Conference "Stability and Control Processes" in Memory of V.I. Zubov (SCP). Pub Date: 2015-12-03. DOI: 10.1109/SCP.2015.7342223

V. Dobrynin, Y. Balykina, M. Kamalov

Citations: 1

Abstract

The paper describes the clustering of article abstracts from MEDLINE, the largest bibliographic database for life sciences and biomedical information, into categories that correspond to types of medical interventions, that is, types of patient treatment. Experiments were carried out to evaluate clustering quality for the following algorithms: K-means, K-means++, hierarchical clustering, and SIB (Sequential Information Bottleneck), combined with the LSA (Latent Semantic Analysis) and MI (Mutual Information) methods for selecting feature vectors. The best clustering results were achieved by K-means++ together with LSA when a 210-dimensional space was chosen: Purity = 0.5719, Entropy = 1.3841, Normalized Entropy = 0.6299.
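The abstract reports Purity, Entropy, and Normalized Entropy but does not define them. The following is a minimal sketch of how such scores are commonly computed from cluster assignments and reference categories. It assumes natural logarithms, size-weighted per-cluster entropy, and normalization by the logarithm of the number of reference categories; these choices, the function names, and the toy labels are illustrative assumptions, not the authors' confirmed definitions.

```python
import math
from collections import Counter, defaultdict


def contingency(cluster_ids, class_labels):
    """Count, for each cluster, how many items carry each reference label."""
    table = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        table[cluster][label] += 1
    return table


def purity(cluster_ids, class_labels):
    """Fraction of items that belong to the majority label of their cluster."""
    table = contingency(cluster_ids, class_labels)
    majority_total = sum(max(counts.values()) for counts in table.values())
    return majority_total / len(cluster_ids)


def entropy(cluster_ids, class_labels):
    """Size-weighted average of per-cluster label entropy (natural log, assumed)."""
    table = contingency(cluster_ids, class_labels)
    n = len(cluster_ids)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        cluster_entropy = -sum(
            (k / size) * math.log(k / size) for k in counts.values()
        )
        total += (size / n) * cluster_entropy
    return total


def normalized_entropy(cluster_ids, class_labels):
    """Entropy divided by log(number of reference categories), an assumed normalization."""
    n_classes = len(set(class_labels))
    if n_classes < 2:
        return 0.0
    return entropy(cluster_ids, class_labels) / math.log(n_classes)


# Hypothetical toy data: six abstracts, two clusters, two intervention labels.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["drug", "drug", "surgery", "surgery", "surgery", "surgery"]

print(purity(clusters, labels))
print(entropy(clusters, labels))
print(normalized_entropy(clusters, labels))
```

Under these definitions, higher purity and lower entropy both indicate clusters that align more closely with the reference intervention categories.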



