Dynamic Document Clustering Using Singular Value Decomposition

Rashmi Nadubeediramesh, A. Gangopadhyay
{"title":"Dynamic Document Clustering Using Singular Value Decomposition","authors":"Rashmi Nadubeediramesh, A. Gangopadhyay","doi":"10.4018/jcmam.2012070103","DOIUrl":null,"url":null,"abstract":"Incremental document clustering is important in many applications, but particularly so in healthcare contexts where text data is found in abundance, ranging from published research in journals to day-to-day healthcare data such as discharge summaries and nursing notes. In such dynamic environments new documents are constantly added to the set of documents that have been used in the initial cluster formation. Hence it is important to be able to incrementally update the clusters at a low computational cost as new documents are added. In this paper the authors describe a novel, low cost approach for incremental document clustering. Their method is based on conducting singular value decomposition (SVD) incrementally. They dynamically fold in new documents into the existing term-document space and dynamically assign these new documents into pre-defined clusters based on intra-cluster similarity. This saves the cost of re-computing SVD on the entire document set every time updates occur. The authors also provide a way to retrieve documents based on different window sizes with high scalability and good clustering accuracy. They have tested their proposed method experimentally with 960 medical abstracts retrieved from the PubMed medical library. The authors’ incremental method is compared with the default situation where complete re-computation of SVD is done when new documents are added to the initial set of documents. The results show minor decreases in the quality of the cluster formation but much larger gains in computational throughput. Dynamic Document Clustering Using Singular Value Decomposition","PeriodicalId":162417,"journal":{"name":"Int. J. Comput. Model. Algorithms Medicine","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Model. Algorithms Medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/jcmam.2012070103","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Incremental document clustering is important in many applications, but particularly so in healthcare contexts where text data is found in abundance, ranging from published research in journals to day-to-day healthcare data such as discharge summaries and nursing notes. In such dynamic environments new documents are constantly added to the set of documents that have been used in the initial cluster formation. Hence it is important to be able to incrementally update the clusters at a low computational cost as new documents are added. In this paper the authors describe a novel, low cost approach for incremental document clustering. Their method is based on conducting singular value decomposition (SVD) incrementally. They dynamically fold in new documents into the existing term-document space and dynamically assign these new documents into pre-defined clusters based on intra-cluster similarity. This saves the cost of re-computing SVD on the entire document set every time updates occur. The authors also provide a way to retrieve documents based on different window sizes with high scalability and good clustering accuracy. They have tested their proposed method experimentally with 960 medical abstracts retrieved from the PubMed medical library. The authors’ incremental method is compared with the default situation where complete re-computation of SVD is done when new documents are added to the initial set of documents. The results show minor decreases in the quality of the cluster formation but much larger gains in computational throughput. Dynamic Document Clustering Using Singular Value Decomposition
基于奇异值分解的动态文档聚类
增量文档聚类在许多应用程序中都很重要,但在文本数据丰富的医疗保健环境中尤其如此,从期刊上发表的研究到出院摘要和护理笔记等日常医疗保健数据。在这种动态环境中,不断地将新文档添加到初始集群形成中使用的文档集中。因此,能够在添加新文档时以较低的计算成本增量更新集群是很重要的。在本文中,作者描述了一种新颖的、低成本的增量文档聚类方法。他们的方法是基于增量进行奇异值分解(SVD)。它们动态地将新文档折叠到现有的术语文档空间中,并根据簇内相似性将这些新文档动态地分配到预定义的簇中。这节省了每次发生更新时对整个文档集重新计算SVD的成本。作者还提供了一种基于不同窗口大小的检索文档的方法,具有高可扩展性和良好的聚类精度。他们用从PubMed医学图书馆检索的960篇医学摘要对他们提出的方法进行了实验测试。将作者的增量方法与添加新文档到初始文档集时完全重新计算SVD的默认情况进行了比较。结果表明,簇形成的质量略有下降,但计算吞吐量却有了很大的提高。基于奇异值分解的动态文档聚类
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信