Clustering Sinhala News Articles Using Corpus-Based Similarity Measures

2018 Moratuwa Engineering Research Conference (MERCon). Pub Date: 2018-05-01. DOI: 10.1109/MERCON.2018.8421890

P. Nanayakkara, Surangika Ranathunga

Citations: 9

Abstract

News aggregators help readers handle large numbers of news items conveniently by collecting them in a single place and organizing them into meaningful groups. Such news aggregators and clustering tools are available for English and some other widely used languages. However, no such tools are available for the Sinhala language. To address this gap, this paper presents a system that collects news articles published across the web and groups related articles using corpus-based similarity measures. Despite the simplicity of the technique and the morphological richness of Sinhala, we achieved very promising results that demonstrate the viability of the presented technique.
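The abstract does not say which corpus-based similarity measure or grouping procedure the system uses. As a rough illustration only, the sketch below assumes TF-IDF weighting with cosine similarity and a greedy single-pass grouping against each cluster's first article. The function names, the whitespace tokenizer, and the 0.3 threshold are all hypothetical choices, not details taken from the paper, and plain whitespace splitting ignores the Sinhala morphology the abstract mentions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per whitespace-tokenized document."""
    tokenized = [doc.split() for doc in docs]
    n = len(tokenized)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single-pass grouping (hypothetical, not the paper's procedure).

    Each article joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    vectors = tfidf_vectors(docs)
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Calling `cluster(articles)` on a list of article strings returns groups of indices into that list. A real system for Sinhala would likely need language-specific tokenization or stemming before the weighting step, and the threshold would have to be tuned on labelled data.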



