Analyzing and Comparing On-Line News Sources via (Two-Layer) Incremental Clustering

Francesco Cambi, P. Crescenzi, L. Pagli
Published in: Fun with Algorithms, 2016-06-08. DOI: 10.4230/LIPIcs.FUN.2016.9 (https://doi.org/10.4230/LIPIcs.FUN.2016.9)
Citations: 2

Abstract

In this paper, we analyse the contents of the web sites of two Italian news agencies and of four of the most popular Italian newspapers, in order to answer questions such as: which news items are the most relevant, what is the average lifetime of a news item, and how much do the sites differ from one another? To this end, we have developed a web-based application that collects, every hour, the articles in the main column of the six web sites, applies an incremental clustering algorithm to group the articles into news items, and finally lets the user see the answers to the above questions. We have also designed and implemented a two-layer modification of the incremental clustering algorithm and carried out a preliminary experimental evaluation of it: the two-layer clustering turns out to be extremely efficient in terms of running time, and it achieves quite good precision and recall.
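The abstract does not specify the similarity measure, the threshold, or how the two-layer variant works, so the following is only a minimal sketch of generic single-pass incremental clustering, not the authors' algorithm. It assumes bag-of-words vectors, cosine similarity, and a hypothetical `threshold` of 0.5; the function names are illustrative. Each incoming article joins the most similar existing cluster, or starts a new one when no cluster is similar enough.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its word tokens (assumed representation)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_clustering(articles, threshold=0.5):
    """Single-pass clustering: an article joins the most similar cluster,
    or opens a new cluster if no similarity reaches the threshold."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for index, text in enumerate(articles):
        vector = bag_of_words(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(index)
            best["centroid"].update(vector)  # centroid = summed term counts
        else:
            clusters.append({"centroid": Counter(vector), "members": [index]})
    return [cluster["members"] for cluster in clusters]


# Hypothetical headlines, invented purely for illustration.
headlines = [
    "Government approves new budget law",
    "New budget law approved by the government",
    "Juventus wins the football championship",
]
groups = incremental_clustering(headlines)
```

Because articles are processed one at a time and never reassigned, a new hourly batch can be added without re-clustering earlier articles, which is the property that makes this family of algorithms suitable for a continuously updated collection.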