Documents, Topics, and Authors: Text Mining of Online News

Mete Sertkan, J. Neidhardt, H. Werthner
Published in: 2019 IEEE 21st Conference on Business Informatics (CBI)
Publication date: 2019-07-15
DOI: 10.1109/CBI.2019.00053 (https://doi.org/10.1109/CBI.2019.00053)
Citations: 1

Abstract

The goal of recommender systems is, in essence, to help people discover items they might like, i.e., items that fit their preferences, personality, and needs. Depending on the domain, those items can be books, movies, music, hotels, and much more. Typically, recommendations are based on past user interactions (e.g., movies a user saw or hotels a user booked). This work-in-progress paper focuses on news recommender systems. Because of the nature of news (e.g., constantly arriving new items and short item lifetimes), recommendations based on past interactions are especially hard to make. Hence, news recommender systems rely heavily on the actual content of news. While previous work mainly considers one aspect of the content of news articles, in this work we jointly analyse and discuss a given corpus of news articles on three different levels (i.e., document-level, topic-level, and author-level). The overall aim is to provide the basis for a comprehensive news recommender system, which reaches beyond accuracy and also considers diversity and serendipity. We demonstrate that relevant information can be extracted from a given corpus, and that differences in author, time, and topic can be shown. Furthermore, the author-level analysis shows that documents can be clustered based on the writing style of authors. Finally, our findings show that author-level analysis has the potential to recommend the most diverse items compared to the other approaches.
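The abstract does not say how content similarity or diversity is measured. As a rough illustration only, assuming a plain bag-of-words representation with cosine similarity and a mean pairwise dissimilarity as the diversity score (none of which is confirmed by the source), the document-level idea might be sketched as follows. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased bag-of-words counts for one document (assumed representation)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def intra_list_diversity(texts):
    """Mean pairwise dissimilarity (1 - cosine) over a list of documents."""
    vectors = [term_vector(t) for t in texts]
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    if not pairs:
        return 0.0
    total = sum(1.0 - cosine_similarity(vectors[i], vectors[j]) for i, j in pairs)
    return total / len(pairs)
```

Under these assumptions, a recommendation list whose articles share no vocabulary scores close to 1, and a list of near-identical articles scores close to 0. Topic-level or author-level features could replace the word counts without changing the diversity computation.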