基于本体和新的层次聚类方法改进基于内容的文档聚类推荐系统

Maryam Hourali, Mansoureh Hourali
{"title":"基于本体和新的层次聚类方法改进基于内容的文档聚类推荐系统","authors":"Maryam Hourali, Mansoureh Hourali","doi":"10.52547/itrc.14.3.37","DOIUrl":null,"url":null,"abstract":"—Today we live in a period that is known to an area of communication. By increasing the information on the internet, the extra news are published on news agencies websites or other resources, the users are confused more with the problems of finding their desired information and related news. Among these are recommended systems they can automatically finding the news and information of their favorite’s users and suggesting to them too. This article attempts to improve the user’s interests and user’s satisfactions by refining the content based recommendation system to suggest better sources to their users. A clustering approach has been used to carry out this improvement. An attempt has been made to define a cluster threshold for clustering the same news and information in the K-means clustering algorithm. By detecting best resemblance criterion value and using an external knowledge base (ontology), we could generalize words into a set of related words (instead of using them alone). This approach is promoted the accuracy of news clustering and use the provided cluster to find user’s favorite news and also could have suggest the news to the user. Since the dataset has an important and influential role in advisory recommended systems, the standard Persian dataset is not provided and not published yet. In this research, an attempted has been made to connect and publish the dataset to finish the effect of this vacuum. The data are collected and crawl 8 periods of days from the Tabnak news agency website. The profile of each volunteers has been created and also saved at the same time as they read the favorite news on that period of time. An analysis shows that the proposed clustering approach provided by the NMI criterion has reached 70.2% on our the dataset. Also, using the suggested clustering recommendation system yield 89.2% performance based on the accuracy criterion, which shows an improvement of 8.5% in a standardized way.","PeriodicalId":270455,"journal":{"name":"International Journal of Information and Communication Technology Research","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving Content-Based Recommender System For Clustering Documents Based on Ontology And New Hierarchical Clustering Method\",\"authors\":\"Maryam Hourali, Mansoureh Hourali\",\"doi\":\"10.52547/itrc.14.3.37\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"—Today we live in a period that is known to an area of communication. By increasing the information on the internet, the extra news are published on news agencies websites or other resources, the users are confused more with the problems of finding their desired information and related news. Among these are recommended systems they can automatically finding the news and information of their favorite’s users and suggesting to them too. This article attempts to improve the user’s interests and user’s satisfactions by refining the content based recommendation system to suggest better sources to their users. A clustering approach has been used to carry out this improvement. An attempt has been made to define a cluster threshold for clustering the same news and information in the K-means clustering algorithm. By detecting best resemblance criterion value and using an external knowledge base (ontology), we could generalize words into a set of related words (instead of using them alone). This approach is promoted the accuracy of news clustering and use the provided cluster to find user’s favorite news and also could have suggest the news to the user. Since the dataset has an important and influential role in advisory recommended systems, the standard Persian dataset is not provided and not published yet. In this research, an attempted has been made to connect and publish the dataset to finish the effect of this vacuum. The data are collected and crawl 8 periods of days from the Tabnak news agency website. The profile of each volunteers has been created and also saved at the same time as they read the favorite news on that period of time. An analysis shows that the proposed clustering approach provided by the NMI criterion has reached 70.2% on our the dataset. Also, using the suggested clustering recommendation system yield 89.2% performance based on the accuracy criterion, which shows an improvement of 8.5% in a standardized way.\",\"PeriodicalId\":270455,\"journal\":{\"name\":\"International Journal of Information and Communication Technology Research\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Information and Communication Technology Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.52547/itrc.14.3.37\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Information and Communication Technology Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.52547/itrc.14.3.37","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

今天,我们生活在一个众所周知的交流领域。通过增加互联网上的信息,额外的新闻被发布在新闻机构、网站或其他资源上,用户更加困惑于寻找他们想要的信息和相关新闻的问题。在这些推荐系统中,他们可以自动找到他们喜欢的用户的新闻和信息,并向他们提出建议。本文试图通过改进基于内容的推荐系统,向用户推荐更好的资源,从而提高用户的兴趣和用户满意度。我们使用了一种聚类方法来实现这种改进。在K-means聚类算法中,我们尝试定义一个聚类阈值来聚类相同的新闻和信息。通过检测最佳相似判据值并利用外部知识库(本体),我们可以将单词泛化成一组相关单词(而不是单独使用它们)。该方法提高了新闻聚类的准确性,并利用所提供的聚类找到用户喜欢的新闻,还可以向用户推荐新闻。由于该数据集在咨询推荐系统中具有重要和有影响力的作用,因此没有提供标准的波斯语数据集,也没有发布。在本研究中,我们尝试连接和发布数据集来完成这种真空的影响。这些数据是从Tabnak通讯社网站上收集和抓取的,为期8天。每个志愿者的个人资料都已创建并保存在他们阅读那段时间最喜欢的新闻的同时。分析表明,NMI标准提供的聚类方法在我们的数据集上达到了70.2%。基于准确率标准,使用建议的聚类推荐系统的准确率达到89.2%,标准化后的准确率提高了8.5%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Improving Content-Based Recommender System For Clustering Documents Based on Ontology And New Hierarchical Clustering Method
—Today we live in a period that is known to an area of communication. By increasing the information on the internet, the extra news are published on news agencies websites or other resources, the users are confused more with the problems of finding their desired information and related news. Among these are recommended systems they can automatically finding the news and information of their favorite’s users and suggesting to them too. This article attempts to improve the user’s interests and user’s satisfactions by refining the content based recommendation system to suggest better sources to their users. A clustering approach has been used to carry out this improvement. An attempt has been made to define a cluster threshold for clustering the same news and information in the K-means clustering algorithm. By detecting best resemblance criterion value and using an external knowledge base (ontology), we could generalize words into a set of related words (instead of using them alone). This approach is promoted the accuracy of news clustering and use the provided cluster to find user’s favorite news and also could have suggest the news to the user. Since the dataset has an important and influential role in advisory recommended systems, the standard Persian dataset is not provided and not published yet. In this research, an attempted has been made to connect and publish the dataset to finish the effect of this vacuum. The data are collected and crawl 8 periods of days from the Tabnak news agency website. The profile of each volunteers has been created and also saved at the same time as they read the favorite news on that period of time. An analysis shows that the proposed clustering approach provided by the NMI criterion has reached 70.2% on our the dataset. Also, using the suggested clustering recommendation system yield 89.2% performance based on the accuracy criterion, which shows an improvement of 8.5% in a standardized way.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信