Using Word2Vec for news articles recommendations: Considering evaluation options for hyperparameter optimization and different input options

2022 IEEE 16th International Scientific Conference on Informatics (Informatics). Published: 2022-11-23. DOI: 10.1109/Informatics57926.2022.10083395

Bogdan Walek, Patrik Müller

Abstract

Evaluating unsupervised and semi-supervised learning methods, especially in information retrieval and recommender systems, is a difficult and resource-intensive task. Often, a machine learning model cannot be evaluated until user testing is performed. We investigated hyperparameter optimization options for Gensim's Word2Vec implementation by evaluating model performance on word-analogy and word-pair tests and on out-of-vocabulary ratio statistics. These automatic, task-independent offline (pre-)evaluation techniques could provide a simple way to reduce the set of final model variants passed on to resource-demanding user testing or hybrid recommender models. We therefore investigated whether results on these tests were informative about accuracy on our final task: providing articles similar to a chosen article. We also consider using Wikipedia articles as model training input and using the pre-trained FastText model.
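
As an illustration of the simplest of these offline checks, the out-of-vocabulary ratio can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: the abstract does not define the statistic precisely, so the function `oov_ratio`, the token-level definition it uses, and the toy data are all hypothetical. With a trained Gensim 4 model, the vocabulary would typically be taken from the keys of `model.wv.key_to_index`.

```python
from typing import Iterable, Set


def oov_ratio(tokens: Iterable[str], vocabulary: Set[str]) -> float:
    """Return the fraction of tokens that are missing from the vocabulary.

    Assumed definition: missing tokens / all tokens, counted per occurrence.
    An empty token stream yields 0.0 to avoid division by zero.
    """
    total = 0
    missing = 0
    for token in tokens:
        total += 1
        if token not in vocabulary:
            missing += 1
    return missing / total if total else 0.0


# Hypothetical toy data standing in for a model vocabulary and one article.
vocabulary = {"government", "election", "budget", "minister"}
article_tokens = ["minister", "presents", "budget", "before", "election"]

ratio = oov_ratio(article_tokens, vocabulary)
print(f"OOV ratio: {ratio:.2f}")  # 2 of 5 tokens are unknown -> 0.40
```

A lower ratio means the model has a vector for more of the words in the articles it must compare, which is one reason such a statistic can be checked before any user testing.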
