Using Word2Vec for news articles recommendations: Considering evaluation options for hyperparameter optimization and different input options
Bogdan Walek, Patrik Müller
2022 IEEE 16th International Scientific Conference on Informatics (Informatics), 2022-11-23
DOI: https://doi.org/10.1109/Informatics57926.2022.10083395
Citations: 0
Abstract
Evaluation of unsupervised and semi-supervised learning methods, especially in the fields of information retrieval and recommender systems, is a problematic and resource-intensive task. Often there is no way to evaluate the machine learning model in use until user testing is performed. We investigated hyperparameter optimization options for Gensim's Word2Vec implementation by evaluating model performance on word-analogy and word-pair tests and on out-of-vocabulary (OOV) ratio statistics. These automatic, task-independent offline (pre-)evaluation techniques could provide a simple way to reduce the set of final model variants passed on to resource-demanding user testing or to hybrid recommender models, so we investigated whether these tests were useful indicators of accuracy on our final task of recommending articles similar to a chosen article. We also consider the options of using Wikipedia articles as training input for the model or using the pre-trained FastText model.
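Gensim's `KeyedVectors` class provides an `evaluate_word_analogies` method for this kind of test. As a dependency-free illustration of what an analogy test measures, the sketch below answers "a is to b as c is to ?" with the common 3CosAdd rule (the vocabulary word whose vector is most cosine-similar to b - a + c) and reports accuracy. The toy vectors and questions are hypothetical, skipping questions that contain out-of-vocabulary words is only one possible convention, and nothing here reproduces the authors' exact evaluation setup.

```python
import math

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "paris": [0.5, 0.5, 0.5],
}

# Analogy questions "a is to b as c is to d"; the second one contains OOV words.
QUESTIONS = [
    ("man", "king", "woman", "queen"),
    ("paris", "france", "berlin", "germany"),
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """3CosAdd: the word closest to b - a + c, excluding the three query words."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Return (accuracy, skipped); questions containing OOV words are skipped."""
    correct = total = skipped = 0
    for a, b, c, expected in questions:
        if any(w not in vectors for w in (a, b, c, expected)):
            skipped += 1
            continue
        total += 1
        if solve_analogy(vectors, a, b, c) == expected:
            correct += 1
    accuracy = correct / total if total else 0.0
    return accuracy, skipped


print(analogy_accuracy(VECTORS, QUESTIONS))  # (1.0, 1)
```

Reporting the skipped count alongside accuracy matters when comparing model variants: a variant with a smaller vocabulary can look more accurate simply because it answers fewer questions.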
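A word-pair test compares the model's similarity score for each word pair against human similarity judgements, usually via rank correlation. Gensim's `KeyedVectors.evaluate_word_pairs` reports Pearson and Spearman correlations together with an out-of-vocabulary ratio. The plain-Python sketch below computes Spearman's rho (the Pearson correlation of average ranks) and the share of pairs the model cannot score. The vectors, pairs, and human scores are hypothetical, and the OOV ratio here is expressed as a fraction between 0 and 1.

```python
import math

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
VECTORS = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "road": [0.6, 0.8],
    "banana": [0.0, 1.0],
}

# (word1, word2, human similarity score); "zeppelin" is out of vocabulary.
PAIRS = [
    ("car", "automobile", 9.0),
    ("car", "road", 5.0),
    ("car", "banana", 1.0),
    ("car", "zeppelin", 4.0),
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def evaluate_pairs(vectors, pairs):
    """Return (spearman_rho, oov_ratio) for (word1, word2, score) triples."""
    model, human, oov = [], [], 0
    for w1, w2, score in pairs:
        if w1 not in vectors or w2 not in vectors:
            oov += 1  # the model cannot score this pair
            continue
        model.append(cosine(vectors[w1], vectors[w2]))
        human.append(score)
    rho = pearson(ranks(model), ranks(human)) if len(model) > 1 else float("nan")
    return rho, (oov / len(pairs) if pairs else 0.0)


rho, oov_ratio = evaluate_pairs(VECTORS, PAIRS)
print(round(rho, 6), oov_ratio)  # 1.0 0.25
```

Because the correlation is computed only over in-vocabulary pairs, the two numbers are best read together: a high rho obtained on a small fraction of the pairs says little about the model.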
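The abstract does not specify how articles are compared, so the following is only one common baseline, assumed here for illustration: represent each article by the mean of its in-vocabulary word vectors and rank the other articles by cosine similarity to the chosen one. The article IDs, tokens, and vectors are hypothetical; in practice the vectors would come from a trained Word2Vec model or a pre-trained FastText model.

```python
import math

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
VECTORS = {
    "election": [1.0, 0.0, 0.0],
    "vote": [0.9, 0.1, 0.0],
    "parliament": [0.8, 0.2, 0.0],
    "football": [0.0, 1.0, 0.0],
    "goal": [0.1, 0.9, 0.0],
    "weather": [0.0, 0.0, 1.0],
}

# Hypothetical pre-tokenized articles; "recount" and "rain" are OOV.
ARTICLES = {
    "a1": ["election", "vote", "recount"],
    "a2": ["parliament", "vote"],
    "a3": ["football", "goal"],
    "a4": ["weather", "rain"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def article_vector(tokens, vectors):
    """Mean of the in-vocabulary word vectors, or None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    return [sum(column) / len(known) for column in zip(*known)]


def similar_articles(query_id, articles, vectors, top_n=3):
    """IDs of the top_n articles most similar to query_id by cosine similarity."""
    query = article_vector(articles[query_id], vectors)
    if query is None:
        return []
    scored = []
    for art_id, tokens in articles.items():
        vec = article_vector(tokens, vectors)
        if art_id != query_id and vec is not None:
            scored.append((cosine(query, vec), art_id))
    scored.sort(reverse=True)  # highest similarity first
    return [art_id for _, art_id in scored[:top_n]]


print(similar_articles("a1", ARTICLES, VECTORS, top_n=2))  # ['a2', 'a3']
```

OOV tokens are silently dropped when averaging, so an article whose words are mostly unknown to the model is represented by very few vectors; this is one way the vocabulary coverage of a model variant can affect recommendation quality.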