arXiv - CS - Information Retrieval: Latest Papers, Page 5

NeSHFS: Neighborhood Search with Heuristic-based Feature Selection for Click-Through Rate Prediction

arXiv - CS - Information Retrieval Pub Date : 2024-09-13 DOI: arxiv-2409.08703

Dogukan Aksu, Ismail Hakki Toroslu, Hasan Davulcu

Click-through rate (CTR) prediction plays an important role in online advertising and ad recommender systems. In the past decade, maximizing CTR has been the main focus of model development and solution creation. Therefore, researchers and practitioners have proposed various models and solutions to enhance the effectiveness of CTR prediction. Most of the existing literature focuses on capturing either implicit or explicit feature interactions. Although implicit interactions are successfully captured in some studies, explicit interactions present a challenge for achieving high CTR by extracting both low-order and high-order feature interactions. Unnecessary and irrelevant features may cause high computational time and low prediction performance. Furthermore, certain features may perform well with specific predictive models while underperforming with others. Also, the feature distribution may fluctuate due to traffic variations. Most importantly, in live production environments, resources are limited, and inference time is just as crucial as training time. For all these reasons, feature selection is one of the most important factors in enhancing CTR prediction model performance. Simple filter-based feature selection algorithms perform poorly and are not sufficient. An effective and efficient feature selection algorithm is needed to consistently filter the most useful features during the live CTR prediction process.

In this paper, we propose a heuristic algorithm named Neighborhood Search with Heuristic-based Feature Selection (NeSHFS) to enhance CTR prediction performance while reducing dimensionality and training time costs. We conduct comprehensive experiments on three public datasets to validate the efficiency and effectiveness of our proposed solution.

Citations: 0
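The NeSHFS abstract does not spell out its search procedure. As a rough illustration of what a neighborhood search over feature subsets can look like, the sketch below runs greedy hill climbing. A subset's neighbors are the subsets that differ from it by exactly one feature. The `score` callable and the `toy_score` function are hypothetical stand-ins for whatever the authors actually optimize (for example, validation AUC of a CTR model minus a cost term). This is not the paper's algorithm.

```python
from typing import Callable, FrozenSet, Optional, Sequence, Tuple


def neighborhood_search(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    start: Optional[FrozenSet[str]] = None,
    max_iters: int = 100,
) -> Tuple[FrozenSet[str], float]:
    """Greedy local search over feature subsets (illustrative, not NeSHFS itself).

    A subset's neighborhood is every non-empty subset obtained by adding or
    removing exactly one feature. The search moves to the best neighbor and
    stops at a local optimum, i.e. when no neighbor improves the score.
    """
    current = frozenset(features) if start is None else start
    best = score(current)
    for _ in range(max_iters):
        # Toggle each feature in turn: drop it if present, add it if absent.
        neighbors = [current ^ {f} for f in features]
        scored = [(score(n), n) for n in neighbors if n]
        if not scored:
            break
        cand_score, candidate = max(scored, key=lambda pair: pair[0])
        if cand_score <= best:
            break  # local optimum reached
        current, best = candidate, cand_score
    return current, best


# Hypothetical scoring function: +1 per "useful" feature, 0.1 cost per feature.
USEFUL = {"user_id", "ad_id", "hour"}


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & USEFUL) - 0.1 * len(subset)


selected, value = neighborhood_search(
    ["user_id", "ad_id", "hour", "device", "noise"], toy_score
)
print(sorted(selected), round(value, 2))
```

Starting from the full feature set, the search drops `device` and then `noise`, and it stops once every single-feature change lowers the toy score. A real selector would evaluate each neighbor by retraining or re-scoring a CTR model, so the cost of each `score` call is what dominates.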

Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems

arXiv - CS - Information Retrieval Pub Date : 2024-09-13 DOI: arxiv-2409.08987

Yan-Martin Tamm, Anna Aljanaki

Over the years, Music Information Retrieval (MIR) research has proposed various models pretrained on large amounts of music data. Transfer learning showcases the proven effectiveness of pretrained backend models across a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the efficiency of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network learning over these models. Our research addresses this gap and evaluates the applicability of six pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, and MusiCNN) in the context of MRS. We assess their performance using three recommendation models: K-nearest neighbours (KNN), a shallow neural network, and BERT4Rec. Our findings suggest that pretrained audio representations exhibit significant performance variability between traditional MIR tasks and MRS, indicating that the valuable aspects of musical information captured by backend models may differ depending on the task. This study establishes a foundation for further exploration of pretrained audio representations to enhance music recommendation systems.

Citations: 0
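Here is one way to make the KNN setting concrete. Once a pretrained backend (for example MERT or MusiCNN) has turned each track into a fixed-size vector, a simple nearest-neighbour recommender can rank unseen tracks by cosine similarity to the user's listening history. The sketch below averages the history embeddings into a profile vector. The toy 2-D embeddings and the mean-profile choice are assumptions for illustration, and the paper's KNN configuration may differ.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_recommend(history: List[str], embeddings: Dict[str, Vector], k: int = 2) -> List[str]:
    """Rank unseen tracks by similarity to the mean embedding of the user's history."""
    dim = len(next(iter(embeddings.values())))
    # User profile = element-wise mean of the embeddings of tracks already played.
    profile = [sum(embeddings[t][i] for t in history) / len(history) for i in range(dim)]
    candidates = [t for t in embeddings if t not in history]
    candidates.sort(key=lambda t: cosine(profile, embeddings[t]), reverse=True)
    return candidates[:k]


# Hypothetical track embeddings, standing in for frozen backend outputs.
EMBEDDINGS = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.7, 0.7],
}

print(knn_recommend(["a"], EMBEDDINGS, k=2))  # ['b', 'd']
```

Because the embeddings stay frozen, swapping one backend for another only changes the `EMBEDDINGS` table. That makes this kind of recommender a cheap probe for comparing representations.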

Proactive Recommendation in Social Networks: Steering User Interest via Neighbor Influence

arXiv - CS - Information Retrieval Pub Date : 2024-09-13 DOI: arxiv-2409.08934

Hang Pan, Shuxian Bi, Wenjie Wang, Haoxuan Li, Peng Wu, Fuli Feng, Xiangnan He

Recommending only items that cater to users' historical interests narrows their horizons. Recent works have considered steering target users beyond their historical interests by directly adjusting the items exposed to them. However, the recommended items for direct steering might not align perfectly with the evolution of users' interests, detrimentally affecting target users' experience. To avoid this issue, we propose a new task named Proactive Recommendation in Social Networks (PRSN) that indirectly steers users' interest by utilizing the influence of social neighbors, i.e., indirect steering by adjusting the exposure of a target item to target users' neighbors. The key to PRSN lies in answering an interventional question: what would a target user's feedback be on a target item if the item is exposed to the user's different neighbors? To answer this question, we resort to causal inference and formalize PRSN as: (1) estimating the potential feedback of a user on an item, under the network interference caused by the item's exposure to the user's neighbors; and (2) adjusting the exposure of a target item to target users' neighbors to trade off steering performance against the damage to the neighbors' experience. To this end, we propose a Neighbor Interference Recommendation (NIRec) framework with two key modules: (1) an interference representation-based estimation module for modeling potential feedback; and (2) a post-learning-based optimization module that optimizes a target item's exposure by greedy search, trading off steering performance against the neighbors' experience. We conduct extensive semi-simulation experiments based on three real-world datasets, validating the steering effectiveness of NIRec.

Citations: 0

An Evaluation Framework for Attributed Information Retrieval using Large Language Models

arXiv - CS - Information Retrieval Pub Date : 2024-09-12 DOI: arxiv-2409.08014

Hanane Djeddal, Pierre Erbacher, Raouf Toukal, Laure Soulier, Karen Pinel-Sauvagnat, Sophia Katrenko, Lynda Tamine

Citations: 0

Harnessing TI Feeds for Exploitation Detection

arXiv - CS - Information Retrieval Pub Date : 2024-09-12 DOI: arxiv-2409.07709

Kajal Patel, Zubair Shafiq, Mateus Nogueira, Daniel Sadoc Menasché, Enrico Lovat, Taimur Kashif, Ashton Woiwood, Matheus Martins

Many organizations rely on Threat Intelligence (TI) feeds to assess the risk associated with security threats. Due to the volume and heterogeneity of the data, it is prohibitive to manually analyze the threat information available in different loosely structured TI feeds. Thus, there is a need to develop automated methods to vet and extract actionable information from TI feeds. To this end, we present a machine learning pipeline to automatically detect vulnerability exploitation from TI feeds. We first model threat vocabulary in loosely structured TI feeds using state-of-the-art embedding techniques (Doc2Vec and BERT) and then use it to train a supervised machine learning classifier to detect exploitation of security vulnerabilities. We use our approach to identify exploitation events in 191 different TI feeds. Our longitudinal evaluation shows that it is able to accurately identify exploitation events from TI feeds using only past data for training, and even on TI feeds withheld from training. Our proposed approach is useful for a variety of downstream tasks such as data-driven vulnerability risk assessment.

Citations: 0

Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG

arXiv - CS - Information Retrieval Pub Date : 2024-09-12 DOI: arxiv-2409.07691

Gabriel de Souza P. Moreira, Ronay Ak, Benedikt Schifferer, Mengyao Xu, Radek Osmulski, Even Oldridge

Ranking models play a crucial role in enhancing the overall accuracy of text retrieval systems. These multi-stage systems typically utilize either dense embedding models or sparse lexical indices to retrieve relevant passages based on a given query, followed by ranking models that refine the ordering of the candidate passages by their relevance to the query.

This paper benchmarks various publicly available ranking models and examines their impact on ranking accuracy. We focus on text retrieval for question-answering tasks, a common use case for Retrieval-Augmented Generation systems. The benchmarked models include some that are commercially viable for industrial applications.

We introduce a state-of-the-art ranking model, NV-RerankQA-Mistral-4B-v3, which achieves a significant accuracy increase of ~14% compared to pipelines with other rerankers. We also provide an ablation study comparing the fine-tuning of ranking models with different sizes, losses and self-attention mechanisms.

Finally, we discuss the challenges of text retrieval pipelines with ranking models in real-world industry applications, in particular the trade-offs among model size, ranking accuracy and system requirements such as indexing and serving latency / throughput.

Citations: 0
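The abstract describes a two-stage pipeline: a cheap first-stage retrieval, then a ranking model that reorders a short candidate list. It can be sketched as follows. Both scoring functions here are toy, hypothetical stand-ins. Term overlap plays the role of a sparse lexical index, and `toy_rerank_score` plays the role of a learned reranker such as NV-RerankQA-Mistral-4B-v3, which would be far more accurate and far more expensive per passage.

```python
from typing import Callable, List, Sequence, Set


def terms(text: str) -> Set[str]:
    """Lower-cased whitespace tokens; a deliberately crude tokenizer."""
    return set(text.lower().split())


def first_stage(query: str, corpus: Sequence[str], k: int) -> List[str]:
    """Stage 1: cheap lexical retrieval by number of shared query terms."""
    q = terms(query)
    # sorted() is stable even with reverse=True, so ties keep corpus order.
    return sorted(corpus, key=lambda p: len(q & terms(p)), reverse=True)[:k]


def rerank(query: str, candidates: Sequence[str],
           scorer: Callable[[str, str], float], top_n: int) -> List[str]:
    """Stage 2: reorder only the k candidates with a (more expensive) scorer."""
    return sorted(candidates, key=lambda p: scorer(query, p), reverse=True)[:top_n]


def toy_rerank_score(query: str, passage: str) -> float:
    # Hypothetical: share of the passage's distinct terms that match the query,
    # which favours short, on-topic passages.
    p = terms(passage)
    return len(terms(query) & p) / len(p) if p else 0.0


QUERY = "what is the capital of france"
CORPUS = [
    "paris is the capital of france",
    "the capital of france is a large city with many museums and the louvre",
    "berlin is the capital of germany",
    "bananas are yellow",
]

candidates = first_stage(QUERY, CORPUS, k=3)
print(rerank(QUERY, candidates, toy_rerank_score, top_n=2))
```

This shows the design trade-off the paper discusses. The expensive scorer runs on only `k` candidates rather than the whole corpus, so `k` directly trades ranking accuracy against serving latency.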

Enhancing Cross-Market Recommendation System with Graph Isomorphism Networks: A Novel Approach to Personalized User Experience

arXiv - CS - Information Retrieval Pub Date : 2024-09-12 DOI: arxiv-2409.07850

Sümeyye Öztürk, Ahmed Burak Ercan, Resul Tugay, Şule Gündüz Öğüdücü

In today's world of globalized commerce, cross-market recommendation systems (CMRs) are crucial for providing personalized user experiences across diverse market segments. However, traditional recommendation algorithms have difficulty dealing with market specificity and data sparsity, especially in new or emerging markets. In this paper, we propose the CrossGR model, which utilizes Graph Isomorphism Networks (GINs) to improve CMR systems. It outperforms existing benchmarks on the NDCG@10 and HR@10 metrics, demonstrating its adaptability and accuracy in handling diverse market segments. The CrossGR model is adaptable and accurate, making it well-suited for handling the complexities of cross-market recommendation tasks. Its robustness is demonstrated by consistent performance across different evaluation timeframes, indicating its potential to cater to evolving market trends and user preferences. Our findings suggest that GINs represent a promising direction for CMRs, paving the way for more sophisticated, personalized, and context-aware recommendation systems in the dynamic landscape of global e-commerce.

Citations: 0
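For reference, the standard Graph Isomorphism Network layer updates each node in three steps. It sums its neighbors' representations, adds a weighted copy of its own, and passes the result through an MLP. The CrossGR abstract does not say how the graph is built or how this layer is configured, so the formula below is the generic GIN update, not the paper's exact model. Here $h_v^{(k)}$ is the representation of node $v$ after layer $k$ (a user or an item, assuming a user-item interaction graph), $\mathcal{N}(v)$ is its set of neighbors, and $\epsilon^{(k)}$ is a learnable or fixed scalar.

```latex
h_v^{(k)} = \mathrm{MLP}^{(k)}\!\left( \left(1 + \epsilon^{(k)}\right) h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)
```

The sum aggregator, rather than a mean or max, lets GIN distinguish neighborhoods that differ only in how many times a neighbor representation occurs.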

Collaborative Automatic Modulation Classification via Deep Edge Inference for Hierarchical Cognitive Radio Networks

arXiv - CS - Information Retrieval Pub Date : 2024-09-12 DOI: arxiv-2409.07946

Chaowei He, Peihao Dong, Fuhui Zhou, Qihui Wu

In hierarchical cognitive radio networks, edge or cloud servers use the data collected by edge devices for modulation classification. This approach, however, faces problems of transmission overhead, data privacy, and computation load. In this article, an edge learning (EL) based framework that jointly mobilizes the edge device and the edge server for intelligent co-inference is proposed to realize collaborative automatic modulation classification (C-AMC) between them. A spectrum semantic compression neural network (SSCNet) with a lightweight structure is designed for the edge device to compress the collected raw data into a compact semantic message, which is then sent to the edge server via the wireless channel. On the edge server side, a modulation classification neural network (MCNet) combining bidirectional long short-term memory (Bi-LSTM) and multi-head attention layers is designed to determine the modulation type from the noisy semantic message. By leveraging the computation resources of both the edge device and the edge server, high transmission overhead and risks of data privacy leakage are avoided. The simulation results verify the effectiveness of the proposed C-AMC framework, which significantly reduces the model size and computational complexity.

Citations: 0

PDC-FRS: Privacy-preserving Data Contribution for Federated Recommender System

arXiv - CS - Information Retrieval Pub Date : 2024-09-12 DOI: arxiv-2409.07773

Chaoqun Yang, Wei Yuan, Liang Qu, Thanh Tam Nguyen

Federated recommender systems (FedRecs) have emerged as a popular research direction for protecting users' privacy in on-device recommendations. In FedRecs, users keep their data locally and only contribute their local collaborative information by uploading model parameters to a central server. While this rigid framework protects users' raw data during training, it severely compromises the recommendation model's performance for the following reasons: (1) due to the power-law distribution of user behavior data, individual users have few data points to train a recommendation model, resulting in uploaded model updates that may be far from optimal; (2) as each user's uploaded parameters are learned from local data, which lacks global collaborative information, relying solely on parameter aggregation methods such as FedAvg to fuse global collaborative information may be suboptimal. To bridge this performance gap, we propose a novel federated recommendation framework, PDC-FRS. Specifically, we design a privacy-preserving data contribution mechanism that allows users to share their data with a differential privacy guarantee. Based on the shared but perturbed data, an auxiliary model is trained in parallel with the original federated recommendation process. This auxiliary model enhances FedRec by augmenting each user's local dataset and integrating global collaborative information. To demonstrate the effectiveness of PDC-FRS, we conduct extensive experiments on two widely used recommendation datasets. The empirical results show the superiority of PDC-FRS over baseline methods.

Citations: 0
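The PDC-FRS abstract names FedAvg as a typical way FedRecs fuse client updates. Below is a minimal sketch of that aggregation step. It assumes each client uploads a flat parameter vector and is weighted by its number of local interactions. Under this weighting, clients with very few interactions (the long tail of the power-law distribution the authors describe) contribute little to the global model. PDC-FRS's auxiliary model and its differential-privacy mechanism are not shown.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """FedAvg aggregation: average client parameter vectors weighted by local data size."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and exactly one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of local examples must be positive")
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients: one with 1 local interaction, one with 3.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

The result sits three times closer to the data-rich client's parameters. This illustrates the authors' point that aggregation alone can do little for users whose local updates are poor.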

On the challenges of studying bias in Recommender Systems: A UserKNN case study

arXiv - CS - Information Retrieval Pub Date : 2024-09-12 DOI: arxiv-2409.08046

Savvina Daniil, Manel Slokom, Mirjam Cuper, Cynthia C. S. Liem, Jacco van Ossenbruggen, Laura Hollink

Statements on the propagation of bias by recommender systems are often hard to verify or falsify. Research on bias tends to draw from a small pool of publicly available datasets and is therefore bound by their specific properties. Additionally, implementation choices are often not explicitly described or motivated in research, even though they may affect bias propagation. In this paper, we explore the challenges of measuring and reporting popularity bias. We showcase the impact of data properties and algorithm configurations on popularity bias by combining synthetic data with well-known recommender systems frameworks that implement UserKNN. First, we identify data characteristics that might impact popularity bias, based on the functionality of UserKNN. Accordingly, we generate various datasets that combine these characteristics. Second, we locate UserKNN configurations that vary across implementations in the literature. We evaluate popularity bias for five synthetic datasets and five UserKNN configurations, and offer insights on their joint effect. We find that, depending on the data characteristics, various UserKNN configurations can lead to different conclusions regarding the propagation of popularity bias. These results motivate the need to explicitly address algorithmic configuration and data properties when reporting and interpreting bias in recommender systems.

Citations: 0
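Here is a toy version of the kind of experiment this paper describes. It combines a minimal UserKNN on binary interactions (cosine similarity, top-k neighbors, summed similarity as the item score) with one simple popularity-bias measure: the mean popularity of recommended items. The data, the metric, and the use of `k` as the configuration knob are assumptions for illustration. The paper studies five specific configurations drawn from existing frameworks, which this sketch does not reproduce.

```python
import math
from typing import Dict, List, Set

Interactions = Dict[str, Set[str]]


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two users' binary interaction vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def user_knn(target: str, data: Interactions, k: int, n: int) -> List[str]:
    """Recommend up to n unseen items, scored by summed similarity of the k nearest users."""
    neighbors = sorted(
        ((cosine(data[target], items), user) for user, items in data.items() if user != target),
        reverse=True,
    )[:k]
    scores: Dict[str, float] = {}
    for sim, user in neighbors:
        for item in data[user] - data[target]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


def mean_popularity(recs: List[str], data: Interactions) -> float:
    """Average number of users who interacted with each recommended item."""
    if not recs:
        return 0.0
    return sum(sum(item in items for items in data.values()) for item in recs) / len(recs)


# Hypothetical interaction data.
DATA: Interactions = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "c"},
    "u4": {"d"},
    "u5": {"a", "c", "d"},
}

for k in (2, 3):
    recs = user_knn("u1", DATA, k=k, n=2)
    print(k, recs, mean_popularity(recs, DATA))
```

With `k=2`, the only recommendation is the popular item `c` (mean popularity 3.0). With `k=3`, the less popular `d` also appears (2.5). So even this single knob shifts the measured popularity bias on the same data, in the spirit of the paper's argument that configurations must be reported.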