{"title":"Podify: A Podcast Streaming Platform with Automatic Logging of User Behaviour for Academic Research","authors":"Francesco Meggetto, Yashar Moshfeghi","doi":"10.1145/3539618.3591824","DOIUrl":"https://doi.org/10.1145/3539618.3591824","url":null,"abstract":"Podcasts are spoken documents that, in recent years, have gained widespread popularity. Despite the growing research interest in this domain, conducting user studies remains challenging due to the lack of datasets that include user behaviour. In particular, there is a need for a podcast streaming platform that reduces the overhead of conducting user studies. To address these issues, in this work, we present Podify. It is the first web-based platform for podcast streaming and consumption specifically designed for research. The platform highly resembles existing streaming systems to provide users with a high level of familiarity on both desktop and mobile. A catalogue of podcast episodes can be easily created via RSS feeds. The platform also offers Elasticsearch-based indexing and search that is highly customisable, allowing research and experimentation in podcast search. Users can manually curate playlists of podcast episodes for consumption. With mechanisms to collect explicit feedback from users (i.e., liking and disliking behaviour), Podify also automatically collects implicit feedback (i.e., all user interactions). Users' behaviour can be easily exported to a readable format for subsequent experimental analysis. A demonstration of the platform is available at https://youtu.be/k9Z5w_KKHr8, with the code and documentation available at https://github.com/NeuraSearch/Podify.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128886199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Next Basket Recommendation with Intent-aware Hypergraph Adversarial Network","authors":"Ran Li, Liang Zhang, Guannan Liu, Junjie Wu","doi":"10.1145/3539618.3591742","DOIUrl":"https://doi.org/10.1145/3539618.3591742","url":null,"abstract":"Next Basket Recommendation (NBR) that recommends a basket of items to users has become a promising promotion artifice for online businesses. The key challenge of NBR is rooted in the complicated relations of items that are dependent on one another in a same basket with users' diverse purchasing intentions, which goes far beyond the pairwise item relations in traditional recommendation tasks, and yet has not been well addressed by existing NBR methods that mostly model the inter-basket item relations only. To that end, in this paper, we construct a hypergraph from basket-wise purchasing records and probe the inter-basket and intra-basket item relations behind the hyperedges. In particular, we combine the strength of HyperGraph Neural Network with disentangled representation learning to derive the intent-aware representations of hyperedges for characterizing the nuances of user purchasing patterns. Moreover, considering the information loss in traditional item-wise optimization, we propose a novel basket-wise optimization scheme via an adversarial network to generate high-quality negative baskets. Extensive experiments conducted on four different data sets demonstrate the superior performances over the state-of-the-art NBR methods. Notably, our method is shown to strike a good balance in recommending both repeated and explorative items as a basket.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123738102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"pybool_ir: A Toolkit for Domain-Specific Search Experiments","authors":"Harrisen Scells, Martin Potthast","doi":"10.1145/3539618.3591819","DOIUrl":"https://doi.org/10.1145/3539618.3591819","url":null,"abstract":"Undertaking research in domain-specific scenarios such as systematic review literature search, legal search, and patent search can often have a high barrier of entry due to complicated indexing procedures and complex Boolean query syntax. Indexing and searching document collections like PubMed in off-the-shelf tools such as Elasticsearch and Lucene often yields less accurate (and less effective) results than the PubMed search engine, i.e., retrieval results do not match what would be retrieved if one issued the same query to PubMed. Furthermore, off-the-shelf tools have their own nuanced query languages and do not allow directly using the often large and complicated Boolean queries seen in domain-specific search scenarios. The pybool_ir toolkit aims to address these problems and to lower the barrier to entry for developing new methods for domain-specific search. The toolkit is an open source package available at https://github.com/hscells/pybool_ir.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132406019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EEDN: Enhanced Encoder-Decoder Network with Local and Global Context Learning for POI Recommendation","authors":"Xinfeng Wang, Fumiyo Fukumoto, Jin Cui, Yoshimi Suzuki, Jiyi Li, Dongjin Yu","doi":"10.1145/3539618.3591678","DOIUrl":"https://doi.org/10.1145/3539618.3591678","url":null,"abstract":"The point-of-interest (POI) recommendation predicts users' destinations, which might be of interest to users and has attracted considerable attention as one of the major applications in location-based social networks (LBSNs). Recent work on graph-based neural networks (GNN) or matrix factorization-based (MF) approaches has resulted in better representations of users and POIs to forecast users' latent preferences. However, they still suffer from the implicit feedback and cold-start problems of check-in data, as they cannot capture both local and global graph-based relations among users (or POIs) simultaneously, and the cold-start neighbors are not handled properly during graph convolution in GNN. In this paper, we propose an enhanced encoder-decoder network (EEDN) to exploit rich latent features between users, POIs, and interactions between users and POIs for POI recommendation. The encoder of EEDN utilizes a hybrid hypergraph convolution to enhance the aggregation ability of each graph convolution step and learns to derive more robust cold-start-aware user representations. In contrast, the decoder mines local and global interactions by both graph- and sequential-based patterns for modeling implicit feedback, especially to alleviate exposure bias. Extensive experiments in three public real-world datasets demonstrate that EEDN outperforms state-of-the-art methods. Our source codes and data are released at https://github.com/WangXFng/EEDN","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132482021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing Spurious Correlations for Relation Extraction by Feature Decomposition and Semantic Augmentation","authors":"Tianshu Yu, Min Yang, Chengming Li, Ruifeng Xu","doi":"10.1145/3539618.3592050","DOIUrl":"https://doi.org/10.1145/3539618.3592050","url":null,"abstract":"Deep neural models have become mainstream in relation extraction (RE), yielding state-of-the-art performance. However, most existing neural models are prone to spurious correlations between input features and prediction labels, making the models suffer from low robustness and generalization.In this paper, we propose a spurious correlation reduction method for RE via feature decomposition and semantic augmentation (denoted as FDSA). First, we decompose the original sentence representation into class-related features and context-related features. To obtain better context-related features, we devise a contrastive learning method to pull together the context-related features of the anchor sentence and its augmented sentences, and push away the context-related features of different anchor sentences. In addition, we propose gradient-based semantic augmentation on context-related features in order to improve the robustness of the RE model. Experiments on four datasets show that our model outperforms the strong competitors.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131937473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Significant Attributes are in the Community Detection of Attributed Multiplex Networks","authors":"Junwei Cheng, Chaobo He, Kunlin Han, Wenjie Ma, Yong-Hong Tang","doi":"10.1145/3539618.3591998","DOIUrl":"https://doi.org/10.1145/3539618.3591998","url":null,"abstract":"Existing community detection methods for attributed multiplex networks focus on exploiting the complementary information from different topologies, while they are paying little attention to the role of attributes. However, we observe that real attributed multiplex networks exhibit two unique features, namely, consistency and homogeneity of node attributes. Therefore, in this paper, we propose a novel method, called ACDM, which is based on these two characteristics of attributes, to detect communities on attributed multiplex networks. Specifically, we extract commonality representation of nodes through the consistency of attributes. The collaboration between the homogeneity of attributes and topology information reveals the particularity representation of nodes. The comprehensive experimental results on real attributed multiplex networks well validate that our method outperforms state-of-the-art methods in most networks.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130219622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chunyang Wang, Yanmin Zhu, Aixin Sun, Zhaobo Wang, K. Wang
{"title":"A Preference Learning Decoupling Framework for User Cold-Start Recommendation","authors":"Chunyang Wang, Yanmin Zhu, Aixin Sun, Zhaobo Wang, K. Wang","doi":"10.1145/3539618.3591627","DOIUrl":"https://doi.org/10.1145/3539618.3591627","url":null,"abstract":"The issue of user cold-start poses a long-standing challenge to recommendation systems, due to the scarce interactions of new users. Recently, meta-learning based studies treat each cold-start user as a user-specific few-shot task and then derive meta-knowledge about fast model adaptation across training users. However, existing solutions mostly do not clearly distinguish the concept of new users and the concept of novel preferences, leading to over-reliance on meta-learning based adaptability to novel patterns. In addition, we also argue that the existing meta-training task construction inherently suffers from the memorization overfitting issue, which inevitably hinders meta-generalization to new users. In response to the aforementioned issues, we propose a preference learning decoupling framework, which is enhanced with meta-augmentation (PDMA), for user cold-start recommendation. To rescue the meta-learning from unnecessary adaptation to common patterns, our framework decouples preference learning for a cold-start user into two complementary aspects: common preference transfer, and novel preference adaptation. To handle the memorization overfitting issue, we further propose to augment meta-training users by injecting attribute-based noises, to achieve mutually-exclusive tasks. Extensive experiments on benchmark datasets demonstrate that our framework achieves superior performance improvements against state-of-the-art methods. We also show that our proposed framework is effective in alleviating memorization overfitting.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127817862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nuo Chen, Dong-Jin Park, Hyun-Chul Park, Kijun Choi, T. Sakai, Jinyoung Kim
{"title":"Practice and Challenges in Building a Business-oriented Search Engine Quality Metric","authors":"Nuo Chen, Dong-Jin Park, Hyun-Chul Park, Kijun Choi, T. Sakai, Jinyoung Kim","doi":"10.1145/3539618.3591841","DOIUrl":"https://doi.org/10.1145/3539618.3591841","url":null,"abstract":"One of the most challenging aspects of operating a large-scale web search engine is to accurately evaluate and monitor the search engine's result quality regardless of search types. From a business perspective, in the face of such challenges, it is important to establish a universal search quality metric that can be easily understood by the entire organisation. In this paper, we introduce a model-based quality metric using Explainable Boosting Machine as the classifier and online user behaviour signals as features to predict search quality. The proposed metric takes into account a variety of search types and has good interpretability. To examine the performance of the metric, we constructed a large dataset of user behaviour on search engine results pages (SERPs) with SERP quality ratings from professional annotators. We compared the performance of the model in our metric to those of other black-box machine learning models on the dataset. We also share a few experiences within our company for the org-wide adoption of this metric relevant to metric design.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125447525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale Data Processing for Information Retrieval Applications","authors":"Pooya Khandel","doi":"10.1145/3539618.3591797","DOIUrl":"https://doi.org/10.1145/3539618.3591797","url":null,"abstract":"Developing Information Retrieval (IR) applications such as search engines and recommendation systems require training of models that are growing in complexity and size with immense collections of data that contain multiple dimensions (documents/items text, user profiles, and interactions). Much of the research in IR concentrates on improving the performance of ranking models; however, given the high training time and high computational resources required to improve the performance by designing new models, it is crucial to address efficiency aspects of the design and deployment of IR applications at large-scale. In my thesis, I aim to improve the training efficiency of IR applications and speed up the development phase of new models, by applying dataset distillation approaches to reduce the dataset size while preserving the ranking quality and employing efficient High-Performance Computing (HPC) solutions to increase the processing speed.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126411718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harrisen Scells, Ferdinand Schlatt, Martin Potthast
{"title":"Smooth Operators for Effective Systematic Review Queries","authors":"Harrisen Scells, Ferdinand Schlatt, Martin Potthast","doi":"10.1145/3539618.3591768","DOIUrl":"https://doi.org/10.1145/3539618.3591768","url":null,"abstract":"Effective queries are crucial to minimising the time and cost of medical systematic reviews, as all retrieved documents must be judged for relevance. Boolean queries, developed by expert librarians, are the standard for systematic reviews. They guarantee reproducible and verifiable retrieval and more control than free-text queries. However, the result sets of Boolean queries are unranked and difficult to control due to the strict Boolean operators. We address these problems in a single unified retrieval model by formulating a class of smooth operators that are compatible with and extend existing Boolean operators. Our smooth operators overcome several shortcomings of previous extensions of the Boolean retrieval model. In particular, our operators are independent of the underlying ranking function, so that exact-match and large language model rankers can be combined in the same query. We found that replacing Boolean operators with equivalent or similar smooth operators often improves the effectiveness of queries. Their properties make tuning a query to precision or recall intuitive and allow greater control over how documents are retrieved. This additional control leads to more effective queries and reduces the cost of systematic reviews.","PeriodicalId":425056,"journal":{"name":"Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123017741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}