Proceedings of the 31st ACM International Conference on Information & Knowledge Management: Latest Papers

RaDaR: A Real-Word Dataset for AI powered Run-time Detection of Cyber-Attacks
Sareena Karapoola, Nikhilesh Singh, C. Rebeiro, V. Kamakoti
Abstract: Artificial Intelligence techniques on malware run-time behavior have emerged as a promising tool in the arms race against sophisticated and stealthy cyber-attacks. While data of malware run-time features are critical for research and benchmark comparisons, unfortunately, there is a dearth of real-world datasets due to multiple challenges to their collection. The evasive nature of malware, its dependence on connected real-world conditions to execute, and its potential repercussions pose significant challenges for executing malware in laboratory settings. Consequently, prior open datasets rely on isolated virtual sandboxes to run malware, resulting in data that is not representative of malware behavior in the wild. This paper presents RaDaR, an open real-world dataset for run-time behavioral analysis of Windows malware. RaDaR is collected by executing malware on a real-world testbed with Internet connectivity and in a timely manner, thus providing a close-to-real-world representation of malware behavior. To enable an unbiased comparison of different solutions and foster multiple verticals in malware research, RaDaR provides a multi-perspective data collection and labeling of malware activity. The multi-perspective collection provides a comprehensive view of malware activity across the network, operating system (OS), and hardware. On the other hand, the multi-perspective labeling provides four independent perspectives to analyze the same malware, including its methodology, objective, capabilities, and the information it exfiltrates. To date, RaDaR includes 7 million network packets, 11.3 million OS system call traces, and 3.3 million hardware events of 10,434 malware samples having different methodologies (3 classes) and objectives (9 classes), spread across 30 well-known malware families.
DOI: https://doi.org/10.1145/3511808.3557121
Published: 2022-10-17
Citations: 0
Ledgit: A Service to Diagnose Illicit Addresses on Blockchain using Multi-modal Unsupervised Learning
Xiaoying Zhi, Yash Satsangi, Sean J. Moran, Shaltiel Eloul
Abstract: Distributed ledger technology benefits society by enabling an ecosystem of decentralised finance. However, the pseudo-anonymised nature of transactions has also been an enabler of new routes for illicit activities ranging from individual scams to organised crimes. Current solutions for identifying addresses involved in illicit activities (illicit addresses) rely on commercial intelligence services, which are costly due to the intensive investigative efforts required. We propose Ledgit, an automatic real-time service for diagnosing illicit addresses on the Bitcoin blockchain. Ledgit is based solely on publicly available data, and uses an unsupervised clustering method that combines information from textual reports and the blockchain graph to assign a risk score that a Bitcoin address is involved in illicit activities. We verify the system with labeled addresses, showing high performance in identifying illicit addresses. Finally, we provide an intuitive user interface that provides accessible risk assessment with graph and report analytics.
DOI: https://doi.org/10.1145/3511808.3557212
Published: 2022-10-17
Citations: 0
Semorph: A Morphology Semantic Enhanced Pre-trained Model for Chinese Spam Text Detection
Kaiting Lai, Yinong Long, Bowen Wu, Ying Li, Baoxun Wang
Abstract: Chinese spam text detection is essential for social media since these texts affect the user experience of Chinese speakers and pollute the community. The underlying text classification method is employed to explore the unique combinations of characters that represent clues of spam information from annotated or further augmented data. However, exploiting the diversity of Chinese character glyphs, spammers frequently wrap the spam content in visually similar text to fool the model while making sure people still understand it. This paper proposes to adopt the essence of human cognition of these adversarial texts into spam text detection models, by designing a pre-trained model to learn the morphology semantics of Chinese characters and represent their contextual meanings from scratch. The model is pre-trained on a self-supervised Chinese corpus and fine-tuned on spam-annotated community texts. Besides, cooperating with the pre-trained model that can capture the morphological features of Chinese, a new data perturbation method is introduced to guide the optimization towards recognizing the actual meaning of a text after spammers replace some characters with visually similar ones. The experimental results have shown that our proposed methodology can notably improve the performance of spam text detection as well as maintain robustness against adversarial samples.
DOI: https://doi.org/10.1145/3511808.3557448
Published: 2022-10-17
Citations: 0
Control-based Bidding for Mobile Livestreaming Ads with Exposure Guarantee
Haoqi Zhang, Junqi Jin, Zhenzhe Zheng, Fan Wu, Haiyang Xu, Jian Xu
Abstract: Mobile livestreaming ads are becoming a popular approach for brand promotion and product marketing. However, a large number of advertisers fail to achieve their desired advertising performance due to the lack of ad exposure guarantee in the dynamic advertising environment. In this work, we propose a bidding-based ad delivery algorithm for mobile livestreaming ads that can provide advertisers with bidding strategies for optimizing diverse marketing objectives under general ad performance guarantee constraints, such as ad exposure and cost-efficiency constraints. By modeling the problem as an online integer programming problem and applying primal-dual theory, we can derive the bidding strategy from solving the optimal dual variables. The initialization of the dual variables is realized through a deep neural network that captures the complex relation between dual variables and dynamic advertising environments. We further propose a control-based bidding algorithm to adjust the dual variables in an online manner based on the real-time advertising performance feedback and constraints. Experiments on a real-world industrial dataset demonstrate the effectiveness of our bidding algorithm in terms of optimizing marketing objectives and guaranteeing ad constraints.
DOI: https://doi.org/10.1145/3511808.3557269
Published: 2022-10-17
Citations: 1
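The online adjustment of a dual variable from performance feedback, as described in the abstract above, might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' algorithm: the proportional-control update, the `gain` parameter, and the bid rule in `bid` are all hypothetical choices made for illustration, and the neural-network initialization of the dual variable is replaced by a plain starting value.

```python
def update_dual(lmbda, target_exposure, observed_exposure, gain=0.1):
    """One proportional-control step on a dual variable (assumed form).

    If exposure falls short of the target, the dual variable grows;
    if exposure overshoots, it shrinks. It is clipped at zero because
    dual variables of inequality constraints are non-negative.
    """
    error = (target_exposure - observed_exposure) / target_exposure
    return max(0.0, lmbda + gain * error)


def bid(predicted_value, lmbda):
    """Hypothetical bid rule: the predicted value scaled up by the dual variable."""
    return predicted_value * (1.0 + lmbda)


if __name__ == "__main__":
    lmbda = 0.5  # stand-in for a learned initial value
    for observed in (50, 70, 95, 120):
        lmbda = update_dual(lmbda, target_exposure=100, observed_exposure=observed)
        print(round(lmbda, 4), round(bid(2.0, lmbda), 4))
```

The point of the sketch is only the feedback loop: the bid is recomputed from the current dual variable, and the dual variable is nudged after each round of observed exposure.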
A Biased Sampling Method for Imbalanced Personalized Ranking
Lu Yu, Shichao Pei, Feng Zhu, Longfei Li, Jun Zhou, Chuxu Zhang, Xiangliang Zhang
Abstract: Pairwise ranking models have been widely used to address recommendation problems. The basic idea is to learn the rank of users' preferred items by separating items into positive samples if user-item interactions exist, and negative samples otherwise. Due to the limited number of observable interactions, pairwise ranking models face serious class-imbalance issues. Our theoretical analysis shows that current sampling-based methods cause a vertex-level imbalance problem, which drives the norm of learned item embeddings toward infinity after a certain number of training iterations, and consequently results in vanishing gradients and affects the model inference results. We thus propose an efficient Vital Negative Sampler (VINS) to alleviate the class-imbalance issue for pairwise ranking models, in particular for deep learning models optimized by gradient methods. The core of VINS is a biased sampler with a rejection probability that tends to accept a negative candidate with a larger degree weight than the given positive item. Evaluation results on several real datasets demonstrate that the proposed sampling method speeds up the training procedure by 30% to 50% for ranking models ranging from shallow to deep, while maintaining and even improving the quality of ranking results in top-N item recommendations.
DOI: https://doi.org/10.1145/3511808.3557218
Published: 2022-10-17
Citations: 1
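The idea of a rejection-based sampler that favours negative candidates with a larger degree than the positive item, mentioned in the abstract above, can be illustrated as follows. This is a sketch under assumptions and not the VINS procedure itself: the acceptance probability `min(1, degree[cand] / degree[pos_item])`, the `max_tries` cutoff, and the function name `sample_negative` are hypothetical, and every positive item is assumed to have a degree above zero.

```python
import random


def sample_negative(pos_item, candidates, degree, rng, max_tries=50):
    """Rejection-sample a negative item for one positive item.

    A candidate is drawn uniformly and accepted with probability
    min(1, degree[candidate] / degree[pos_item]), so candidates that are
    at least as popular as the positive item are always accepted.
    """
    cand = rng.choice(candidates)
    for _ in range(max_tries):
        accept_prob = min(1.0, degree[cand] / degree[pos_item])
        if rng.random() < accept_prob:
            return cand
        cand = rng.choice(candidates)
    return cand  # fall back to the last draw if nothing was accepted


if __name__ == "__main__":
    rng = random.Random(0)
    degree = {"p": 10, "hot": 20, "cold": 1}  # toy interaction counts
    draws = [sample_negative("p", ["hot", "cold"], degree, rng) for _ in range(1000)]
    print(draws.count("hot"), draws.count("cold"))
```

With the toy degrees above, the low-degree item is accepted only about one time in ten when it is drawn, so most sampled negatives are the high-degree item.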
Memory Bank Augmented Long-tail Sequential Recommendation
Yidan Hu, Yong Liu, C. Miao, Yuan Miao
Abstract: The goal of sequential recommendation is to predict the next item that a user would like to interact with, by capturing her dynamic historical behaviors. However, most existing sequential recommendation methods do not focus on solving the long-tail item recommendation problem that is caused by the imbalanced distribution of item data. To solve this problem, we propose a novel sequential recommendation framework, named MASR (i.e., Memory Bank Augmented Long-tail Sequential Recommendation). MASR is an "open-book" model that combines novel types of memory banks and a retriever-copy network to alleviate the long-tail problem. During inference, the designed retriever-copy network retrieves related sequences from the training samples and copies the useful information as a cue to improve the recommendation performance on tail items. Two designed memory banks provide reference samples to the retriever-copy network by memorizing the historical samples appearing in the training phase. Extensive experiments have been performed on five real-world datasets to demonstrate the effectiveness of the proposed MASR model. The experimental results indicate that MASR consistently outperforms baseline methods in terms of recommendation performance on tail items.
DOI: https://doi.org/10.1145/3511808.3557391
Published: 2022-10-17
Citations: 2
Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models
Jinbo Song, Ruoran Huang, Xinyang Wang, Wei Huang, Qian Yu, Mingming Chen, Yafei Yao, Chaosheng Fan, Changping Peng, Zhangang Lin, Jinghe Hu, Jingping Shao
Abstract: Industrial systems such as recommender systems and online advertising have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly suffer from the sample selection bias (SSB) problem because they ignore the entire-chain data dependence, resulting in sub-optimal performance. In this paper, we rethink the pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from all cascaded stages to effectively alleviate the SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict the result of each stage, and introduce the sub-networking routing strategy with L0 regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, which achieves a better trade-off between efficiency and effectiveness.
DOI: https://doi.org/10.1145/3511808.3557683
Published: 2022-10-17
Citations: 0
An Uncertainty-Aware Imputation Framework for Alleviating the Sparsity Problem in Collaborative Filtering
Sung-Woong Hwang, Dong-Kyu Chae
Abstract: Collaborative Filtering (CF) methods for recommender systems commonly suffer from the data sparsity issue. Data imputation has been widely adopted to deal with this issue. However, existing studies are limited in that neither the uncertainty nor the robustness of imputation has been taken into account, so there is a high risk that the imputed values are far from the true values. This paper explores a novel imputation framework, named Uncertainty-Aware Multiple Imputation (UA-MI), which can effectively solve the sparsity issue. Given a (sparse) user-item interaction matrix, our key idea is to quantify uncertainty on each missing entry and then selectively impute the cells with the lowest uncertainty. Here, we suggest three strategies for measuring uncertainty in missing user-item interactions, based on sampling, dropout, and ensembling, respectively. They successfully obtain element-wise mean and variance on the missing entries, where the variance helps determine which entries of the matrix should be imputed and the corresponding mean values are imputed. Experiments show that our UA-MI framework significantly outperformed the existing imputation strategies.
DOI: https://doi.org/10.1145/3511808.3557236
Published: 2022-10-17
Citations: 2
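The selective imputation step described in the abstract above (compute an element-wise mean and variance for each missing entry, then fill only the entries with the lowest variance using their mean) can be sketched as follows. This is an illustration under assumptions rather than the UA-MI implementation: the ensemble members are represented simply as pre-computed prediction matrices, and the function name `impute_low_uncertainty` and the `budget` parameter are hypothetical.

```python
from statistics import mean, pvariance


def impute_low_uncertainty(matrix, ensemble_preds, budget):
    """Fill at most `budget` missing cells (None), lowest variance first.

    `ensemble_preds` is a list of full prediction matrices, one per
    ensemble member; the spread of their predictions for a cell is used
    as that cell's uncertainty.
    """
    stats = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is None:
                preds = [p[i][j] for p in ensemble_preds]
                stats.append((pvariance(preds), i, j, mean(preds)))
    stats.sort(key=lambda s: s[0])  # most certain cells first
    filled = [list(row) for row in matrix]  # leave the input untouched
    for _, i, j, mu in stats[:budget]:
        filled[i][j] = mu
    return filled


if __name__ == "__main__":
    ratings = [[5, None], [None, 1]]
    preds = [
        [[5, 4.0], [1.0, 1]],  # ensemble member 1
        [[5, 4.0], [3.0, 1]],  # ensemble member 2
    ]
    print(impute_low_uncertainty(ratings, preds, budget=1))
```

In the toy example the two members agree on cell (0, 1) and disagree on cell (1, 0), so with a budget of one only the first cell is imputed and the uncertain one stays missing.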
A Model-Centric Explainer for Graph Neural Network based Node Classification
Sayan Saha, Monidipa Das, S. Bandyopadhyay
Abstract: Graph Neural Networks (GNNs) learn node representations by aggregating a node's feature vector with those of its neighbors. They perform well across a variety of graph tasks. However, to enhance the reliability and trustworthiness of these models during use in critical scenarios, it is essential to look into the decision-making mechanisms of these models rather than treating them as black boxes. Our model-centric method gives insight into the kind of information learnt by GNNs about node neighborhoods during the task of node classification. We propose a neighborhood generator as an explainer that generates optimal neighborhoods to maximize a particular class prediction of the trained GNN model. We formulate neighborhood generation as a reinforcement learning problem and use a policy gradient method to train our generator using feedback from the trained GNN-based node classifier. Our method provides intelligible explanations of learning mechanisms of GNN models on synthetic as well as real-world datasets and even highlights certain shortcomings of these models.
DOI: https://doi.org/10.1145/3511808.3557535
Published: 2022-10-17
Citations: 1
Global and Local Feature Interaction with Vision Transformer for Few-shot Image Classification
M. Sun, Weizhi Ma, Yang Liu
Abstract: Image classification is a classical machine learning task and has been widely used. Due to the high costs of annotation and data collection in real scenarios, few-shot learning has become a vital technique to improve image classification performance. However, most existing few-shot image classification methods only focus on modeling the global image feature or local image patches, and ignore the global-local interactions. In this study, we propose a new method, named GL-ViT, to integrate both global and local features to fully exploit the few-shot samples for image classification. Firstly, we design a feature extractor module to calculate the interactions between the global representation and local patch embeddings, where ViT is also adopted to achieve efficient and effective image representation. Then, Earth Mover's Distance is adopted to measure the similarity between two images. Extensive experimental results on several widely-used open datasets show that GL-ViT outperforms state-of-the-art algorithms significantly, and our ablation studies also verify the effectiveness of both global and local features.
DOI: https://doi.org/10.1145/3511808.3557604
Published: 2022-10-17
Citations: 1