World Wide Web. Pub Date: 2024-08-16. DOI: 10.1007/s11280-024-01292-1
Jianian Zhu, Yichen Li, Haozhao Wang, Yining Qi, Ruixuan Li
{"title":"Hypernetwork-driven centralized contrastive learning for federated graph classification","authors":"Jianian Zhu, Yichen Li, Haozhao Wang, Yining Qi, Ruixuan Li","doi":"10.1007/s11280-024-01292-1","DOIUrl":"https://doi.org/10.1007/s11280-024-01292-1","url":null,"abstract":"<p>In the domain of Graph Federated Learning (GFL), prevalent methods often focus on local client data, which can limit the understanding of broader global patterns and pose challenges with Non-IID (Non-Independent and Identically Distributed) issues in cross-domain datasets. Direct aggregation can lead to a reduction in the differences among various clients, which is detrimental to personalized datasets. Contrastive Learning (CL) has emerged as an effective tool for enhancing a model’s ability to distinguish variations across diverse views but has not been fully leveraged in GFL. This study introduces a novel hypernetwork-based method, termed CCL (Centralized Contrastive Learning), which is a server-centric innovation that effectively addresses the challenges posed by traditional client-centric approaches in heterogeneous datasets. CCL integrates global patterns from multiple clients, capturing a wider range of patterns and significantly improving GFL performance. 
Our extensive experiments, including both supervised and unsupervised scenarios, demonstrate CCL’s superiority over existing models, its remarkable compatibility with standard backbones, and its ability to enhance GFL performance across various settings.</p>","PeriodicalId":501180,"journal":{"name":"World Wide Web","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142204997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint marginal and central sample learning for domain adaptation","authors":"Shaohua Teng, Wenjie Liu, Luyao Teng, Zefeng Zheng, Wei Zhang","doi":"10.1007/s11280-024-01290-3","DOIUrl":"https://doi.org/10.1007/s11280-024-01290-3","url":null,"abstract":"<p>Domain adaptation aims to alleviate the impact of distribution differences when migrating knowledge from the source domain to the target domain. However, two issues remain to be addressed. One is the difficulty of learning both marginal and specific knowledge at the same time. The other is that the low quality of pseudo labels in the target domain can constrain performance improvement during model iteration. To solve the above problems, we propose a domain adaptation method called Joint Marginal and Central Sample Learning (JMCSL). This method consists of three parts: marginal sample learning (MSL), central sample learning (CSL) and a unified strategy for multi-classifier (USMC). MSL and CSL aim to better learn common and specific knowledge. USMC improves the accuracy and stability of pseudo labels in the target domain. Specifically, MSL learns specific knowledge from a novel triple distance, which is defined by a sample pair and their class center. CSL uses the closest class center and the second closest class center of samples to retain the common knowledge. USMC selects label-consistent samples by applying K-Nearest Neighbors (KNN) and Structural Risk Minimization (SRM), while it utilizes the class centers of both domains for classification. 
Finally, extensive experiments on four visual datasets demonstrate that JMCSL is superior to other competing methods.</p>","PeriodicalId":501180,"journal":{"name":"World Wide Web","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142205002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
World Wide Web. Pub Date: 2024-08-02. DOI: 10.1007/s11280-024-01293-0
Chuhan Zhang, Jianzhong Li, Shouxu Jiang
{"title":"Durable reverse top-k queries on time-varying preference","authors":"Chuhan Zhang, Jianzhong Li, Shouxu Jiang","doi":"10.1007/s11280-024-01293-0","DOIUrl":"https://doi.org/10.1007/s11280-024-01293-0","url":null,"abstract":"<p>Recently, a query called the reverse top-<i>k</i> query was proposed. The reverse top-<i>k</i> query takes an object as input and retrieves the users whose top-<i>k</i> query results include the object, while the top-<i>k</i> query retrieves the top-<i>k</i> matching objects based on the user preference. In business analysis, reverse top-<i>k</i> queries are crucial for evaluating product impact and potential market. However, the reverse top-<i>k</i> query assumes that a user’s preference is static. In practice, user preference may change with moods, seasons, economic conditions or other reasons. To overcome this disadvantage, this paper proposes a new reverse top-<i>k</i> query, named the durable reverse top-<i>k</i> query, which does not require the user’s preference to be static. The durable reverse top-<i>k</i> query retrieves users who put a given object in their top-<i>k</i> favorite objects most of the time during a given time period. An efficient pruning-based algorithm for queries with fixed <i>k</i> is proposed in this paper. For the case of variable <i>k</i>, this paper proposes a pruning-based algorithm with an index to achieve a trade-off between time and space. 
Experiments on both real and synthetic datasets demonstrate that the proposed algorithms are very efficient.</p>","PeriodicalId":501180,"journal":{"name":"World Wide Web","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141881739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
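The durable reverse top-k query described in the abstract above can be illustrated with a brute-force sketch. This is not the paper's pruning-based algorithm. It assumes discrete time steps, a linear scoring function (dot product of a preference weight vector and an object's attributes, higher is better), and a threshold `tau` standing in for "most of the time". The names `durable_reverse_top_k`, `in_top_k` and `tau` are hypothetical.

```python
from typing import Dict, List, Sequence


def score(weights: Sequence[float], obj: Sequence[float]) -> float:
    """Assumed linear utility: dot product of preference weights and attributes."""
    return sum(w * a for w, a in zip(weights, obj))


def in_top_k(weights: Sequence[float], objects: List[Sequence[float]],
             target: int, k: int) -> bool:
    """True if fewer than k other objects score strictly higher than the target."""
    target_score = score(weights, objects[target])
    better = sum(1 for i, o in enumerate(objects)
                 if i != target and score(weights, o) > target_score)
    return better < k


def durable_reverse_top_k(preferences: Dict[str, List[Sequence[float]]],
                          objects: List[Sequence[float]],
                          target: int, k: int, tau: float = 0.5) -> List[str]:
    """Return users whose top-k contains the target in more than a tau
    fraction of their preference snapshots (one snapshot per time step)."""
    result = []
    for user, snapshots in preferences.items():
        if not snapshots:
            continue  # no observations in the period, so nothing to decide
        hits = sum(in_top_k(w, objects, target, k) for w in snapshots)
        if hits / len(snapshots) > tau:
            result.append(user)
    return sorted(result)


# Illustrative data (invented): three objects, two users, three time steps each.
objects = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
preferences = {
    "a": [[1.0, 0.0], [0.9, 0.1], [0.2, 0.8]],
    "b": [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]],
}
print(durable_reverse_top_k(preferences, objects, target=0, k=1))
```

This scan costs one full ranking check per user per time step, which is the kind of work a pruning-based approach would aim to avoid.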
World Wide Web. Pub Date: 2024-07-19. DOI: 10.1007/s11280-024-01289-w
Yunpeng Wang, Bo Ning, Xin Wang, Guanyu Li
{"title":"Multi-hop neighbor fusion enhanced hierarchical transformer for multi-modal knowledge graph completion","authors":"Yunpeng Wang, Bo Ning, Xin Wang, Guanyu Li","doi":"10.1007/s11280-024-01289-w","DOIUrl":"https://doi.org/10.1007/s11280-024-01289-w","url":null,"abstract":"<p>A multi-modal knowledge graph (MKG) is a structured semantic network that accurately represents real-world information by incorporating multiple modalities. Existing research primarily focuses on leveraging multi-modal fusion to enhance the representation capability of entity nodes and on link prediction to deal with the incompleteness of the MKG. However, the inherent heterogeneity between the structural modality and the semantic modality poses challenges to multi-modal fusion, as noise interference could compromise the effectiveness of the fusion representation. In this study, we propose a novel hierarchical Transformer architecture, named MNFormer, which captures the structural and semantic information while avoiding heterogeneity issues by fully integrating both multi-hop neighbor paths and image-text embeddings. During the encoding stage of MNFormer, we design multiple layers of Multi-hop Neighbor Fusion (MNF) modules that employ attention to merge the image and text features. These MNF modules progressively fuse the information of neighboring entities hop by hop along the neighbor paths of the source entity. The Transformer in the decoding stage is then used to integrate the outputs of all MNF modules, and its output is subsequently used to match target entities and accomplish MKG completion. Moreover, we develop a semantic direction loss to enhance the fitting performance of MNFormer. Experimental results on four datasets demonstrate that MNFormer exhibits notable competitiveness when compared to the state-of-the-art models. 
Additionally, ablation studies showcase the significant ability of MNFormer to effectively combine structural and semantic information, leading to enhanced performance through complementary enhancements.</p>","PeriodicalId":501180,"journal":{"name":"World Wide Web","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141746371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
World Wide Web. Pub Date: 2024-07-18. DOI: 10.1007/s11280-024-01288-x
Bingchen Liu, Huang Peng, Weixin Zeng, Xiang Zhao, Shijun Liu, Li Pan, Xin Li
{"title":"Open knowledge base canonicalization with multi-task learning","authors":"Bingchen Liu, Huang Peng, Weixin Zeng, Xiang Zhao, Shijun Liu, Li Pan, Xin Li","doi":"10.1007/s11280-024-01288-x","DOIUrl":"https://doi.org/10.1007/s11280-024-01288-x","url":null,"abstract":"<p>The construction of large open knowledge bases (OKBs) is integral to many knowledge-driven applications on the world wide web such as web search. However, noun phrases in OKBs often suffer from redundancy and ambiguity, which calls for investigation of OKB canonicalization. Current solutions address OKB canonicalization by devising advanced clustering algorithms and using knowledge graph embedding (KGE) to further facilitate the canonicalization process. Nevertheless, these works fail to fully exploit the synergy between clustering and KGE learning, and the methods designed for these sub-tasks are sub-optimal. To this end, we put forward a multi-task learning framework, namely <span>MulCanon</span>, to tackle OKB canonicalization. Specifically, a diffusion model is used in the soft clustering process to improve the noun phrase representations with neighboring information, which can lead to more accurate representations. <span>MulCanon</span> unifies the learning objectives of the diffusion model, the KGE model, side information and cluster assignment, and adopts a two-stage multi-task learning paradigm for training. 
A thorough experimental study on popular OKB canonicalization benchmarks validates that <span>MulCanon</span> can achieve competitive canonicalization results.</p>","PeriodicalId":501180,"journal":{"name":"World Wide Web","volume":"179 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
World Wide Web. Pub Date: 2024-07-18. DOI: 10.1007/s11280-024-01256-5
Tao Wu, Jiali Mao, Yifan Zhu, Kaixuan Zhu, Aoying Zhou
{"title":"Multi-view context awareness based transport stay hotspot recognizing","authors":"Tao Wu, Jiali Mao, Yifan Zhu, Kaixuan Zhu, Aoying Zhou","doi":"10.1007/s11280-024-01256-5","DOIUrl":"https://doi.org/10.1007/s11280-024-01256-5","url":null,"abstract":"<p>During long-distance transport of bulk commodities, trucks need to stop at multiple places for resting, refueling, repairing or unloading, called transport stay hotspots (or <i>Tshot</i> for short). Massive waybills and their related trajectories accumulated by the freight platforms enable us to recognize <i>Tshot</i>s and keep them updated constantly. But because most <i>Tshot</i>s have varying sizes and are adjacent to each other, it is hard to pinpoint their locations precisely. In addition, correctly annotating the functional tags of <i>Tshots</i> that have fewer visiting trajectories is quite difficult. In this paper, we propose a <u>M</u>ulti-view <u>C</u>ontext awareness based transport <u>S</u><i>tay hotspot</i> <u>R</u>ecognition framework, called <i>MCSR</i>, consisting of <i>location identification</i>, <i>feature extraction</i> and <i>functional tag annotation</i>. To address the missed-detection issue in pinpointing adjacent <i>Tshots</i> having various sizes, we design a <i>multi-view clustering</i> based stay area merging strategy by incorporating the distance between <i>road turn-off locations</i>, the number of visiting trajectories with the similarity of <i>visiting time distribution</i>. Further, to address the low annotation precision caused by data scarcity, we extract <i>behavioral features</i> and <i>attribute features</i> from waybill trajectories and leverage a <i>time interval awareness self-attention network</i> to extract <i>semantic contextual features</i> to assist in ensemble learning based annotation modeling. 
Experimental results on a large-scale logistics dataset demonstrate that our proposal can improve <i>F-measure</i> by an average of 14.76%, <i>AIoU</i> by an average of 12.89% for <i>location identification</i>, and <i>G-mean</i> by an average of 18.39% and <i>mAUC</i> by an average of 14.48% for <i>functional tag annotation</i> as compared to the baselines.</p>","PeriodicalId":501180,"journal":{"name":"World Wide Web","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical adaptive evolution framework for privacy-preserving data publishing","authors":"Mingshan You, Yong-Feng Ge, Kate Wang, Hua Wang, Jinli Cao, Georgios Kambourakis","doi":"10.1007/s11280-024-01286-z","DOIUrl":"https://doi.org/10.1007/s11280-024-01286-z","url":null,"abstract":"<p>The growing need for data publication and the escalating concerns regarding data privacy have led to a surge in interest in Privacy-Preserving Data Publishing (PPDP) across research, industry, and government sectors. Despite its significance, PPDP remains a challenging NP-hard problem, particularly when dealing with complex datasets, often rendering traditional traversal search methods inefficient. Evolutionary Algorithms (EAs) have emerged as a promising approach in response to this challenge, but their effectiveness, efficiency, and robustness in PPDP applications still need to be improved. This paper presents a novel Hierarchical Adaptive Evolution Framework (HAEF) that aims to optimize <i>t</i>-closeness anonymization through attribute generalization and record suppression using Genetic Algorithm (GA) and Differential Evolution (DE). To balance GA and DE, the first hierarchy of HAEF employs a GA-prioritized adaptive strategy enhancing exploration search. This combination aims to strike a balance between exploration and exploitation. The second hierarchy employs a random-prioritized adaptive strategy to select distinct mutation strategies, thus leveraging the advantages of various mutation strategies. Performance benchmark tests demonstrate the effectiveness and efficiency of the proposed technique. In 16 test instances, HAEF significantly outperforms traditional depth-first traversal search and exceeds the performance of previous state-of-the-art EAs on most datasets. 
In terms of overall performance, under the three privacy constraints tested, HAEF outperforms the conventional DFS search by an average of 47.78%, the state-of-the-art GA-based ID-DGA method by an average of 37.38%, and the hybrid GA-DE method by an average of 8.35% in TLEF. Furthermore, ablation experiments confirm the effectiveness of the various strategies within the framework. These findings enhance the efficiency of the data publishing process, ensuring privacy and security and maximizing data availability.</p>","PeriodicalId":501180,"journal":{"name":"World Wide Web","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141611510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MIM: A multiple integration model for intrusion detection on imbalanced samples","authors":"Zhiqiang Zhang, Le Wang, Junyi Zhu, Dong Zhu, Zhaoquan Gu, Yanchun Zhang","doi":"10.1007/s11280-024-01285-0","DOIUrl":"https://doi.org/10.1007/s11280-024-01285-0","url":null,"abstract":"<p>The quantity of normal samples is commonly significantly greater than that of malicious samples, resulting in an imbalance in network security data. When dealing with imbalanced samples, the classification model requires careful sampling and attribute selection methods to cope with bias towards majority classes. Simple data sampling methods and incomplete feature selection techniques cannot improve the accuracy of intrusion detection models. In addition, a single intrusion detection model cannot accurately classify all attack types in the face of massive imbalanced security data. Nevertheless, the existing model integration methods based on stacking or voting technologies suffer from high coupling that undermines their stability and reliability. To address these issues, we propose a Multiple Integration Model (MIM) to implement feature selection and attack classification. First, MIM uses random Oversampling, random Undersampling and Washing Methods (OUWM) to reconstruct the data. Then, a modified simulated annealing algorithm is employed to generate candidate features. Finally, an integrated model based on Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost) and gradient Boosting with Categorical features support (CatBoost) is designed to achieve intrusion detection and attack classification. MIM leverages a Rule-based and Priority-based Ensemble Strategy (RPES) to combine the high accuracy of the former and the high effectiveness of the latter two, improving the stability and reliability of the integration model. 
We evaluate the effectiveness of our approach on two publicly available intrusion detection datasets, as well as a dataset created by researchers from the University of New Brunswick and another dataset collected by the Australian Center for Cyber Security. In our experiments, MIM significantly outperforms several existing intrusion detection models in terms of accuracy. Specifically, compared to two recently proposed methods, namely, the reinforcement learning method based on the adaptive sample distribution dual-experience replay pool mechanism (ASD2ER) and the method that combines Auto Encoder, Principal Component Analysis, and Long Short-Term Memory (AE+PCA+LSTM), MIM exhibited a respective enhancement in intrusion detection accuracy by 1.35% and 1.16%.</p>","PeriodicalId":501180,"journal":{"name":"World Wide Web","volume":"71 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
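The random oversampling and undersampling step mentioned in the MIM abstract above can be sketched as follows. This is only an illustration of generic class rebalancing, not the authors' OUWM procedure: the washing step is not shown, and the function name `rebalance`, the `target_per_class` parameter and the fixed seed are assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def rebalance(samples: Sequence, labels: Sequence[Hashable],
              target_per_class: int, seed: int = 0) -> Tuple[List, List]:
    """Bring every class to exactly target_per_class samples.

    Majority classes are randomly undersampled without replacement.
    Minority classes keep all original samples and are randomly
    oversampled with replacement to fill the gap.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y in sorted(by_class):  # assumes labels are mutually comparable
        items = by_class[y]
        if len(items) >= target_per_class:
            chosen = rng.sample(items, target_per_class)
        else:
            extra = rng.choices(items, k=target_per_class - len(items))
            chosen = items + extra
        out_x.extend(chosen)
        out_y.extend([y] * target_per_class)
    return out_x, out_y


# Illustrative data (invented): 8 normal records and 2 attack records.
xs, ys = rebalance(list(range(10)), ["normal"] * 8 + ["attack"] * 2,
                   target_per_class=4)
print(sorted(set(ys)), len(xs))
```

Rebalancing of this kind should be applied to the training split only, so that evaluation still reflects the original class distribution.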
World Wide Web. Pub Date: 2024-07-10. DOI: 10.1007/s11280-024-01270-7
Bingfeng Li, Xiaoting Xie, Shuang Qiao, Shilei Tan
{"title":"Unveiling the impact of employee-customer familiarity on customer purchase intentions: an empirical investigation within the realm of web-based date analytics","authors":"Bingfeng Li, Xiaoting Xie, Shuang Qiao, Shilei Tan","doi":"10.1007/s11280-024-01270-7","DOIUrl":"https://doi.org/10.1007/s11280-024-01270-7","url":null,"abstract":"<p>This research delves into the intricate dynamics of employee-customer familiarity and its profound influence on customer purchase intentions within the burgeoning domain of web-based data analytics. In an era characterized by an increasingly digital marketplace, understanding the nuanced interactions between employees and customers is paramount for businesses striving to enhance customer relationships and drive purchase decisions. Drawing on empirical investigations, this study unravels the multifaceted facets of employee-customer familiarity, seeking to shed light on its implications for customer purchase intentions in the context of web-based data analytics. In this paper, an empirical study investigates the influence of employee-customer familiarity on customers’ purchase intention in the home bedding industry, summarizes the current situation, puts forward research hypotheses and constructs a model of the effect of employee-customer familiarity on purchase intention. The familiarity of buyers and sellers was evaluated through a customer questionnaire, which provided subjective insights into the strength of interpersonal relationships. Meanwhile, confidence analysis, ANOVA (analysis of variance), correlation analysis and regression analysis were conducted on the survey data to explore the actual effects of these relationships on customers’ purchase intention, and the positive effects of the five hypotheses on purchase intention were investigated. 
The anticipated findings suggest that increasing employee-customer familiarity positively impacts customers’ purchase intentions, thereby illuminating the critical role of personalized interactions in driving business outcomes. Furthermore, the study sought to reveal the nuances of this relationship, recognizing the potential impact of different customer characteristics and industry contexts. Practical implications center on guiding companies in aligning their strategies to improve customer satisfaction and loyalty. From staff training programmes to targeted marketing campaigns, from brand influence to web e-commerce platform optimisation, businesses can use the insights gained from this research to build more meaningful connections with their customers.</p>","PeriodicalId":501180,"journal":{"name":"World Wide Web","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}