{"title":"Enhanced Multi-Scale Features Mutual Mapping Fusion Based on Reverse Knowledge Distillation for Industrial Anomaly Detection and Localization","authors":"Guoxiang Tong;Quanquan Li;Yan Song","doi":"10.1109/TBDATA.2024.3350539","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3350539","url":null,"abstract":"Unsupervised anomaly detection methods based on knowledge distillation have exhibited promising results. However, there is still room for improvement in the differential characterization of anomalous samples. In this article, a novel anomaly detection and localization model based on reverse knowledge distillation is proposed, where an enhanced multi-scale feature mutual mapping feature fusion module is proposed to greatly extract discrepant features at different scales. This module helps enhance the difference in anomaly region representation in the teacher-student structure by inhomogeneously fusing features at different levels. Then, the coordinate attention mechanism is introduced in the reverse distillation structure to pay special attention to dominant issues, facilitating nice direction guidance and position encoding. Furthermore, an innovative single-category embedding memory bank, inspired by human memory mechanisms, is developed to normalize single-category embedding to encourage high-quality model reconstruction. Finally, in several categories of the well-known MVTec dataset, our model achieves better results than state-of-the-art models in terms of AUROC and PRO, with an overall average of 98.1%, 98.3%, and 95.0% for detection AUROC scores, localization AUROC scores, and localization PRO scores, respectively, across 15 categories. 
Extensive experiments are conducted on the ablation study to validate the contribution of each component of the model.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 4","pages":"498-513"},"PeriodicalIF":7.5,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Unsupervised Hashing via Exploiting Robust Cross-Modal Consistency","authors":"Xingbo Liu;Jiamin Li;Xiushan Nie;Xuening Zhang;Shaohua Wang;Yilong Yin","doi":"10.1109/TBDATA.2024.3350541","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3350541","url":null,"abstract":"Unsupervised cross-modal hashing has received increasing attention because of its efficiency and scalability for large-scale data retrieval and analysis. However, existing unsupervised cross-modal hashing methods primarily focus on learning shared feature embedding, ignoring robustness and consistency across different modalities. To this end, this study proposes a novel method called scalable unsupervised hashing (SUH) for large-scale cross-modal retrieval. In the proposed method, latent semantic information and common semantic embedding within heterogeneous data are simultaneously exploited using multimodal clustering and collective matrix factorization, respectively. Furthermore, the robust norm is seamlessly integrated into the two processes, making SUH insensitive to outliers. Based on the robust consistency exploited from the latent semantic information and feature embedding, hash codes can be learned discretely to avoid cumulative quantitation loss. The experimental results on five benchmark datasets demonstrate the effectiveness of the proposed method under various scenarios.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 4","pages":"514-527"},"PeriodicalIF":7.5,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few-Shot Learning With Multi-Granularity Knowledge Fusion and Decision-Making","authors":"Yuling Su;Hong Zhao;Yifeng Zheng;Yu Wang","doi":"10.1109/TBDATA.2024.3350542","DOIUrl":"https://doi.org/10.1109/TBDATA.2024.3350542","url":null,"abstract":"Few-shot learning (FSL) is a challenging task in classifying new classes from few labelled examples. Many existing models embed class structural knowledge as prior knowledge to enhance FSL against data scarcity. However, they fall short of connecting the class structural knowledge with the limited visual information which plays a decisive role in FSL model performance. In this paper, we propose a unified FSL framework with multi-granularity knowledge fusion and decision-making (MGKFD) to overcome the limitation. We aim to simultaneously explore the visual information and structural knowledge, working in a mutual way to enhance FSL. On the one hand, we strongly connect global and local visual information with multi-granularity class knowledge to explore intra-image and inter-class relationships, generating specific multi-granularity class representations with limited images. On the other hand, a weight fusion strategy is introduced to integrate multi-granularity knowledge and visual information to make the classification decision of FSL. It enables models to learn more effectively from limited labelled examples and allows generalization to new classes. Moreover, considering varying erroneous predictions, a hierarchical loss is established by structural knowledge to minimize the classification loss, where greater degree of misclassification is penalized more. 
Experimental results on three benchmark datasets show the advantages of MGKFD over several advanced models.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 4","pages":"486-497"},"PeriodicalIF":7.5,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCOREH+: A High-Order Node Proximity Spectral Clustering on Ratios-of-Eigenvectors Algorithm for Community Detection","authors":"Yanhui Zhu;Fang Hu;Lei Hsin Kuo;Jia Liu","doi":"10.1109/TBDATA.2023.3346715","DOIUrl":"https://doi.org/10.1109/TBDATA.2023.3346715","url":null,"abstract":"The research on complex networks has achieved significant progress in revealing the mesoscopic features of networks. Community detection is an important aspect of understanding real-world complex systems. We present in this paper a High-order node proximity Spectral Clustering on Ratios-of-Eigenvectors (SCOREH+) algorithm for locating communities in complex networks. The algorithm improves SCORE and SCORE+ and preserves high-order transitivity information of the network affinity matrix. We optimize the high-order proximity matrix from the initial affinity matrix using the Radial Basis Functions (RBFs) and Katz index. In addition to the optimization of the Laplacian matrix, we implement a procedure that joins an additional eigenvector (the $(k+1)$th leading eigenvector) to the spectrum domain for clustering if the network is considered to be a “weak signal” graph. The algorithm has been successfully applied to both real-world and synthetic data sets. The proposed algorithm is compared with state-of-art algorithms, such as ASE, Louvain, Fast-Greedy, Spectral Clustering (SC), SCORE, and SCORE+. To demonstrate the high efficacy of the proposed method, we conducted comparison experiments on eleven real-world networks and a number of synthetic networks with noise. The experimental results in most of these networks demonstrate that SCOREH+ outperforms the baseline methods. 
Moreover, by tuning the RBFs and their shaping parameters, we may generate state-of-the-art community structures on all real-world networks and even on noisy synthetic networks.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 3","pages":"301-312"},"PeriodicalIF":7.2,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140924735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Causal Chain Graph Structure via Alternate Learning and Double Pruning","authors":"Shujing Yang;Fuyuan Cao;Kui Yu;Jiye Liang","doi":"10.1109/TBDATA.2023.3346712","DOIUrl":"https://doi.org/10.1109/TBDATA.2023.3346712","url":null,"abstract":"Causal chain graphs model the dependency structure between individuals when the assumption of individual independence in causal inference is violated. However, causal chain graphs are often unknown in practice and require learning from data. Existing learning algorithms have certain limitations. Specifically, learning local information requires multiple subset searches, building the skeleton requires additional conditional independence testing, and directing the edges requires obtaining local information from the skeleton again. To remedy these problems, we propose a novel algorithm for learning causal chain graph structure. The algorithm alternately learns the adjacencies and spouses of each variable as local information and doubly prunes them to obtain more accurate local information, which reduces subset searches, improves its accuracy, and facilitates subsequent learning. It then directly constructs the chain graphs skeleton using the learned adjacencies without conditional independence testing. Finally, it directs the edges of complexes using the learned adjacencies and spouses to learn chain graphs without reacquiring local information, further improving its efficiency. We conduct theoretical analysis to prove the correctness of our algorithm and compare it with the state-of-the-art algorithms on synthetic and real-world datasets. 
The experimental results demonstrate our algorithm is more reliable than its rivals.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 4","pages":"442-456"},"PeriodicalIF":7.5,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cascaded Knowledge-Level Fusion Network for Online Course Recommendation System","authors":"Wenjun Ma;Yibing Zhao;Xiaomao Fan","doi":"10.1109/TBDATA.2023.3346711","DOIUrl":"https://doi.org/10.1109/TBDATA.2023.3346711","url":null,"abstract":"In light of the global proliferation of the COVID-19 pandemic, there is a notable surge in public interest towards Massive Open Online Courses (MOOCs) recently. Within the realm of personalized course-learning services, large amounts of online course recommendation systems have been developed to cater to the diverse needs of learners. However, despite these advancements, there still exist three unsolved challenges: 1) how to effectively utilize the course information spanning from the title-level to the more granular keyword-level; 2) how to well capture the sequential information among learning courses; 3) how to identify the high-correlated courses in the course corpora. To address these challenges, we propose a novel solution known as Cascaded Knowledge-level Fusion Network (CKFN) for online course recommendation with incorporating a three-fold approach to maximize the utilization of course information: 1) two knowledge graphs spanning from the keyword-level to title-level; 2) a two-stage attention fusion mechanism; 3) a novel knowledge-aware negative sampling method. Experimental results on a real dataset of XuetangX demonstrate that CKFN surpasses existing baseline models by a substantial margin, thereby achieving the state-of-the-art recommendation performance. 
It means that CKFN can be potentially deployed into MOOCs platforms as a pivotal component to provide personalized course recommendation service.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 4","pages":"457-469"},"PeriodicalIF":7.5,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bi-Selection of Instances and Features Based on Neighborhood Importance Degree","authors":"Xiao Zhang;Zhaoqian He;Jinhai Li;Changlin Mei;Yanyan Yang","doi":"10.1109/TBDATA.2023.3342643","DOIUrl":"https://doi.org/10.1109/TBDATA.2023.3342643","url":null,"abstract":"As one of the most important concepts for classification learning, neighborhood granules obtained by dividing adjacent objects or instances can be regarded as the minimal elements to simulate human cognition. At present, neighborhood granules have been successfully applied to knowledge acquisition. Nevertheless, little work has been devoted to the simultaneous selection of features and instances by the use of neighborhood granules. To fill this gap, we investigate in this paper the issue of bi-selection of instances and features based on neighborhood importance degree (NID). First, the conditional neighborhood entropy is defined to measure decision uncertainty of a neighborhood granule. Considering both decision uncertainty and coverage ability of a neighborhood granule, we propose the concept of NID. Then, an instance selection algorithm is formulated to select representative instances based on NID. Furthermore, an NID-based feature selection algorithm is provided for a neighborhood decision system. By integrating the instance selection and feature selection methods, a bi-selection approach based on NID (BSNID) is finally proposed to select instances and features. Lastly, some numerical experiments are conducted to evaluate the performance of BSNID. 
The results demonstrate that BSNID can take account of both reduction ratio and classification accuracy and, therefore, performs satisfactorily in effectiveness.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 4","pages":"415-428"},"PeriodicalIF":7.5,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FIG: Feature-Weighted Information Granules With High Consistency Rate","authors":"Jianghe Cai;Yuhui Deng;Yi Zhou;Jiande Huang;Geyong Min","doi":"10.1109/TBDATA.2023.3343348","DOIUrl":"https://doi.org/10.1109/TBDATA.2023.3343348","url":null,"abstract":"Information granules are effective in revealing the structure of data. Therefore, it is a common practice in data mining to use information granules for classifying datasets. In the existing granular classifiers, the information granules are often classified according to the standard membership function only without considering the influence of different feature weights on the quality of granules and label classification results. In this article, we utilize the feature weighting of data to produce the information granules with high consistency rate called FIG. First, we use consistency rate and contribution scores to generate information granules. Then, we propose a granular two-stage classifier GTC based on FIG. GTC divides the data into fuzzy and fixed points and then calculates the interval matching degree to assign data points to the most suitable cluster in the second step. Finally, we compare FIG with two state-of-the-art granular models (T-GrM and FGC-rule), and classification accuracy is also compared with other classification algorithms. The extensive experiments on synthetic datasets and public datasets from UCI show that FIG has sufficient performance to describe the data structure and excellent capability under the constructed granular classifier GTC. Compared with T-GrM and FGC-rule, the time overhead required for FIG to obtain information granules is reduced by an average of 51.07%, the per unit quality of the granules is also increased by more than 14.74%. 
Compared with other classification algorithms, an average of 5.04% improves GTC accuracy.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 4","pages":"400-414"},"PeriodicalIF":7.5,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Contrastive Learning for Clustering of Multi-Layer Networks","authors":"Yifei Yang;Xiaoke Ma","doi":"10.1109/TBDATA.2023.3343349","DOIUrl":"https://doi.org/10.1109/TBDATA.2023.3343349","url":null,"abstract":"Multi-layer networks precisely model complex systems in society and nature with various types of interactions, and identifying conserved modules that are well-connected in all layers is of great significance for revealing their structure-function relationships. Current algorithms are criticized for either ignoring the intrinsic relations among various layers, or failing to learn discriminative features. To attack these limitations, a novel graph contrastive learning framework for clustering of multi-layer networks is proposed by joining nonnegative matrix factorization and graph contrastive learning (called jNMF-GCL), where the intrinsic structure and discriminative of features are simultaneously addressed. Specifically, features of vertices are first learned by preserving the conserved structure in multi-layer networks with matrix factorization, and then jNMF-GCL learns an affinity structure of vertices by manipulating features of various layers. To enhance quality of features, contrastive learning is executed by selecting the positive and negative samples from the constructed affinity graph, which significantly improves discriminative of features. Finally, jNMF-GCL incorporates feature learning, construction of affinity graph, contrastive learning and clustering into an overall objective, where global and local structural information are seamlessly fused, providing a more effective way to describe structure of multi-layer networks. 
Extensive experiments conducted on both artificial and real-world networks have shown the superior performance of jNMF-GCL over state-of-the-art models across various metrics.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 4","pages":"429-441"},"PeriodicalIF":7.5,"publicationDate":"2023-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141602572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient and Privacy-Preserving Aggregate Query Over Public Property Graphs","authors":"Yunguo Guan;Rongxing Lu;Songnian Zhang;Yandong Zheng;Jun Shao;Guiyi Wei","doi":"10.1109/TBDATA.2023.3342623","DOIUrl":"https://doi.org/10.1109/TBDATA.2023.3342623","url":null,"abstract":"Graph data structures’ ability of representing vertex relationships has made them increasingly popular in recent years. Amid this trend, many property graph datasets have been collected and made public to facilitate a variant of queries such as the aggregate queries that will be extensively exploited in this paper. While cloud deployment of both the datasets and query services is intriguing, it could raise privacy concerns related to user queries and results. In past years, many works on graph privacy have been put forth, however they either do not consider query privacy or cannot be adapted for aggregate queries. Some others consider queries over encrypted graphs but cannot protect access pattern privacy. In particular, when deploying them to handle queries over public graph datasets, the cloud server can infer additional information related to user queries. Aiming at this challenge, we propose a privacy-preserving property graph aggregate query scheme in this paper. Specifically, we first design new privacy-preserving vertex matching and matching update techniques, which securely initialize and update the mapping between vertices in the dataset and the user-specified patterns, respectively. Based on them, we construct our proposed scheme to achieve aggregate queries over public property graphs. Rigid security analysis shows that our proposed scheme can protect the privacy of user queries and results as well as achieve access pattern privacy. 
In addition, extensive experiments also demonstrate the efficiency of our scheme in terms of computational overheads.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"10 2","pages":"146-157"},"PeriodicalIF":7.2,"publicationDate":"2023-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140123523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}