IEEE Transactions on Big Data: Latest Articles, Page 4

Fine-Tuning a Biased Model for Improving Fairness

IF 7.5 · Zone 3, Computer Science

IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1397-1410 · Pub Date: 2024-09-13 · DOI: 10.1109/TBDATA.2024.3460537

Huiqiang Chen; Tianqing Zhu; Bo Liu; Wanlei Zhou; Philip S. Yu

Abstract: Fairness has emerged as a crucial concern in machine learning, since biased models generate dissimilar predictions for different groups, perpetuating social inequalities. Although numerous techniques have been proposed to address the fairness issue in machine learning, most rely on incorporating fairness constraints during the training phase, rendering them ineffective once the model is deployed. This paper explores the potential of fine-tuning biased models to enhance fairness, which is particularly suitable for scenarios where retraining the model is not feasible. Our approach is rooted in an empirical analysis of the distribution of bias within a biased model, and we fine-tune the model parameters within a limited scope so that the performance of the original model can be maintained. We first observe that fine-tuning a biased model leads to deviations from its initial state, with deep layers undergoing the most significant changes. We then design and apply a bias-discovery algorithm, revealing that bias predominantly resides in the model’s deep layers. Based on these observations, we propose a straightforward yet highly effective method for debiasing the model: fine-tuning the classification head. We conduct a thorough theoretical analysis to justify the proposed method and provide guidance for fine-tuning. Furthermore, we experimentally validate our method on tabular and image datasets using four networks (CNN, AlexNet, VGG-11, and ResNet-18).
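
The abstract's notion of "dissimilar predictions for different groups" can be made concrete with a group fairness metric. The sketch below computes the demographic parity gap, one common choice; the abstract does not say which metric the paper uses, so the metric, the function name `demographic_parity_gap`, and the sample data are all illustrative assumptions.

```python
from collections import defaultdict


def demographic_parity_gap(predictions, groups):
    """Return (gap, rates): the largest difference in positive-prediction
    rate between any two groups, plus the per-group rates.

    Assumption: binary predictions (0/1) and one group label per example.
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        if pred == 1:
            positives[group] += 1
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical predictions for two groups, "a" and "b".
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_gap(preds, groups)
print(rates, gap)
```

A gap of zero would mean every group receives positive predictions at the same rate. A debiasing procedure such as the head-only fine-tuning described in the abstract would aim to shrink a gap of this kind while keeping accuracy close to that of the original model.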

Citations: 0

Developing Novel Algorithms for Generating Inexact Data Through Triangle Distribution

IF 7.5 · Zone 3, Computer Science

IEEE Transactions on Big Data · Pub Date: 2024-09-13 · DOI: 10.1109/TBDATA.2024.3460529

Muhammad Aslam

Citations: 0

Combine the Growth of Cascades and Impact of Users for Diffusion Prediction

IF 7.5 · Zone 3, Computer Science

IEEE Transactions on Big Data, vol. 11, no. 2, pp. 887-895 · Pub Date: 2024-09-13 · DOI: 10.1109/TBDATA.2024.3460530

Pengfei Jiao; Peng Yan; Jilin Zhang; Biao Wang; Wang Zhang; Nailiang Zhao

Abstract: Information diffusion and diffusion prediction have attracted a great deal of research attention over the past decades. Existing approaches usually make predictions based on the order of the activated users, while some recent studies have taken the social network into consideration and begun to analyze the influence of neighbors via graph neural networks. However, they ignore the fact that the interests of users and their neighbors may change dynamically as the cascade grows, and thus fail to model the potential impact of activated users. To address these shortcomings, this paper proposes a deep learning model that combines the Mode of cascades Growth and potential Impact of users (MGI). It leverages GCNs to represent users from the social network and model their static features. In addition, we design an attention mechanism on the cascade sequence to compute features of activated users, and add a popularity variable to model features of users in cascades. Finally, we combine the growth of cascades and the impact of users in our model for diffusion prediction. We conducted extensive experiments on several real-world datasets, and the experimental results demonstrate that our model significantly outperforms state-of-the-art methods in diffusion prediction.
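
To illustrate what "an attention mechanism on the cascade sequence" might look like in its simplest form, the sketch below pools the vectors of activated users with softmax-normalized dot-product scores. This is generic attention pooling, not the MGI architecture: the scoring function, the `query` vector, and the name `attention_pool` are assumptions made for illustration.

```python
import math


def attention_pool(query, user_vectors):
    """Weight each activated user's vector by softmax(query . vector)
    and return (weights, pooled_vector)."""
    scores = [sum(q * u for q, u in zip(query, vec)) for vec in user_vectors]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, user_vectors))
        for d in range(len(query))
    ]
    return weights, pooled


# Hypothetical 2-d embeddings of three users activated in order.
cascade = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, pooled = attention_pool([1.0, 0.0], cascade)
print(weights, pooled)
```

Users whose vectors align with the query receive larger weights, so the pooled vector emphasizes them. In a trained model the query and the user embeddings would be learned rather than fixed by hand.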

Citations: 0

Graph Data Model and Graph Query Language Based on the Monadic Second-Order Logic

IF 7.5 · Zone 3, Computer Science

IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1381-1396 · Pub Date: 2024-09-05 · DOI: 10.1109/TBDATA.2024.3455172

Yunkai Lou; Chaokun Wang; Songyao Wang

Abstract: With the wide application of graphs in various fields, graph query languages have attracted more and more attention. Existing graph query languages, such as GraphQL and SoQL, mostly have expressive power similar to that of first-order logic or its extended versions, and are limited when used to express various queries. In this paper, since the graph data model is the base of the graph query language, we propose a new graph data model with the expressive power of monadic second-order logic (abbr. MSOL), and then present a more expressive SQL-like declarative graph query language named SOGQL to support more common queries efficiently. Specifically, a new graph calculus is first proposed based on MSOL for attributed graphs. Then, the new graph data model is proposed. Its graph algebra, which operates on graph sets, has seven fundamental operators such as union, filter, map, and reduce. Next, the graph query language SOGQL is proposed based on the graph data model. Since the graph algebra has the same expressive power as the graph calculus, SOGQL has the expressive power of MSOL and can express queries with constraints on subgraphs. Moreover, a prototype system named SOGDB is implemented with SOGQL, and the experimental results show its efficiency.

Citations: 0

Multi-View Few-Shot Reasoning for Emerging Entities in Knowledge Graphs

IF 7.5 · Zone 3, Computer Science

IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1321-1333 · Pub Date: 2024-09-03 · DOI: 10.1109/TBDATA.2024.3453749

Cheng Yan; Feng Zhao; Xiaohui Tao; Xiaofeng Zhu

Abstract: A knowledge graph (KG) is a form of representing knowledge of the objective world. With the expansion of knowledge, KGs frequently incorporate new entities, which often possess limited associated data, known as few-shot features. Addressing the missing knowledge for these emerging entities is of great practical importance, but there are significant challenges due to data scarcity. Previously developed methods based on knowledge graph embedding (KGE) and graph neural networks (GNNs) that focus on instance-level KGs are confronted with challenges of data scarcity and model simplicity, rendering them inapplicable to reasoning tasks in few-shot scenarios. To tackle these issues, we propose a multi-view few-shot KG reasoning method for emerging entities. The primary focus of our method lies in resolving the problem of link prediction for emerging entities with limited associated triples from multiple perspectives. Distinct from previous methods, our approach initially abstracts a concept-view KG from the conventional instance-view KG, enabling the formulation of commonsense rules. Additionally, we employ the aggregation of multi-hop subgraph features to enhance the representation of emerging entities. Furthermore, we introduce a more efficient cross-domain negative sampling strategy and a multi-view triple scoring function based on commonsense rules. Our experimental evaluations highlight the effectiveness of our method in few-shot contexts, demonstrating its robustness and adaptability in both cross-shot and zero-shot scenarios and significantly outperforming existing models in these challenging settings.

Citations: 0

DistillSleepNet: Heterogeneous Multi-Level Knowledge Distillation via Teacher Assistant for Sleep Staging

IF 7.5 · Zone 3, Computer Science

IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1273-1284 · Pub Date: 2024-09-03 · DOI: 10.1109/TBDATA.2024.3453763

Ziyu Jia; Heng Liang; Yucheng Liu; Haichao Wang; Tianzi Jiang

Abstract: Accurate sleep staging is crucial for the diagnosis of diseases such as sleep disorders. Existing sleep staging models with excellent performance are usually large and require a lot of computational resources, limiting their application on wearable devices. Therefore, it is a key issue to distil the knowledge embedded in large models into small heterogeneous models for better deployment. In the process of knowledge distillation of heterogeneous models for sleep electroencephalography (EEG) signals, we mainly deal with three major challenges: 1) there are large structural differences between heterogeneous sleep staging models; 2) it is unclear what kind of knowledge in sleep EEG signals should be conveyed in the knowledge distillation of heterogeneous models; 3) significant scale differences exist between heterogeneous models. To address these challenges, we design a generic heterogeneous model knowledge distillation framework for sleep staging. Specifically, we first propose a knowledge distillation strategy for heterogeneous models that addresses the large structural differences between them. Then, a multi-level knowledge distillation module is designed to effectively transfer important multi-level feature knowledge. In addition, a teacher assistant module is introduced to ease the scale difference between the heterogeneous models, which further enhances the knowledge distillation performance. Experimental results on both the Sleep-EDF and ISRUC datasets show that our distillation framework achieves state-of-the-art performance.
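
As background for the distillation framework described above, the sketch below shows the classic soft-target distillation loss: the KL divergence between temperature-softened teacher and student distributions, scaled by the square of the temperature. This is the standard logit-level formulation only, not DistillSleepNet's multi-level or teacher-assistant modules; the logits and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, times T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over five sleep-stage classes for one EEG epoch.
teacher = [4.0, 1.0, 0.5, 0.2, 0.1]
student = [2.0, 1.5, 0.5, 0.3, 0.2]
print(distillation_loss(teacher, student))
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative ranking of the unlikely classes. A teacher-assistant setup would apply a loss of this kind in two hops, from the teacher to an intermediate-size model and then from that model to the student.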

Citations: 0

EduGraph: Learning Path-Based Hypergraph Neural Networks for MOOC Course Recommendation

IF 7.5 · Zone 3, Computer Science

IEEE Transactions on Big Data, vol. 10, no. 6, pp. 706-719 · Pub Date: 2024-09-03 · DOI: 10.1109/TBDATA.2024.3453757

Ming Li; Zhao Li; Changqin Huang; Yunliang Jiang; Xindong Wu

Abstract: In online learning, personalized course recommendations that align with learners’ preferences and future needs are essential. Thus, the development of efficient recommender systems is crucial to guide learners to appropriate courses. Graph learning in recommender systems has been extensively studied, yet many models focus on low-frequency information, underscoring similar learner preferences and overlooking high-frequency data that indicates varied learning trajectories. Furthermore, course co-occurrence and sequential relationships are often insufficiently investigated. In this paper, we introduce EduGraph, a novel framework developed specifically for MOOC course recommendation systems. EduGraph is characterized by its incorporation of a learning path-based hypergraph, a unique perspective wherein learners are represented as hyperedges and courses are delineated as vertices. The framework incorporates a framelet-based hypergraph convolution, integrating low-pass filters to highlight similarities and high-pass filters to underscore distinct learning paths among learners. Furthermore, EduGraph features a dual hypergraph learning model, with channels designated for vertex and hyperedge encoding, fostering a collaborative information exchange that refines the learners’ preference embeddings. The empirical assessment of EduGraph is conducted through a comprehensive comparison with many existing baselines, utilizing two distinct MOOC datasets. Our experimental studies not only emphasize the enhanced recommendation performance of EduGraph but also elucidate the significant contributions of its individual components, such as the integration of low-pass and high-pass filters and the framelet-wise collaborative strategy that effectively bridges hyperedge-level and vertex-level representations, augmenting the overall efficacy of the course recommendation system.
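
The hypergraph view in the abstract, with learners as hyperedges and courses as vertices, can be sketched as an incidence matrix whose rows are courses and whose columns are learners. The enrollment data and the helper name `build_incidence` below are hypothetical. A plain incidence matrix also discards the order in which courses were taken, which the paper's learning-path construction may well retain.

```python
def build_incidence(enrollments):
    """Build a course-by-learner incidence matrix H, where H[i][j] = 1
    if learner j (a hyperedge) contains course i (a vertex)."""
    courses = sorted({c for path in enrollments.values() for c in path})
    learners = sorted(enrollments)
    row = {c: i for i, c in enumerate(courses)}
    col = {name: j for j, name in enumerate(learners)}
    H = [[0] * len(learners) for _ in courses]
    for learner, path in enrollments.items():
        for course in path:
            H[row[course]][col[learner]] = 1
    return courses, learners, H


# Hypothetical learning paths: learner -> courses taken.
enrollments = {
    "alice": ["python", "ml"],
    "bob": ["ml", "stats"],
    "carol": ["python"],
}
courses, learners, H = build_incidence(enrollments)
vertex_degree = [sum(r) for r in H]            # learners per course
hyperedge_degree = [sum(c) for c in zip(*H)]   # courses per learner
print(courses, learners, H, vertex_degree, hyperedge_degree)
```

Hypergraph convolutions are typically defined in terms of this incidence matrix and the two degree vectors, which is why they are computed here.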

Citations: 0

Weak Multi-Label Data Stream Classification Under Distribution Changes in Labels

IF 7.5 · Zone 3, Computer Science

IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1369-1380 · Pub Date: 2024-09-03 · DOI: 10.1109/TBDATA.2024.3453760

Yizhang Zou; Xuegang Hu; Peipei Li; Jun Hu

Abstract: Multi-label stream classification aims to address the challenge of dynamically assigning multiple labels to sequentially arriving instances. In real situations, only partial labels of instances can be observed because human annotation is expensive, and label distribution changes arise when multiple labels are assigned in a streaming mode, but few existing works consider these challenges jointly. Motivated by this, we propose the problem of weak multi-label stream classification (WMSC) and an online classification algorithm robust to weak labels. Specifically, we incrementally update the margin-based model using information from both the past model and the current incoming instance with partially observed labels. To increase the robustness to weak labels, we first adjust the classification margin of negative labels using the label causality matrix, which is constructed from the conditional probability of label pairs. Second, we introduce the label prototype matrix to regulate the margin by controlling the weighting parameter of the slack term. Additionally, to handle the potential distribution changes in labels, we utilize an instance-specific threshold obtained via online thresholding to perform binary classification, which is formulated as a regression problem. Finally, theoretical analysis and empirical experimental results are presented to demonstrate the effectiveness of WMSC in classifying unobserved streaming instances.
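
The abstract says the label causality matrix is built from the conditional probability of label pairs. A minimal batch sketch of that idea is given below: entry `M[i][j]` estimates P(label j is present | label i is present) from co-occurrence counts. The paper's exact construction, its online update, and its handling of unobserved labels are not given in the abstract, so this version assumes fully observed binary label rows; the function name is hypothetical.

```python
def conditional_label_matrix(label_rows):
    """Estimate M[i][j] = P(label j = 1 | label i = 1) from binary rows.

    Rows of M for labels that never occur are left at 0.0.
    """
    n_labels = len(label_rows[0])
    count = [0] * n_labels
    co = [[0] * n_labels for _ in range(n_labels)]
    for row in label_rows:
        active = [k for k, v in enumerate(row) if v == 1]
        for i in active:
            count[i] += 1
            for j in active:
                co[i][j] += 1
    return [
        [co[i][j] / count[i] if count[i] else 0.0 for j in range(n_labels)]
        for i in range(n_labels)
    ]


# Hypothetical label vectors for four instances and three labels.
rows = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
M = conditional_label_matrix(rows)
print(M)
```

In this toy data, label 2 never appears without label 1, so `M[2][1]` is 1.0. That suggests a missing label 1 on an instance carrying label 2 is more likely unobserved than truly negative, which is the kind of signal that could justify relaxing the margin for that negative label.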

Citations: 0

PosParser: A Heuristic Online Log Parsing Method Based on Part-of-Speech Tagging

IF 7.5 · Zone 3, Computer Science

IEEE Transactions on Big Data, vol. 11, no. 3, pp. 1334-1345 · Pub Date: 2024-09-03 · DOI: 10.1109/TBDATA.2024.3453756

Jinzhao Jiang; Yuanyuan Fu; Jian Xu

Abstract: Log parsing, the process of transforming raw logs into structured data, is a key step in the intelligent operation and maintenance of complex computer systems and has therefore received extensive attention. Among all log parsing methods, heuristic log parsing methods are lightweight and can work in a streaming mode, which meets real-time parsing requirements well. However, the existing log representations used in heuristic log parsing methods are not powerful in distinguishing log messages, which leads to low parsing accuracy and weak generality. Inspired by trigger word extraction in the event detection task of natural language processing (NLP), this paper proposes an online log parser named PosParser, which employs part-of-speech (PoS) tagging to extract a function token sequence (FTS) as the log message representation and then identifies the event templates of log messages through the FTS. Experimental results on sixteen logs from real systems demonstrate that the FTS is powerful in distinguishing log messages from different event templates, and that PosParser not only achieves better parsing accuracy than state-of-the-art methods but is also comparable to them in efficiency.

Citations: 0

AKGNN: Attribute Knowledge Graph Neural Networks Recommendation for Corporate Volunteer Activities

IF 7.5 · Zone 3, Computer Science

IEEE Transactions on Big Data, vol. 10, no. 6, pp. 720-730 · Pub Date: 2024-09-03 · DOI: 10.1109/TBDATA.2024.3453761

Dan Du; Pei-Yuan Lai; Yan-Fei Wang; De-Zhang Liao; Min Chen

Abstract: Due to the collective decision-making nature of enterprises, the process of accepting recommendations is predominantly characterized by an analytical synthesis of objective requirements and cost-effectiveness, rather than being rooted in individual interests. This distinguishes enterprise recommendation scenarios from those tailored for individuals or groups formed by similar individuals, rendering traditional recommendation algorithms less applicable in the corporate context. To overcome these challenges, taking corporate volunteering as an example, where the aim is to recommend volunteer activities to enterprises, we propose a novel recommendation model called Attribute Knowledge Graph Neural Networks (AKGNN). Specifically, a novel comprehensive attribute knowledge graph is constructed for enterprises and volunteer activities, from which we obtain the feature representation. We then utilize an extended Variational Auto-Encoder (eVAE) model to learn the preference representation, and a GNN model to learn the comprehensive representation together with the representations of similar nodes. Finally, all the comprehensive representations are input to the prediction layer. Extensive experiments have been conducted on real datasets, confirming the advantages of the AKGNN model. We delineate the challenges faced by recommendation algorithms in Business-to-Business (B2B) platforms and introduce a novel research approach utilizing attribute knowledge graphs.

Citations: 0