2022 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers, Page 3

Discovering Unknown Labels for Multi-Label Image Classification

2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00108

Jun Huang, Yu Yan, Xiao Zheng, Xiwen Qu, Xudong Hong

Citations: 0

Feature Extraction and Prediction of Combined Text and Survey Data using Two-Staged Modeling

2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00064

A. A. Neloy, M. Turgeon

Abstract: Deep learning (DL) based natural language processing (NLP) has recently become one of the fastest-growing research domains and has achieved remarkable improvements in many applications. Due to the significant amount of data, the adaptation of feature learning and symmetric data efficiency is a critical underlying task in such applications. However, their ability to extract features is limited due to a lack of proper model formation. Moreover, the use of these methods on smaller datasets is unexplored and underdeveloped compared to more popular research areas. This work introduces a two-stage modeling approach to combine classical statistical analysis with NLP problems in a real-world dataset. We lay out a combination of the classical statistical model incorporating a stacked ensemble classifier and a DL framework of convolutional neural network (CNN) and Bidirectional Recurrent Neural Networks (Bi-RNN) to structure a more decomposed architecture with lower computational complexity. Additionally, the experimental results, showing 96.69% training and 70.56% testing accuracy, together with hypothesis testing from our DL models followed by an ablation study, empirically validate our proposed combined modeling technique.

Citations: 0

A machine learning-based approach for mercury detection in marine waters

2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00074

F. Piccialli, F. Giampaolo, Vincenzo Schiano Di Cola, Federico Gatta, Diletta Chiaro, E. Prezioso, Stefano Izzo, S. Cuomo

Abstract: Thanks to the widespread use of mobile devices, analyses that in the past had to be carried out in specifically designated and equipped laboratories, and which required long processing times, may now take place outdoors and in real time. In marine science, for example, the development of a mobile and compact system for the on-site detection of heavy metal contamination in seawater would be helpful for scientists and society in at least two ways: i) reduction of the time and costs associated with these experiments; ii) the implementation of a strategy for outdoor analysis, eventually embeddable in a lab-on-hardware system. This paper falls within the context of machine learning (ML) for utility pattern mining applied to interdisciplinary domains: starting from well-plate images, we provide a novel proof-of-concept (PoC) machine learning-based framework to assist scientists in their daily research on seawater samples, proposing a system which first automatically recognises the wells in a multiwell plate and then predicts the degree of fluorescence in each of them, thus showing the possible presence of heavy metals.

Citations: 0

HeteroGuard: Defending Heterogeneous Graph Neural Networks against Adversarial Attacks

2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00096

Udesh Kumarasinghe, Mohamed Nabeel, K. de Zoysa, K. Gunawardana, Charitha Elvitigala

Abstract: Graph neural networks (GNNs) have achieved remarkable success in many application domains including drug discovery, program analysis, social networks, and cyber security. However, it has been shown that they are not robust against adversarial attacks. In the recent past, many adversarial attacks against homogeneous GNNs, and defenses against them, have been proposed. However, most of these attacks and defenses are ineffective on heterogeneous graphs, as these algorithms optimize under the assumption that all edge and node types are the same and, further, they introduce semantically incorrect edges to perturbed graphs. Here, we first develop HetePR-BCD, a training-time (i.e. poisoning) adversarial attack on heterogeneous graphs that outperforms the state-of-the-art attacks proposed in the literature. Our experimental results on three benchmark heterogeneous graphs show that our attack, with a small perturbation budget of 15%, degrades the performance by up to 32% (F1 score) compared to existing ones. It is concerning that existing defenses are not robust against our attack. These defenses primarily modify the GNN's neural message passing operators, assuming that adversarial attacks tend to connect nodes with dissimilar features, but this assumption does not hold in heterogeneous graphs. We construct HeteroGuard, an effective defense against training-time attacks, including HetePR-BCD, on heterogeneous models. HeteroGuard outperforms the existing defenses by 3–8% on F1 score, depending on the benchmark dataset.

Citations: 0

Degree-Related Bias in Link Prediction

2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00103

Yu Wang, Tyler Derr

Citations: 2

DragStream: An Anomaly And Concept Drift Detector In Univariate Data Streams

2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00113

Anne Marthe Sophie Ngo Bibinbe, A. J. Mahamadou, Michael Franklin Mbouopda, E. Nguifo

Abstract: Anomaly detection in data streams comes with different technical challenges due to the nature of the data. The main challenges include storage limitations, the speed of data arrival, and concept drifts. In the literature, methods for mining data streams in order to detect anomalies have been proposed. While some methods focus on tackling a specific issue, other methods handle diverse problems but may have high complexity (time and memory). In the present work, we propose DragStream, a novel subsequence anomaly and concept drift detection algorithm for univariate data streams. DragStream extends Drag, a subsequence anomaly detection method for time series data, to streaming data. Furthermore, the new method is inspired by the well-known Matrix Profile, Drag, and MILOF, which are respectively point and subsequence anomaly detection methods for time series and data streams. We conducted intensive experiments and statistical analysis to evaluate the performance of the proposed approach against existing methods. The results show that our method is competitive in performance while being linear in time and memory complexity. Finally, we provide an open-source implementation of the new method.

Citations: 1

Emerging properties from Bayesian Non-Parametric for multiple clustering: Application for multi-view image dataset

2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00013

Reda Khoufache, M. Dilmi, Hanene Azzag, Etienne Gofinnet, M. Lebbah

Citations: 0

Mining Valuable Fuzzy Patterns via the RFM Model

2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00075

Yanlin Qi, Fuyin Lai, Guoting Chen, Wensheng Gan

Citations: 1

Unknown Type Streaming Feature Selection via Maximal Information Coefficient

2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00089

Peng Zhou, Yunyun Zhang, Yuan-Ting Yan, Shu Zhao

Abstract: Feature selection aims to select an optimal minimal feature subset from the original datasets and has become an indispensable preprocessing component before data mining and machine learning, especially in the era of big data. Most feature selection methods implicitly assume that the feature type (categorical, numerical, or mixed) is known before learning, and then design corresponding measurements to calculate the correlation between features. However, in practical applications, features may be generated dynamically and arrive one by one over time, which we call streaming features. Most existing streaming feature selection methods assume that all dynamically generated features are of the same type, or that the feature type of each newly arriving feature can be known on the fly, but this is unreasonable and unrealistic. Therefore, this paper first studies the practical issue of Unknown Type Streaming Feature Selection and proposes a new method to handle it, named UT-SFS. Extensive experimental results indicate the effectiveness of our new method. UT-SFS is nonparametric and does not need to know the feature type before learning, which aligns with practical application needs.

Citations: 0

Extracting Entities and Events from Cyber-Physical Security Incident Reports

2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00083

Nitin Ramrakhiyani, Sangameshwar Patil, Manideep Jella, Alok Kumar, G. Palshikar

Abstract: Cyber-physical systems are an important part of many industries such as the chemical process industry, manufacturing industry, automobiles, and even sophisticated weaponry. Given the economic importance and influence of these systems, they have increasingly faced cybersecurity attacks. In this paper, we provide a dataset of real-life security incident reports on cyber-physical systems annotated with entities and events that are important for analysing such security incidents. We analyze and identify the limitations of the 'Domain Objects' in the Structured Threat Information Expression (STIX) standard, as well as of recent research literature, for entity type classification schemes in the cybersecurity domain. We propose an updated classification scheme for entity types in the cybersecurity domain. The enhanced coverage provided by the entity scheme is important for automated information extraction and natural language understanding of textual reports containing details of cybersecurity incidents. We use deep-learning based sequence labelling techniques and cybersecurity domain-specific word embeddings to set up a benchmark for entity and event extraction for cyber-physical security incident report analysis. The annotated dataset of real-life industrial security incidents will be made available for research purposes.

Citations: 0