2022 IEEE International Conference on Data Mining Workshops (ICDMW): Latest Papers

An application of Customer Embedding for Clustering
2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00019
Ahmet Tugrul Bayrak
{"title":"An application of Customer Embedding for Clustering","authors":"Ahmet Tugrul Bayrak","doi":"10.1109/ICDMW58026.2022.00019","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00019","url":null,"abstract":"Effective and powerful strategic planning in a competitive business environment brings businesses to the fore. It is important for the growth of the business to move the customer to the center by acting more intelligently in the planning of marketing and sales activities. In order to find customer behavior patterns, the use of clustering models from machine learning algorithms can yield effective results. In this study, traditional customer clustering methods are enriched by using customer representations as features. To achieve that, a natural language processing method, word embedding, is applied to customers. By using the powerful mechanism of word embedding methods, a customer space is created where the customers are represented based on the products they have bought. It is observed that appending customer embeddings for customer clustering has a positive effect and the results seem promising for further studies.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124931831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
What Do Audio Transformers Hear? Probing Their Representations For Language Delivery & Structure
2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00120
Yaman Kumar Singla, Jui Shah, Changyou Chen, R. Shah
{"title":"What Do Audio Transformers Hear? Probing Their Representations For Language Delivery & Structure","authors":"Yaman Kumar Singla, Jui Shah, Changyou Chen, R. Shah","doi":"10.1109/ICDMW58026.2022.00120","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00120","url":null,"abstract":"Transformer models across multiple domains such as natural language processing and speech form an unavoidable part of the tech stack of practitioners and researchers alike. Audio transformers that exploit representational learning to train on unlabeled speech have recently been used for tasks from speaker verification to discourse-coherence with much success. However, little is known about what these models learn and represent in the high-dimensional latent space. In this paper, we interpret two such recent state-of-the-art models, wav2vec2.0 and Mockingjay, on linguistic and acoustic features. We probe each of their layers to understand what it is learning and at the same time, we draw a distinction between the two models. By comparing their performance across a wide variety of settings including native, non-native, read and spontaneous speeches, we also show how much these models are able to learn transferable features. Our results show that the models are capable of significantly capturing a wide range of characteristics such as audio, fluency, suprasegmental pronunciation, and even syntactic and semantic text-based characteristics. 
For each category of characteristics, we identify a learning pattern for each framework and conclude which model and which layer of that model is better for a specific category of feature to choose for feature extraction for downstream tasks.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127568500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Towards Fair Representation Learning in Knowledge Graph with Stable Adversarial Debiasing
2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00119
Yihe Wang, Mohammad Mahdi Khalili, X. Zhang
{"title":"Towards Fair Representation Learning in Knowledge Graph with Stable Adversarial Debiasing","authors":"Yihe Wang, Mohammad Mahdi Khalili, X. Zhang","doi":"10.1109/ICDMW58026.2022.00119","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00119","url":null,"abstract":"With their tremendous graph-structured information, Knowledge Graphs (KGs) have aroused increasing interest in academic research and industrial applications. Recent studies have shown that demographic bias, in terms of sensitive attributes (e.g., gender and race), exists in the learned representations of KG entities. Such bias negatively affects specific populations, especially minorities and underrepresented groups, and exacerbates machine learning-based human inequality. Adversarial learning is regarded as an effective way to alleviate bias in the representation learning model by simultaneously training a task-specific predictor and a sensitive attribute-specific discriminator. However, due to the unique challenge caused by topological structure and the comprehensive relationship between knowledge entities, adversarial learning-based debiasing is rarely studied in representation learning in knowledge graphs. In this paper, we propose a framework to learn unbiased representations for nodes and edges in knowledge graph mining. Specifically, we integrate a simple-but-effective normalization technique with Graph Neural Networks (GNNs) to constrain the weight-updating process. Moreover, as a work-in-progress paper, we also find that the introduced weight normalization technique can mitigate the pitfalls of instability in adversarial debiasing towards fair-and-stable machine learning. We evaluate the proposed framework on a benchmarking graph with multiple edge types and node types. The experimental results show that our model achieves comparable or better gender fairness over three competitive baselines on Equality of Odds. 
Importantly, the improved fairness of our model does not sacrifice performance in the knowledge graph task (i.e., multi-class edge classification).","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126277893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Cut the peaches: image segmentation for utility pattern mining in food processing
2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00072
Diletta Chiaro, E. Prezioso, Stefano Izzo, F. Giampaolo, S. Cuomo, F. Piccialli
{"title":"Cut the peaches: image segmentation for utility pattern mining in food processing","authors":"Diletta Chiaro, E. Prezioso, Stefano Izzo, F. Giampaolo, S. Cuomo, F. Piccialli","doi":"10.1109/ICDMW58026.2022.00072","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00072","url":null,"abstract":"The progress achieved in the field of information and communication technologies, particularly in computer science, and the growing capacity of new types of computational systems (cloud/edge computing) have significantly contributed to cyber-physical systems: networks where cooperating computational entities are intensively linked to the surrounding physical environment and its ongoing operations. All that has increased the possibility of automatically undertaking tasks hitherto considered to be an exclusively human concern: hence the gradual yet progressive tendency of many companies to adopt artificial intelligence (AI) and machine learning (ML) technologies to automate human activities. This paper falls within the context of deep learning (DL) for utility pattern mining applied to Industry 4.0. Starting from images supplied by a multinational company operating in the food processing industry, we provide a DL framework for real-time pattern recognition applied in the automation of peach pitters. To this aim, we perform transfer learning (TL) for image segmentation by embedding seven pre-trained encoders into multiple segmentation architectures and evaluate and compare segmentation performance in terms of metrics and inference speed on our data. 
Furthermore, we propose an attention mechanism to improve multiscale feature learning in the FPN through attention-guided feature aggregation.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127588477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ZeroKBC: A Comprehensive Benchmark for Zero-Shot Knowledge Base Completion
2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00117
Pei Chen, Wenlin Yao, Hongming Zhang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen
{"title":"ZeroKBC: A Comprehensive Benchmark for Zero-Shot Knowledge Base Completion","authors":"Pei Chen, Wenlin Yao, Hongming Zhang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen","doi":"10.1109/ICDMW58026.2022.00117","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00117","url":null,"abstract":"Knowledge base completion (KBC) aims to predict the missing links in knowledge graphs. Previous KBC tasks and approaches mainly focus on the setting where all test entities and relations have appeared in the training set. However, there has been limited research on the zero-shot KBC settings, where we need to deal with unseen entities and relations that emerge in a constantly growing knowledge base. In this work, we systematically examine different possible scenarios of zero-shot KBC and develop a comprehensive benchmark, ZeroKBC, that covers these scenarios with diverse types of knowledge sources. Our systematic analysis reveals several missing yet important zero-shot KBC settings. Experimental results show that canonical and state-of-the-art KBC systems cannot achieve satisfactory performance on this challenging benchmark. By analyzing the strengths and weaknesses of these systems on solving ZeroKBC, we further present several important observations and promising future directions. (Work was done during the internship at Tencent AI Lab.) 
The data and code are available at: https://github.com/brickee/ZeroKBC","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125352011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Identify malfunctions and their possible causes using rules, application to process mining
2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00023
Benoit Vuillemin, F. Bertrand
{"title":"Identify malfunctions and their possible causes using rules, application to process mining","authors":"Benoit Vuillemin, F. Bertrand","doi":"10.1109/ICDMW58026.2022.00023","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00023","url":null,"abstract":"In the field of process mining, malfunction analysis is a major research domain. The goal here is to find failures or relatively large processing delays and their possible causes. This paper presents an innovative research paradigm for process mining: prediction rule mining. Through a three-step method and two new algorithms, all observed cases of a process are decomposed into rules, whose information is analyzed, and possible causes are searched. This method provides information about the data, from its internal structure to the possible causes of failures, without having a priori knowledge about them.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129760751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mining High Utility Itemset with Multiple Minimum Utility Thresholds Based on Utility Deviation
2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00071
Naji Alhusaini, Jing Li, Philippe Fournier-Viger, Ammar Hawbani, Guilin Chen
{"title":"Mining High Utility Itemset with Multiple Minimum Utility Thresholds Based on Utility Deviation","authors":"Naji Alhusaini, Jing Li, Philippe Fournier-Viger, Ammar Hawbani, Guilin Chen","doi":"10.1109/ICDMW58026.2022.00071","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00071","url":null,"abstract":"High Utility Itemset Mining (HUIM) is the task of extracting actionable patterns considering the utility of items such as profits and quantities. An important issue with traditional HUIM methods is that they evaluate all items using a single threshold, which is inconsistent with reality due to differences in the nature and importance of items. Recently, algorithms were proposed to address this problem by assigning a minimum item utility threshold to each item. However, since the minimum item utility (MIU) is expressed as a percentage of the external utility, these methods still face two problems, called “itemset missing” and “itemset explosion”. To solve these problems, this paper introduces a novel notion of Utility Deviation (UD), which is calculated based on the standard deviation. The UD and actual utility are jointly used to calculate the MIU of items. By doing so, the problems of “itemset missing” and “itemset explosion” are alleviated. To implement and evaluate the UD notion, a novel algorithm is proposed, called HUI-MMU-UD. Experimental results demonstrate the effectiveness of the proposed notion for solving the problems of “itemset missing” and “itemset explosion”. 
Results also show that the proposed algorithm outperforms the previous HUI-MMU algorithm in many cases, in terms of runtime and memory usage.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126926019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
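The single-threshold setting that the abstract above criticizes can be sketched in a few lines. This is only an illustration of the standard utility definition (quantity times unit profit, summed over the transactions that contain the whole itemset); it is not the HUI-MMU-UD algorithm, and the transactions, unit profits, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "caviar": 1},
    {"milk": 3, "caviar": 2},
]
# Hypothetical unit profits (the external utility of each item).
unit_profit = {"bread": 1, "milk": 2, "caviar": 50}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force enumeration with one global threshold (toy data only)."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = itemset_utility(itemset, transactions, unit_profit)
            if u >= min_utility:
                result[itemset] = u
    return result


# One threshold for every item: only itemsets containing the expensive item survive.
huis = high_utility_itemsets(transactions, unit_profit, min_utility=50)
print(huis)
```

With one global threshold, cheap items such as bread or milk never qualify on their own in this toy data, which is the mismatch with reality that per-item thresholds are meant to address.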
An Efficient and Reliable Tolerance-Based Algorithm for Principal Component Analysis
2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00088
Michael Yeh, Ming Gu
{"title":"An Efficient and Reliable Tolerance-Based Algorithm for Principal Component Analysis","authors":"Michael Yeh, Ming Gu","doi":"10.1109/ICDMW58026.2022.00088","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00088","url":null,"abstract":"Principal component analysis (PCA) is an important method for dimensionality reduction in data science and machine learning. However, it is expensive for large matrices when only a few components are needed. Existing fast PCA algorithms typically assume the user will supply the number of components needed, but in practice, they may not know this number beforehand. Thus, it is important to have fast PCA algorithms depending on a tolerance. We develop one such algorithm that runs quickly for matrices with rapidly decaying singular values, provide approximation error bounds that are within a constant factor away from optimal, and demonstrate its utility with data from a variety of applications.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123062528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
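To make "PCA depending on a tolerance" concrete, here is a minimal sketch of what choosing the number of components from a tolerance can mean. It is not the authors' algorithm and makes no attempt to be fast: it assumes the stopping rule is the Frobenius norm of the residual, skips mean-centering, and uses plain power iteration with deflation on the Gram matrix. All function names are hypothetical.

```python
import math


def gram(a):
    """Return A^T A for a matrix stored as a list of rows."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


def top_eigenpair(g, iters=500):
    """Power iteration on a symmetric PSD matrix; returns (eigenvalue, unit vector).

    Assumes the fixed start vector is not orthogonal to the leading eigenvector.
    """
    n = len(g)
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector.
    lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def components_for_tolerance(a, tol):
    """Smallest k such that the best rank-k approximation has Frobenius residual <= tol."""
    g = gram(a)
    n = len(g)
    # trace(A^T A) = ||A||_F^2 = sum of squared singular values.
    residual_sq = sum(g[i][i] for i in range(n))
    k = 0
    while residual_sq > tol * tol and k < n:
        lam, v = top_eigenpair(g)
        # Deflate: remove the component just found, then account for its energy.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        residual_sq -= lam
        k += 1
    return k


# Hypothetical matrix with singular values 10, 1, and 0.1.
sample = [[10.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.1]]
print(components_for_tolerance(sample, tol=0.5))
```

The caller supplies only `tol`; the number of components falls out of how quickly the singular values decay, so a matrix with rapid decay stops after a few deflation steps.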
Identifying Hydrometeorological Factors Influencing Reservoir Releases Using Machine Learning Methods
2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00143
Ming Fan, Lujun Zhang, Siyan Liu, Tiantian Yang, Dawei Lu
{"title":"Identifying Hydrometeorological Factors Influencing Reservoir Releases Using Machine Learning Methods","authors":"Ming Fan, Lujun Zhang, Siyan Liu, Tiantian Yang, Dawei Lu","doi":"10.1109/ICDMW58026.2022.00143","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00143","url":null,"abstract":"Simulation of reservoir releases plays a critical role in social-economic functioning and our nation's security. However, it is challenging to predict the reservoir release accurately because of many influential factors from natural environments and engineering controls such as the reservoir inflow and storage. Moreover, climate change and hydrological intensification causing extreme precipitation and temperature make the accurate prediction of reservoir releases even more challenging. Machine learning (ML) methods have shown some successful applications in simulating reservoir releases. However, previous studies mainly used inflow and storage data as inputs and only considered their short-term influences (e.g., previous one or two days). In this work, we use long short-term memory (LSTM) networks for reservoir release prediction based on four input variables including inflow, storage, precipitation, and temperature and consider their long-term influences. We apply the LSTM model to 30 reservoirs in the Upper Colorado River Basin, United States. We analyze the prediction performance using six statistical metrics. More importantly, we investigate the influence of the input hydrometeorological factors, as well as their temporal effects on reservoir release decisions. Results indicate that inflow and storage are the most influential factors but the inclusion of precipitation and temperature can further improve the prediction of release especially in low flows. Additionally, the inflow and storage have a relatively long-term effect on the release. 
These findings can help optimize the water resources management in the reservoirs.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127820176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Deep-SHEEP: Sense of Humor Extraction from Embeddings in the Personalized Context
2022 IEEE International Conference on Data Mining Workshops (ICDMW) Pub Date : 2022-11-01 DOI: 10.1109/ICDMW58026.2022.00125
Julita Bielaniewicz, Kamil Kanclerz, P. Milkowski, Marcin Gruza, Konrad Karanowski, Przemyslaw Kazienko, Jan Kocoń
{"title":"Deep-SHEEP: Sense of Humor Extraction from Embeddings in the Personalized Context","authors":"Julita Bielaniewicz, Kamil Kanclerz, P. Milkowski, Marcin Gruza, Konrad Karanowski, Przemyslaw Kazienko, Jan Kocoń","doi":"10.1109/ICDMW58026.2022.00125","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00125","url":null,"abstract":"As humans, we experience a wide range of feelings and reactions. One of these is laughter, often related to a personal sense of humor and the perception of funny content. Due to its subjective nature, recognizing humor in NLP is a very challenging task. Here, we present a new approach to the task of predicting humor in the text by applying the idea of a personalized approach. It takes into account both the text and the context of the content receiver. For that purpose, we proposed four Deep-SHEEP learning models that take advantage of user preference information differently. The experiments were conducted on four datasets: Cockamamie, HUMOR, Jester, and Humicroedit. The results have shown that the application of an innovative personalized approach and user-centric perspective significantly improves performance compared to generalized methods. Moreover, even for random text embeddings, our personalized methods outperform the generalized ones in the subjective humor modeling task. We also argue that the user-related data reflecting an individual sense of humor has similar importance as the evaluated text itself. 
Different types of humor were investigated as well.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133326524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5