{"title":"An Experimental Evaluation of Data Classification Models for Credibility Based Fake News Detection","authors":"A. Ramkissoon, Shareeda Mohammed","doi":"10.1109/ICDMW51313.2020.00022","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00022","url":null,"abstract":"The existence of fake news is a problem challenging today's social media enabled world. Fake news can be classified using varying methods. Predicting and detecting fake news has proven to be challenging even for machine learning algorithms. This research attempts to investigate nine such machine learning algorithms to understand their performance with Credibility Based Fake News Detection. This study uses a standard dataset with features relating to the credibility of news publishers. These features are analysed using each of these algorithms. The results of these experiments are analysed using four evaluation methodologies. The analysis reveals varying performance with the use of each of the nine methods. Based upon our selected dataset, one of these methods has proven to be most appropriate for the purpose of Credibility Based Fake News Detection.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121053428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TrustedChain: A Blockchain-based Data Sharing Scheme for Supply Chain","authors":"Gejun Le, Qifeng Gu, Qingshan Jiang, Weiyi Lin","doi":"10.1109/ICDMW51313.2020.00128","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00128","url":null,"abstract":"Supply chains involve mutually independent and distrusting stakeholders and large amounts of sensitive order data. Sharing data among stakeholders is essential because it improves the efficiency of the various workflows among them. This paper proposes TrustedChain, a blockchain-based data sharing scheme for supply chain, which has two advantages: (a) trusted: we present a trusted environment, Trusted Environment (TE), based on blockchain to allow mutually distrusting stakeholders to manage data collaboratively. (b) secure: we provide a secure design that first stores order forms in a Distributed Database (DDB) and then records the URI in the Contract Account (CA) of the TE. In addition, Supply-Business Contract Management (SCM) manages all CAs and Node Communication (NC) allows communication over the network. The security analysis and evaluation prove the effectiveness of TrustedChain.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"10 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134470362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of Dementia Associated Disorders Using EEG based Frequent Subgraph Technique","authors":"A. T. Adebisi, V. Gonuguntla, Ho-Won Lee, K. Veluvolu","doi":"10.1109/ICDMW51313.2020.00087","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00087","url":null,"abstract":"Dementia associated disorders such as vascular dementia, frontotemporal dementia and Alzheimer dementia lead to cognitive impairment. Discrimination of dementia associated disorders has remained a challenging task as they have overlapping underlying complex structures and display similar clinical features. In this work, we explore an EEG based frequent subgraph searching technique to characterize stages of brain functional networks of mild cognitive impairment (MCI), Alzheimer's disease (AD) and vascular dementia (VD) subjects in comparison with healthy control (HC) subjects. To identify the frequent subgraph related to dementia, we first formulated the brain functional network based on the phase information of EEG with mutual information as a measure. The whole network is then divided into sub-regions and frequent sub-graph search is performed. The identified frequent subgraphs were employed to discriminate the dementia associated disorders from the data recorded from 10 healthy and 32 dementia subjects in various stages. Results show that the proposed method has the potential to quantify the disease progression using brain functional connectivity and the identified networks can aid in the diagnosis of dementia associated disorders.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133765864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A federated learning based approach for loan defaults prediction","authors":"Geet Shingi","doi":"10.1109/ICDMW51313.2020.00057","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00057","url":null,"abstract":"The number of defaults on bank loans has been increasing in recent years. However, the process of sanctioning a loan is still done manually in many banking organizations. Dependency on human intervention and delay in results have been the biggest obstacles in this system. While implementing machine learning models for banking applications, the security of sensitive customer banking data has always been a crucial concern and with strong legislative rules in place, sharing of data with other organizations is not possible. In addition, loan datasets are highly imbalanced: there are very few samples of defaults compared to repaid loans. These problems make it difficult for a default prediction system to learn the patterns of defaults and thus to predict them. Previous machine learning-based approaches to automate the process have trained models on a single organization's data, but in today's world, classifying loan applications using only the data within one organization is no longer a sufficient or feasible solution. In this paper, we propose a federated learning-based approach for the prediction of loan applications that are less likely to be repaid, which helps resolve the above-mentioned issues by sharing the weights of the model, which are aggregated at the central server. The federated system is coupled with the Synthetic Minority Over-sampling Technique (SMOTE) to solve the problem of imbalanced training data. Further, the federated system is coupled with a weighted aggregation based on the number of samples and the performance of a worker on its dataset to further augment the performance. The improved performance of our model on publicly available real-world data further validates the approach. Flexible, aggregated models can prove to be crucial in keeping out the defaulters in loan applications.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123874255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Wide-Kernel CNN for Classifying Multivariate Signals in Fault Diagnosis","authors":"J. V. D. Hoogen, Stefan Bloemheuvel, M. Atzmüller","doi":"10.1109/ICDMW51313.2020.00046","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00046","url":null,"abstract":"Deep Learning (DL) provides considerable opportunities for increased efficiency and performance in fault diagnosis. The ability of DL methods for automatic feature extraction can reduce the need for time-intensive feature construction and prior knowledge on complex signal processing. In this paper, we propose two models that are built on the Wide-Kernel Deep Convolutional Neural Network (WDCNN) framework to improve performance of classifying fault conditions using multivariate time series data, also with respect to limited and/or noisy training data. In our experiments, we use the renowned benchmark dataset from the Case Western Reserve University (CWRU) bearing experiment [1] to assess our models' performance, and to investigate their usability towards large-scale applications by simulating noisy industrial environments. Here, the proposed models show an exceptionally good performance without any preprocessing or data augmentation and outperform traditional Machine Learning applications as well as state-of-the-art DL models considerably, even in such complex multi-class classification tasks. We show that both models are also able to adapt well to noisy input data, which makes them suitable for condition-based maintenance contexts. Furthermore, we investigate and demonstrate explainability and transparency of the models which is particularly important in large-scale industrial applications.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116010948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building knowledge graphs of homicide investigation chronologies","authors":"Ritika Pandey, P. Brantingham, Craig D. Uchida, G. Mohler","doi":"10.1109/ICDMW51313.2020.00115","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00115","url":null,"abstract":"Homicide investigations generate large and diverse data in the form of witness interview transcripts, physical evidence, photographs, DNA, etc. Homicide case chronologies are summaries of these data created by investigators that consist of short text-based entries documenting specific steps taken in the investigation. A chronology tracks the evolution of an investigation, including when and how persons involved and items of evidence became part of a case. In this article we discuss a framework for creating knowledge graphs of case chronologies that may aid investigators in analyzing homicide case data and also allow for post hoc analysis of the key features that determine whether a homicide is ultimately solved. Our method consists of 1) performing named entity recognition to determine witnesses, suspects, and detectives from chronology entries 2) using keyword expansion to identify documentary, physical, and forensic evidence in each entry and 3) linking entities and evidence to construct a homicide investigation knowledge graph. We compare the performance of several choices of methodologies for these sub-tasks using homicide investigation chronologies from Los Angeles, California. We then analyze the association between network statistics of the knowledge graphs and homicide solvability.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123822929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of Neural Network-based Boost Charging for Reducing the Charging Time of Li-ion Battery","authors":"Sue Hyang Lim, S. Kim, Hyeong Min Lee, Sijun Kim, Y. Shin","doi":"10.1109/ICDMW51313.2020.00109","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00109","url":null,"abstract":"Rapid charging of Li-ion batteries is vital for the commercialization of electric propulsion systems. However, during the fast-charging process, the reduction in battery capacity and the increase in temperature must be considered in real time. Most Li-ion battery chargers follow the charging profile of an open-loop system, which has been determined based on prior knowledge. However, such a system does not reflect the temperature change of the battery and the degree of aging. Therefore, in this study, we propose a neural network-based charging profile model by applying a closed-loop system to reflect the various states of batteries; we also show two battery-state characteristics in addition to temperature. Consequently, we show battery characteristics other than those shown in the past, such as the battery voltage and temperature trends. In addition to the design of the charging current, an improvement of approximately 22% to 50% based on the mean absolute error (MAE) is achieved. By considering the various characteristics, the long short-term memory network is found to perform better than the feed-forward neural network, with a 35% improvement based on MAE.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123358265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Mining of Non-Dominated High Quantity-Utility Patterns","authors":"J. Wu, Qian Teng, Gautam Srivastava, Matin Pirouz, Chun-Wei Lin","doi":"10.1109/ICDMW51313.2020.00097","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00097","url":null,"abstract":"In this paper, we propose a new pattern called skyline quantity-utility pattern (SQUP) to provide better estimations in the decision-making process by considering quantity and utility together. Two algorithms, called SQUM-1 and SQUM-2, are presented to efficiently mine the set of SQUPs. Two new efficient utility-max structures, one used in each of the two algorithms, are also introduced to reduce the number of candidate itemsets. Our in-depth experimental results show that our proposed algorithms achieve good performance in terms of runtime and memory usage.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123810350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attentive-Feature Transfer based on Mapping for Cross-domain Recommendation","authors":"Zhen Liu, J. Tian, Lingxi Zhao, Yanling Zhang","doi":"10.1109/ICDMW51313.2020.00030","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00030","url":null,"abstract":"Recommendation systems have been widely developed for numerous applications. Existing systems may still suffer from negative transfer or cold starts. These drawbacks are essentially due to overlooking domain-specific users' personal preferences or cross-domain user-item interactions. To address these problems, we propose a cross-domain recommendation algorithm built on a mapping-based attentive feature transfer (MAFT) model. Our MAFT model utilizes matrix factorization and an attention mechanism for fine-grained modeling of user preferences. Then, overlapping cross-domain user features are combined through feature fusion. Moreover, a multilayer perceptron (MLP) is built to map the obtained user features to target-domain user features. Finally, the user-item ratings can be predicted in the target domain. We carried out experiments on the large-scale MovieLens dataset as well as the real Douban Book and Douban Movie datasets. The results show that the precision of the MAFT-based method is clearly higher than those of other cross-domain recommendation methods, especially for cold-start users with few item interactions.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130021462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Restructuring of Hoeffding Trees for Trapezoidal Data Streams","authors":"Christian Schreckenberger, Tim Glockner, H. Stuckenschmidt, Christian Bartelt","doi":"10.1109/ICDMW51313.2020.00064","DOIUrl":"https://doi.org/10.1109/ICDMW51313.2020.00064","url":null,"abstract":"Trapezoidal Data Streams are an emerging topic, where not only the data volume increases, but also the data dimension, i.e. new features emerge. In this paper, we address the challenges that arise from this problem by providing a novel approach to restructure and prune Hoeffding trees. We evaluate our approach on synthetic datasets, where we can show that the approach significantly improves the performance compared to the baseline of an adjusted Hoeffding tree algorithm without restructuring and pruning.","PeriodicalId":426846,"journal":{"name":"2020 International Conference on Data Mining Workshops (ICDMW)","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114661349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}