International Journal of Data Mining & Knowledge Management Process: Latest Articles

Review of Machine Learning Applications and Datasets in Classification of Acute Leukemia
International Journal of Data Mining & Knowledge Management Process Pub Date : 2021-11-30 DOI: 10.5121/ijdkp2021.11601
Jaishree Ranganathan
{"title":"Review of Machine Learning Applications and Datasets in Classification of Acute Leukemia","authors":"Jaishree Ranganathan","doi":"10.5121/ijdkp2021.11601","DOIUrl":"https://doi.org/10.5121/ijdkp2021.11601","url":null,"abstract":"Cancer is an extremely heterogeneous disease. Leukemia is a cancer of the white blood cells and some other cell types. Diagnosing leukemia is laborious in a multitude of areas including haematology. Machine Learning (ML) is a branch of Artificial Intelligence. There is an emerging trend in ML models for data classification. This review aims to describe the literature of ML in the classification of datasets for acute leukemia. In addition to describing the existing literature, this work aims to identify different sources of publicly available data that could be utilised for research and development of intelligent machine learning applications for classification. To the best of our knowledge, there is no such work that contributes such information to the research community.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"282 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124513417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The 5 Dimensions of Problem Solving using Dinna: Case Study in the Electronics Industry
International Journal of Data Mining & Knowledge Management Process Pub Date : 2021-09-30 DOI: 10.5121/ijdkp2021.11502
M. Hamoumi, A. Haddout, M. Benhadou
{"title":"The 5 Dimensions of Problem Solving using Dinna: Case Study in the Electronics Industry","authors":"M. Hamoumi, A. Haddout, M. Benhadou","doi":"10.5121/ijdkp2021.11502","DOIUrl":"https://doi.org/10.5121/ijdkp2021.11502","url":null,"abstract":"Based on the principle that perfection is a divine criterion, process management exists on the one hand to achieve excellence (near perfection) and on the other hand to avoid imperfection. In other words, Operational Excellence (EO) is one of the approaches that, when used rigorously, aims to maximize performance. Therefore, the mastery of problem solving remains necessary to achieve such a performance level. There are many tools that we can use, whether in continuous improvement for the resolution of chronic problems (KAIZEN, DMAIC, Lean six sigma…) or in the resolution of sporadic defects (8D, PDCA, QRQC ...). However, these methodologies often use the same basic tools (Ishikawa diagram, 5 why, tree of causes…) to identify potential causes and root causes. This results in three levels of causes: occurrence, non-detection and system. The research presents the development of the DINNA diagram [1] as an effective and efficient process that links the Ishikawa diagram and the 5 why method to identify the root causes and avoid recurrence. The ultimate objective is to achieve the same result if two working groups with similar skills analyse the same problem separately; to achieve this, the consistent application of a robust methodology is required. Therefore, we are talking about 5 dimensions: occurrence, non-detection, system, effectiveness and efficiency. As such, the paper offers a solution that is both effective and efficient to help practitioners of industrial problem solving avoid missing the real root cause and avoid the costs of a wrong decision.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121145900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Experimental Evaluation of Similarity-Based and Embedding-Based Link Prediction Methods on Graphs
International Journal of Data Mining & Knowledge Management Process Pub Date : 2021-09-30 DOI: 10.5121/ijdkp2021.11501
M. Islam, Sabeur Aridhi, Malika Smail-Tabbone
{"title":"An Experimental Evaluation of Similarity-Based and Embedding-Based Link Prediction Methods on Graphs","authors":"M. Islam, Sabeur Aridhi, Malika Smail-Tabbone","doi":"10.5121/ijdkp2021.11501","DOIUrl":"https://doi.org/10.5121/ijdkp2021.11501","url":null,"abstract":"The task of inferring missing links or predicting future ones in a graph based on its current structure is referred to as link prediction. Link prediction methods that are based on pairwise node similarity are well-established approaches in the literature and show good prediction performance in many real-world graphs though they are heuristic. On the other hand, graph embedding approaches learn low-dimensional representations of nodes in a graph and are capable of capturing inherent graph features, and thus support the subsequent link prediction task in the graph. This paper studies a selection of methods from both categories on several benchmark (homogeneous) graphs with different properties from various domains. Beyond the intra- and inter-category comparison of the performances of the methods, our aim is also to uncover interesting connections between Graph Neural Network (GNN)-based methods and heuristic ones as a means to alleviate the well-known black-box limitation.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123184699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Petrochemical Production Big Data and its Four Typical Application Paradigms
International Journal of Data Mining & Knowledge Management Process Pub Date : 2021-07-31 DOI: 10.5121/ijdkp.2021.11402
Hu Shaolin, Z. Qinghua, Sun NaiQuan, Li Xiwu
{"title":"Petrochemical Production Big Data and its Four Typical Application Paradigms","authors":"Hu Shaolin, Z. Qinghua, Sun NaiQuan, Li Xiwu","doi":"10.5121/ijdkp.2021.11402","DOIUrl":"https://doi.org/10.5121/ijdkp.2021.11402","url":null,"abstract":"In recent years, big data has attracted more and more attention. It can bring us more information and a broader perspective to analyse and deal with problems than the conventional situation. However, so far, there is no widely accepted and measurable definition for the term “big data”. For example, it is unclear what significant features a data set needs to have, or how large it needs to be, in order to be called big data. Although many works in the big data literature have tried to solve the above problems with the \"5V\" description widely used in textbooks, \"5V\" still has significant shortcomings and limitations, and is not suitable for completely describing big data problems in practical fields such as industrial production. Therefore, this paper creatively puts forward the new concept of the data cloud and the data cloud-based \"3M\" descriptive definition of big data, which refers to a wide range of data sources (Multisource), ultra-high dimensions (Multi-dimensional) and a long enough time span (Multi-spatiotemporal). Based on the 3M description of big data, this paper sets up four typical application paradigms for production big data, analyses the typical application of the four paradigms, and lays the foundation for applications of big data in the petrochemical industry.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115765421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Partitioning Wide Area Graphs Using a Space Filling Curve
International Journal of Data Mining & Knowledge Management Process Pub Date : 2021-01-31 DOI: 10.5121/IJDKP.2021.11102
Cyprien Gottstein, Philippe Raipin Parvédy, M. Hurfin, Thomas Hassan, T. Coupaye
{"title":"Partitioning Wide Area Graphs Using a Space Filling Curve","authors":"Cyprien Gottstein, Philippe Raipin Parvédy, M. Hurfin, Thomas Hassan, T. Coupaye","doi":"10.5121/IJDKP.2021.11102","DOIUrl":"https://doi.org/10.5121/IJDKP.2021.11102","url":null,"abstract":"Graph structure is a very powerful tool to model systems and represent their actual shape. For instance, modelling an infrastructure or social network naturally leads to a graph. Yet, graphs can be very different from one another as they do not share the same properties (size, connectivity, communities, etc.) and building a system able to manage graphs should take this diversity into account. A big challenge concerning graph management is to design a system providing scalable persistent storage and allowing efficient browsing. Mainly to study social graphs, the most recent developments in graph partitioning research often consider scale-free graphs. As we are interested in modelling connected objects and their context, we focus on partitioning geometric graphs. Consequently, our strategy differs: we consider geometry as our main partitioning tool. In fact, we rely on Inverse Space-filling Partitioning (ISP), a technique which relies on a space filling curve to partition a graph and was previously applied to graphs essentially generated from meshes. Furthermore, we extend Inverse Space-Filling Partitioning toward a new target we define as Wide Area Graphs. We provide an extended comparison with two state-of-the-art streaming graph partitioning strategies, namely LDG and FENNEL. We also propose customized metrics to better understand and identify the use cases for which the ISP partitioning solution is best suited. Experiments show that in favourable contexts, edge-cuts can be drastically reduced, going from more than 34% using FENNEL to less than 1% using ISP.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133768720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Comprehensive Analysis of Quantum Clustering : Finding All the Potential Minima
International Journal of Data Mining & Knowledge Management Process Pub Date : 2021-01-31 DOI: 10.5121/IJDKP.2021.11103
A. Maignan, Tony C. Scott
{"title":"A Comprehensive Analysis of Quantum Clustering : Finding All the Potential Minima","authors":"A. Maignan, Tony C. Scott","doi":"10.5121/IJDKP.2021.11103","DOIUrl":"https://doi.org/10.5121/IJDKP.2021.11103","url":null,"abstract":"Quantum clustering (QC) is a data clustering algorithm based on quantum mechanics which is accomplished by substituting each point in a given dataset with a Gaussian. The width of the Gaussian is a σ value, a hyper-parameter which can be manually defined and manipulated to suit the application. Numerical methods are used to find all the minima of the quantum potential as they correspond to cluster centers. Herein, we investigate the mathematical task of expressing and finding all the roots of the exponential polynomial corresponding to the minima of a two-dimensional quantum potential. This is an outstanding task because normally such expressions are impossible to solve analytically. However, we prove that if the points are all included in a square region of size σ, there is only one minimum. This bound is not only useful for determining the number of solutions to look for by numerical means; it also allows us to propose a new numerical approach “per block”. This technique decreases the number of particles by approximating some groups of particles to weighted particles. These findings are not only useful to the quantum clustering problem but also for the exponential polynomials encountered in quantum chemistry, solid-state physics and other applications.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114731993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Apply Machine Learning Methods to Predict Failure of Glaucoma Drainage
International Journal of Data Mining & Knowledge Management Process Pub Date : 2021-01-31 DOI: 10.5121/IJDKP.2021.11101
Paul Morrison, Maxwell Dixon, A. Sheybani, B. Rahmani
{"title":"Apply Machine Learning Methods to Predict Failure of Glaucoma Drainage","authors":"Paul Morrison, Maxwell Dixon, A. Sheybani, B. Rahmani","doi":"10.5121/IJDKP.2021.11101","DOIUrl":"https://doi.org/10.5121/IJDKP.2021.11101","url":null,"abstract":"The purpose of this retrospective study is to measure machine learning models' ability to predict glaucoma drainage device failure based on demographic information and preoperative measurements. The medical records of 165 patients were used. Potential predictors included the patients' race, age, sex, preoperative intraocular pressure (IOP), preoperative visual acuity, number of IOP-lowering medications, and number and type of previous ophthalmic surgeries. Failure was defined as final IOP greater than 18 mm Hg, reduction in intraocular pressure less than 20% from baseline, or need for reoperation unrelated to normal implant maintenance. Five classifiers were compared: logistic regression, artificial neural network, random forest, decision tree, and support vector machine. Recursive feature elimination was used to shrink the number of predictors and grid search was used to choose hyperparameters. To prevent leakage, nested cross-validation was used throughout. With a small amount of data, the best classifier was logistic regression, but with more data, the best classifier was the random forest.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131880058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Application of Spatiotemporal Association Rules on Solar Data to Support Space Weather Forecasting
International Journal of Data Mining & Knowledge Management Process Pub Date : 2020-03-31 DOI: 10.5121/ijdkp.2020.10201
Carlos Roberto Silveira Junior, J. Cecatto, M. T. P. Santos, M. X. Ribeiro
{"title":"Application of Spatiotemporal Association Rules on Solar Data to Support Space Weather Forecasting","authors":"Carlos Roberto Silveira Junior, J. Cecatto, M. T. P. Santos, M. X. Ribeiro","doi":"10.5121/ijdkp.2020.10201","DOIUrl":"https://doi.org/10.5121/ijdkp.2020.10201","url":null,"abstract":"It is well known that solar energetic phenomena influence the Space Weather, especially those directed toward the Earth environment. In this context, the analysis of Solar Data is a challenging task, particularly when the data are composed of Satellite Image Time Series (SITS). It is a multidisciplinary domain that generates a massive amount of data (several Gigabytes per year). It includes image processing, spatiotemporal characteristics, and the processing of semantic data. Aiming to enhance the SITS analysis, we propose an algorithm called \"Miner of Thematic Spatiotemporal Associations for Images\" (MiTSAI), which is an extractor of Thematic Spatiotemporal Association Rules (TSARs) from Solar SITS. Here, the details of the MiTSAI algorithm are described. In addition, its adaptation to Space Weather and a discussion of its specific use in support of forecasting activities are presented. Finally, some results of its application specifically to solar flare forecasting are also presented. MiTSAI is able to extract interesting new patterns compared with state-of-the-art algorithms.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121034847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Web Repository System for Data Mining in Drug Discovery
International Journal of Data Mining & Knowledge Management Process Pub Date : 2020-01-30 DOI: 10.5121/ijdkp.2020.10101
Jiali Tang, Jack Wang, A. Hadaegh
{"title":"A Web Repository System for Data Mining in Drug Discovery","authors":"Jiali Tang, Jack Wang, A. Hadaegh","doi":"10.5121/ijdkp.2020.10101","DOIUrl":"https://doi.org/10.5121/ijdkp.2020.10101","url":null,"abstract":"This project aims to produce a repository database system of drugs, drug features (properties), and drug targets where data can be mined and analyzed. Drug targets are different proteins that drugs try to bind to in order to stop the activities of the protein. Users can utilize the database to mine useful data to predict the specific chemical properties that will have the relative efficacy of a specific target and the coefficient for each chemical property. This database system can be equipped with different data mining approaches/algorithms such as linear, non-linear, and classification types of data modelling. The data models have been enhanced with the Genetic Evolution (GE) algorithms. This paper discusses implementation with the linear data models such as Multiple Linear Regression (MLR), Partial Least Square Regression (PLSR), and Support Vector Machine (SVM).","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125490684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Insolvency Prediction Analysis of Italian Small Firms by Deep Learning
International Journal of Data Mining & Knowledge Management Process Pub Date : 2019-11-30 DOI: 10.5121/ijdkp.2019.9601
A. D. Ciaccio, G. Cialone
{"title":"Insolvency Prediction Analysis of Italian Small Firms by Deep Learning","authors":"A. D. Ciaccio, G. Cialone","doi":"10.5121/ijdkp.2019.9601","DOIUrl":"https://doi.org/10.5121/ijdkp.2019.9601","url":null,"abstract":"To improve credit risk management, there is a lot of interest in bankruptcy predictive models. Academic research has mainly used traditional statistical techniques, but interest in the capability of machine learning methods is growing. This Italian case study pursues the goal of developing an insolvency prediction model for commercial firms. In compliance with the Basel II Accords, the major objective of the model is an estimation of the probability of default over a given time horizon, typically one year. The collected dataset consists of absolute values as well as financial ratios collected from the balance sheets of 14,966 Italian micro-small firms, 13,846 ongoing and 1,120 bankrupted, with 82 observed variables. The volume of data processed places the research on a scale like that used by Moody’s in the development of its rating model for public and private companies, RiskCalc™. The study has been conducted using Gradient Boosting, Random Forests, Logistic Regression and some deep learning techniques: Convolutional Neural Networks and Recurrent Neural Networks. The results were compared with respect to the predictive performance on a test set, considering accuracy, sensitivity and AUC. The results obtained show that the choice of the variables was very effective, since all the models show good performances, better than those obtained in previous works. Gradient Boosting was the preferred model, although an increase in observation times would probably favour Recurrent Neural Networks.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"41 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129880044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0