{"title":"Data Privacy for Big Data Publishing Using Newly Enhanced PASS Data Mining Mechanism","authors":"Priyank Jain, Manasi Gyanchandani, N. Khare","doi":"10.5772/INTECHOPEN.77033","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.77033","url":null,"abstract":"Anonymization is one of the main techniques used in recent times to prevent privacy breaches in published data; one such technique is k-anonymization. This is a parametric technique used for data anonymization. The aim of k-anonymization is to generalize tuples so that they cannot be identified using quasi-identifiers. In the past few years, we have seen tremendous growth in data, which ultimately led to the concept of big data. This growth made anonymization using conventional processing methods inefficient. To make anonymization more efficient, we used the proposed PASS mechanism in the Hadoop framework to reduce the processing time of anonymization. In this work, we divided the whole program into map and reduce parts. Moreover, the data types used in Hadoop provide better serialization and transport of data. We performed our experiments on a large dataset. The results demonstrated the efficiency of our implementation.","PeriodicalId":91437,"journal":{"name":"Advances in data mining. Industrial Conference on Data Mining","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78117133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early Prediction of Patient Mortality Based on Routine Laboratory Tests and Predictive Models in Critically Ill Patients","authors":"Sven Van Poucke, Ana Kovačević, M. Vukicevic","doi":"10.5772/INTECHOPEN.76988","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.76988","url":null,"abstract":"We propose a method for quantitative analysis of the predictive power of laboratory tests and early detection of mortality risk using predictive models and feature selection techniques. Our method allows automatic feature selection, model selection, and evaluation of predictive models. Experimental evaluation was conducted on patients with renal failure admitted to ICUs (medical intensive care, surgical intensive care, cardiac, and cardiac surgery recovery units) at Boston’s Beth Israel Deaconess Medical Center. Data were extracted from the Multiparameter Intelligent Monitoring in Intensive Care III (MIMIC-III) database. We built and evaluated different single (e.g., logistic regression) and ensemble (e.g., random forest) learning methods. Results revealed high predictive accuracy (area under the precision-recall curve (AUPRC) values >86%) from day four, with acceptable results on the second (>81%) and third (>85%) days. Random forests seem to provide the best predictive accuracy. The feature selection techniques Gini and ReliefF scored best in most cases. Lactate, white blood cells, sodium, anion gap, chloride, bicarbonate, creatinine, urea nitrogen, potassium, glucose, INR, hemoglobin, phosphate, total bilirubin, and base excess were most predictive of hospital mortality. Ensemble learning methods are able to predict hospital mortality with high accuracy based on laboratory tests and provide a ranking of predictive priority.","PeriodicalId":91437,"journal":{"name":"Advances in data mining. Industrial Conference on Data Mining","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82057013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Neural Network Classifier-Based Analysis of Big Data in Health Care","authors":"Manaswini Pradhan","doi":"10.5772/INTECHOPEN.77225","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.77225","url":null,"abstract":"Because of the massive volume, variety, and continuous updating of medical data, efficient processing of medical data and real-time response of treatment recommendation have become important issues. Fortunately, parallel computing and cloud computing provide powerful capabilities to cope with large-scale data. Therefore, in this paper, an FCM-based Map-Reduce programming model is proposed for parallel computing using the AANN approach. The FCM-based Map-Reduce clusters the large medical datasets into smaller groups of certain similarity and assigns each data cluster to one Mapper, where the neural networks are trained by optimally selecting the interconnection weights with the Whale Optimization Algorithm (WOA). Finally, the Reducer reduces all the AANN classifiers obtained from the Mappers to identify the normal and abnormal classes of newer medical records promptly and accurately. The proposed methodology is implemented in Java using the CloudSim simulator. The proposed FCM-based Map-Reduce model reduces memory requirements compared with the existing k-means-based Map-Reduce and DBSCAN methods.","PeriodicalId":91437,"journal":{"name":"Advances in data mining. Industrial Conference on Data Mining","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82325407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance-Aware High-Performance Computing for Remote Sensing Big Data Analytics","authors":"Mustafa Kemal Pektürk and Muhammet Ünal","doi":"10.5772/INTECHOPEN.75934","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.75934","url":null,"abstract":"The incredible increase in the volume of data emerging along with recent technological developments has made analysis processes that use traditional approaches more difficult for many organizations. In particular, applications that require timely processing of big data, such as satellite imagery, sensor data, bank operations, web servers, and social networks, need efficient mechanisms for collecting, storing, processing, and analyzing these data. At this point, big data analytics, which encompasses data mining, machine learning, statistics, and similar techniques, helps organizations manage their data end to end. In this chapter, we introduce a novel high-performance computing system on a geo-distributed private cloud for remote sensing applications, which takes advantage of network topology, exploits the utilization and workloads of CPU, storage, and memory resources in a distributed fashion, and optimizes resource allocation to realize big data analytics efficiently.","PeriodicalId":91437,"journal":{"name":"Advances in data mining. Industrial Conference on Data Mining","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83036356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining HCI Data for Theory of Mind Induction","authors":"O. Arnold, K. Jantke","doi":"10.5772/INTECHOPEN.74400","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.74400","url":null,"abstract":"Human-computer interaction (HCI) produces enormous amounts of data with the potential for understanding a human user’s intentions, goals, and desires. Knowing what users want and need is key to intelligent system assistance. The theory of mind concept known from studies in animal behavior is adopted and adapted for expressive user modeling. Theories of mind are hypothetical user models representing, to some extent, a human user’s thoughts. A theory of mind may even reveal tacit knowledge. In this way, user modeling becomes knowledge discovery going beyond the human’s knowledge and covering domain-specific insights. Theories of mind are induced by mining HCI data. Data mining turns out to be inductive modeling. Intelligent assistant systems that inductively model a human user’s intentions, goals, and the like, as well as domain knowledge, are by nature learning systems. To cope with the risk of getting it wrong, learning systems are equipped with the skill of reflection.","PeriodicalId":91437,"journal":{"name":"Advances in data mining. Industrial Conference on Data Mining","volume":"84 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77654049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification of Research Thematic Approaches Based on Keywords Network Analysis in Colombian Social Sciences","authors":"José Hernando Ávila-Toscano, I. Romero-Pérez, Ailed Marenco-Escuderos, Eugenio Saavedra Guajardo","doi":"10.5772/INTECHOPEN.76834","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.76834","url":null,"abstract":"The purpose of this research was to unveil the structure of knowledge in the Social Sciences in Colombia through the analysis of thematic networks and their association with the production of new knowledge in different disciplines, in order to define scenarios and trends in each. A total of 2992 articles published in the period 2006–2015, all indexed in Web of Science, Scopus, and other bibliographic databases, were reviewed by applying the social network analysis technique to their keywords. The analysis included each discipline’s clustering coefficient and group metrics. The results described in this chapter identify how social disciplines in Colombia have mainly focused their research production on topics such as armed conflict, poverty, and human development.","PeriodicalId":91437,"journal":{"name":"Advances in data mining. Industrial Conference on Data Mining","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79182563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Infrastructure for Service Environment Supporting Successful Aging","authors":"V. Salminen, Päivi Sanerma, S. Niittymäki, Peter W. Eklund","doi":"10.5772/INTECHOPEN.76945","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.76945","url":null,"abstract":"Demographic changes and the rapid growth of the aging population are occurring throughout the world. There is a need to develop, step by step, a service environment that supports elderly people in living at home for as long as possible. Digital equipment and technology solutions installed at home produce real-time data that can be used for predictive and optimized service creation. New technology solutions have to be tested in home environments to ensure their usability, flexibility, and accessibility. The implementation of new digitalization has to follow ethical rules that take into account the values of elderly people. The data gathered through digital equipment is used in optimizing service processes. However, service processes lack a common ontology and semantic infrastructure for using the gathered data for service optimization. The service environment and semantic infrastructure, which could be used in social and health care, are introduced in this article.","PeriodicalId":91437,"journal":{"name":"Advances in data mining. Industrial Conference on Data Mining","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80096246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble Methods in Environmental Data Mining","authors":"Goksu Tuysuzoglu, Derya Birant, A. Pala","doi":"10.5772/INTECHOPEN.74393","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.74393","url":null,"abstract":"Environmental data mining is the nontrivial process of identifying valid, novel, and potentially useful patterns in data from environmental sciences. This chapter proposes ensemble methods in environmental data mining that combine the outputs of multiple classification models to obtain better results than could be obtained by an individual model. The study presented in this chapter focuses on several ensemble strategies in addition to the standard single classifiers popularly used in the literature, such as decision tree, naive Bayes, support vector machine, and k-nearest neighbor (KNN). This is the first study that compares four ensemble strategies for environmental data mining: (i) bagging, (ii) bagging combined with random feature subset selection (the random forest algorithm), (iii) boosting (the AdaBoost algorithm), and (iv) voting of different algorithms. In the experimental studies, ensemble methods are tested on different real-world environmental datasets covering various subjects such as air, ecology, rainfall, and soil. ... methods are majority voting, performance weighting, Bayesian combination, and vogging. Meta-learning methods learn from new training data created from the predictions of a set of base classifiers. The most well-known meta-learning methods are stacking ...","PeriodicalId":91437,"journal":{"name":"Advances in data mining. Industrial Conference on Data Mining","volume":"42 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88483167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Decision Rule Based Approach to Generational Feature Selection","authors":"Wieslaw Paja","doi":"10.1007/978-3-319-95786-9_17","DOIUrl":"https://doi.org/10.1007/978-3-319-95786-9_17","url":null,"abstract":"","PeriodicalId":91437,"journal":{"name":"Advances in data mining. Industrial Conference on Data Mining","volume":"13 1","pages":"230-239"},"PeriodicalIF":0.0,"publicationDate":"2018-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84319817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speeding Up Continuous kNN Join by Binary Sketches","authors":"Filip Nálepa, Michal Batko, P. Zezula","doi":"10.1007/978-3-319-95786-9_14","DOIUrl":"https://doi.org/10.1007/978-3-319-95786-9_14","url":null,"abstract":"","PeriodicalId":91437,"journal":{"name":"Advances in data mining. Industrial Conference on Data Mining","volume":"2 1","pages":"183-198"},"PeriodicalIF":0.0,"publicationDate":"2018-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82387124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}