2016 International Conference on Data Science and Engineering (ICDSE): latest papers

Understanding the Indian labour market: A data-centric approach

2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823939

K. Shabana, Tony Gracious, H. Subramonian

Citations: 1

Breast cancer detection using two-fold genetic evolution of neural network ensembles

2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823969

I. Singh, Karan Sanwal, Satyarth Praveen

Citations: 10

Data preparation: Art or science?

2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823936

Gunjan Mansingh, Kweku-Muata A. Osei-Bryson, L. Rao, Maurice McNaughton

Citations: 3

Erasure coded storage systems for cloud storage — challenges and opportunities

2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823943

Ojus Thomas Lee, S. D. M. Kumar, P. Chandran

Abstract: Erasure coded storage schemes offer a promising future for cloud storage. Highlights of erasure coded storage systems are that these offer the same level of fault tolerance as that of replication, at lower storage footprints. In the big data era, cloud storage systems based on data replication are of dubious usability due to 200% storage overhead in data replication systems. This has prompted storage service providers to use erasure coded storage as an alternative to replication. Refinements are required in various aspects of erasure coded storage systems to make it a real contender against data replication based storage systems. Streamlining huge bandwidth requirements during the recovery of failed nodes, inefficient update operations, effect of topology in recovery and consistency requirements of erasure coded storage systems, are some areas which need attention. This paper presents an in-depth study on the challenges faced, and research pursued in some of these areas. The survey shows that more research is required to improve erasure coded storage system from being bandwidth crunchers to efficient storage systems. Another challenge that has emerged from the study is the requirement of elaborate research for upgrading the erasure coded storage systems from being mere archival storage systems by providing better update methods. Provision of multiple level consistency in erasure coded storage is yet another research opportunity identified in this work. A brief introduction to open source libraries available for erasure coded storage is also presented in the paper.
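
The 200% overhead quoted in the abstract is consistent with three-way replication, although the abstract does not state the replication factor, so that is an assumption here. The sketch below works through the arithmetic and compares it with a maximum-distance-separable erasure code that splits data into k fragments and adds m parity fragments. The function names and the (6, 3) parameters are illustrative and are not taken from the paper.

```python
def replication_overhead(copies: int) -> float:
    """Extra storage, as a fraction of the original data, for n-way replication."""
    # One copy is the data itself; every further copy is overhead.
    return float(copies - 1)


def erasure_overhead(k: int, m: int) -> float:
    """Extra storage for a (k, m) code: k data fragments plus m parity fragments."""
    return m / k


def replication_tolerated_losses(copies: int) -> int:
    """Copies that can be lost while at least one copy survives."""
    return copies - 1


def erasure_tolerated_losses(m: int) -> int:
    """An MDS code (for example Reed-Solomon) survives the loss of any m fragments."""
    return m


if __name__ == "__main__":
    # Assumed setup: three-way replication versus an illustrative (6, 3) code.
    print(f"3-way replication: {replication_overhead(3):.0%} overhead, "
          f"tolerates {replication_tolerated_losses(3)} losses")
    print(f"(6, 3) erasure code: {erasure_overhead(6, 3):.0%} overhead, "
          f"tolerates {erasure_tolerated_losses(3)} losses")
```

Under these assumptions the erasure code tolerates at least as many losses at a quarter of the extra storage, which is the trade-off the abstract describes. The recovery bandwidth and update costs it lists as open problems are not modelled here.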

Citations: 13

Proper imputation techniques for missing values in data sets

2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823957

Tahani Aljuaid, S. Sasi

Abstract: Data mining requires a pre-processing task in which the data are prepared and cleaned for ensuring the quality. Missing value occurs when no data value is stored for a variable in an observation. This has a significant effect on the results especially when it leads to biased parameter estimates. It will not only diminish the quality of the result, but also disqualify for analysis purposes. Hence there are risks associated with missing values in a dataset. Imputation is a technique of replacing missing data with substituted values. This research presents a comparison of imputation techniques such as MeanMode, K-Nearest Neighbor, Hot-Deck, Expectation Maximization and C5.0 for missing data. The choice of proper imputation method is based on datatypes, missing data mechanisms, patterns and methods. Datatype can be numerical, categorical or mixed. Missing data mechanism can be missing completely at random, missing at random, or not missing at random. Patterns of missing data can be with respect to cases or attributes. Methods can be a pre-replace or an embedded method. These five imputation techniques are used to impute artificially created missing data from different data sets of varying sizes. The performance of these techniques are compared based on the classification accuracy and the results are presented.
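
As a rough illustration of the simplest technique the abstract names, the sketch below fills missing entries with the column mean for numeric data and the column mode for categorical data. It is a generic mean/mode imputation and not the authors' implementation; the function name and the use of `None` to mark a missing value are assumptions made for the example.

```python
from statistics import mean, mode


def impute_mean_mode(column):
    """Replace None entries with the mean (numeric) or the mode (categorical)."""
    observed = [v for v in column if v is not None]
    if not observed:
        # Nothing to estimate from, so return the column unchanged.
        return list(column)
    # Treat the column as numeric only if every observed value is an int or float.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
    )
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in column]


if __name__ == "__main__":
    print(impute_mean_mode([1.0, None, 3.0]))
    print(impute_mean_mode(["a", None, "b", "a"]))
```

A single fill value per column is cheap to compute but shrinks the column's variance, which is one reason a comparison against neighbour-based and model-based methods such as those listed in the abstract is informative.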

Citations: 56

Anomaly detection in web graphs using vertex neighbourhood based signature similarity methods

2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823959

Aritra Ghosh, Pallavi Gudipati

Citations: 2

Inferring borrower network in a microfinancing framework (KIVA)

2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823960

Aritra Ghosh, Jithin Vachery

Abstract: Microfinance institutions aim at offering financial services to people in low-income category, who typically lack access to traditional banking systems. Till date, greater than 15 billion U.S dollars has been infused into microfinancing, assisting more than 160 million people in developing countries. With the tremendous growth in the World Wide Web, a number of microfinance institutions have recently moved online. One such noble initiative is KIVA, a crowd sourced online microfinance platform which connects borrowers (small entrepreneurs and individuals) to lenders through the field partners. One particular interest to such microfinancing institutions, is the analysis of the network of borrowers which can help them improve the percentage of loan requests fulfilled. KIVA provides a rich dataset capturing the lending activities on the website. In this paper, we analyze the data to find and extract the structure in the KIVA framework. We formulate a novel tripartite extension of SimRank using the network of lenders, loans and borrowers to capture the inherent pattern in the system. We also propose a Multipartite extension of SimRank useful for real world settings. Extensive experiments validate the effectiveness of our modeling and the proposed disambiguation scheme for borrowers.

Citations: 1

SADA: Secure approximate data aggregation in wireless sensor networks

2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823942

E. G. Prathima, T. Prakash, K. Venugopal, S. Iyengar, L. Patnaik

Citations: 4

Comparative study of data mining clustering algorithms

2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823946

Iyer Aurobind Venkatkumar, Sanatkumar Jayantibhai Kondhol Shardaben

Abstract: In today's world, where we generate large amount of data, we can harness the benefits of the hidden information i.e. patterns or correlations in these data. This information can be used in various constructive fields only if we are able to handle big data efficiently. One such process that is used to extract and handle the hidden information is data mining. There are various techniques in data mining namely Clustering, Prediction, Classification, Association etc. Clustering is dividing data set into related groups such that all the groups do not have anything in common. Prediction, as the name suggests, predictions are made with available data set. It does not give surety of any kind, it may predict right or may predict wrong. Classification is classification of data sets into some predefined sets using various mathematical models. Association is discovering a correlation hidden in large amount of data, that is, in a given transaction based on the relationships between the items a pattern is discovered. In this paper we study one of the most widely used methods to handle big data, that is, data mining clustering algorithms. Here we have studied and made a comparative analysis of four classic clustering algorithms namely K-means, BIRCH, DBSCAN, STING.

Citations: 24

Student profiling to improve teaching and learning: A data mining approach

2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823947

Anand Desai, Nemil Shah, Madhuri Dhodi

Citations: 3