2016 International Conference on Data Science and Engineering (ICDSE): Latest Publications

Understanding the Indian labour market: A data-centric approach
2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823939
K. Shabana, Tony Gracious, H. Subramonian
Abstract: India produces 1.5 million engineers every year. Identifying the significant factors that influence the salary and the jobs these engineers are offered can help us understand the inefficiencies or skill gaps in the labour market, which will be extremely useful for policy making and constructive interventions. Predictive modelling of salary was performed using different machine learning techniques on a data set that included both employee profiles and their employment outcomes. Decision tree analysis, feature analysis, correlation analysis and a t-test were performed to identify the significant factors that influenced the annual salary offered to a candidate. Visualizations generated based on employee salary, designation and job city revealed interesting insights.
Citations: 1
Breast cancer detection using two-fold genetic evolution of neural network ensembles
2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823969
I. Singh, Karan Sanwal, Satyarth Praveen
Abstract: Breast cancer is the development of a malignant tumor notably in the breasts of a female. No proven cure is yet known for breast cancer, except when detected at an initial stage. This paper presents an innovative approach to the diagnosis of breast cancer by using two proposed variants of Genetic Algorithms, the Inter-Genetic Algorithm and the Intra-Genetic Algorithm, that evolve an ensemble of Neural Networks and its constituent Artificial Neural Networks, respectively. The proposed approach obtains an average accuracy of 99.90% using a 70–30% training-to-testing ratio on the Wisconsin Breast Cancer dataset and hence is a reliable alternative for providing a second opinion to human experts for the classification of breast cancer tumors.
Citations: 10
Data preparation: Art or science?
2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823936
Gunjan Mansingh, Kweku-Muata A. Osei-Bryson, L. Rao, Maurice McNaughton
Abstract: Data preparation is often cited as the most time-consuming phase of a Knowledge Discovery and Data Mining (KDDM) process. This is attributed to the fact that this phase is highly dependent on the expertise of the analyst. Although process models exist for KDDM, the descriptions of their phases focus on outlining what must be done but often do not detail how it should be done. While there is some research addressing the how of the phases, the data preparation phase is thought to be the most challenging and is often described as an art rather than a science. The tasks defined in this phase are thought to be highly dependent on the expertise of the analyst and the context. While we are of the view that there will always be an art to data preparation, we will demonstrate that the science can actually enhance the art. We further contend that as more research of this kind is published, demonstrating a variety of data preparation techniques that enhance the data mining process, the science of data preparation will become more effective.
Citations: 3
Erasure coded storage systems for cloud storage — challenges and opportunities
2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823943
Ojus Thomas Lee, S. D. M. Kumar, P. Chandran
Abstract: Erasure coded storage schemes offer a promising future for cloud storage. A highlight of erasure coded storage systems is that they offer the same level of fault tolerance as replication, at lower storage footprints. In the big data era, cloud storage systems based on data replication are of dubious usability due to the 200% storage overhead of data replication systems. This has prompted storage service providers to use erasure coded storage as an alternative to replication. Refinements are required in various aspects of erasure coded storage systems to make them real contenders against data replication based storage systems. The huge bandwidth requirements during the recovery of failed nodes, inefficient update operations, the effect of topology on recovery, and the consistency requirements of erasure coded storage systems are some of the areas that need attention. This paper presents an in-depth study of the challenges faced, and the research pursued, in some of these areas. The survey shows that more research is required to turn erasure coded storage systems from bandwidth crunchers into efficient storage systems. Another challenge that has emerged from the study is the need for elaborate research on better update methods, so that erasure coded storage systems can be upgraded from mere archival storage systems. Provision of multiple-level consistency in erasure coded storage is yet another research opportunity identified in this work. A brief introduction to the open source libraries available for erasure coded storage is also presented in the paper.
Citations: 13
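The 200% overhead quoted in the abstract is consistent with three-way replication, where two extra copies are stored for every original. As a rough illustration that is not taken from the paper, the sketch below compares that figure with a (k, m) erasure code, in which k data fragments are extended by m parity fragments. The function names and the (10, 4) parameters are assumptions chosen for the example.

```python
def replication_overhead(copies: int) -> float:
    """Extra storage, as a fraction of the original data, for n-way replication."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return float(copies - 1)


def erasure_overhead(k: int, m: int) -> float:
    """Extra storage for a (k, m) code: m parity fragments per k data fragments."""
    if k < 1 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    return m / k


if __name__ == "__main__":
    # Three copies: 2 extra copies per original, i.e. 200% overhead.
    print(f"3-way replication: {replication_overhead(3):.0%}")
    # Hypothetical (10, 4) code: 4 parity fragments per 10 data fragments.
    print(f"(10, 4) erasure code: {erasure_overhead(10, 4):.0%}")
```

Under these assumptions, three-way replication survives the loss of two copies, while a maximum distance separable (10, 4) code such as Reed-Solomon survives the loss of any four fragments at a fraction of the storage cost. The price is the recovery bandwidth discussed in the abstract, since rebuilding one fragment requires reading k others.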
Proper imputation techniques for missing values in data sets
2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823957
Tahani Aljuaid, S. Sasi
Abstract: Data mining requires a pre-processing task in which the data are prepared and cleaned to ensure quality. A missing value occurs when no data value is stored for a variable in an observation. This has a significant effect on the results, especially when it leads to biased parameter estimates. It not only diminishes the quality of the result but can also disqualify the data for analysis purposes. Hence there are risks associated with missing values in a dataset. Imputation is a technique of replacing missing data with substituted values. This research presents a comparison of imputation techniques such as MeanMode, K-Nearest Neighbor, Hot-Deck, Expectation Maximization and C5.0 for missing data. The choice of a proper imputation method is based on datatypes, missing data mechanisms, patterns and methods. The datatype can be numerical, categorical or mixed. The missing data mechanism can be missing completely at random, missing at random, or not missing at random. Patterns of missing data can be with respect to cases or attributes. Methods can be a pre-replace or an embedded method. These five imputation techniques are used to impute artificially created missing data from different data sets of varying sizes. The performance of these techniques is compared based on the classification accuracy and the results are presented.
Citations: 56
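Of the techniques named in the abstract, mean/mode imputation is the simplest to show in a few lines. The sketch below is a generic illustration, not the authors' implementation: it assumes rows are lists, missing entries are marked with `None`, numeric columns are filled with the column mean and all other columns with the most frequent value. The function name `impute_mean_mode` is hypothetical.

```python
from statistics import mean, mode


def impute_mean_mode(rows, missing=None):
    """Fill missing entries column by column: mean for numeric columns, mode otherwise."""
    fills = []
    for column in zip(*rows):
        observed = [v for v in column if v is not missing]
        if not observed:
            # Nothing to learn from: leave the column's gaps as they are.
            fills.append(missing)
        elif all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed):
            fills.append(mean(observed))
        else:
            fills.append(mode(observed))
    return [
        [fills[j] if value is missing else value for j, value in enumerate(row)]
        for row in rows
    ]


if __name__ == "__main__":
    data = [[1.0, "a"], [None, "b"], [3.0, "a"], [5.0, None]]
    for row in impute_mean_mode(data):
        print(row)
```

This kind of pre-replace method ignores relationships between attributes, which is one reason comparisons such as the one in the abstract also include neighbour-based and model-based techniques.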
Anomaly detection in web graphs using vertex neighbourhood based signature similarity methods
2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823959
Aritra Ghosh, Pallavi Gudipati
Abstract: With the massive increase in the amount of data being generated each day, we need automated tools to oversee the evolution of the web and to quantify global effects like the pagerank of webpages. Search engines crawl the web every now and then to build web graphs which store information about the structure of the web. This is an expensive and error-prone process. Central to this problem is the notion of graph similarity (between two graphs spaced in time), which validates how well search engines secure content from the web and the quality of the search results they produce. In this paper, we propose two different types of anomalies which occur during crawling and two novel similarity measures based on vertex neighbourhood, which overcome the proposed anomalies. Extensive experimentation on real-world datasets shows significant improvement over state-of-the-art signature similarity based methods.
Citations: 2
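The abstract does not define its two similarity measures, so the following is only a generic illustration of what a vertex-neighbourhood comparison between two crawl snapshots can look like. It assumes each graph is a dictionary mapping a vertex to its set of out-neighbours and averages the Jaccard overlap of those sets. The function name and the scoring rule are assumptions, not the paper's method.

```python
def neighbourhood_similarity(g1, g2):
    """Mean Jaccard overlap of out-neighbour sets over all vertices seen in either graph."""
    vertices = set(g1) | set(g2)
    if not vertices:
        return 1.0  # two empty graphs are treated as identical
    total = 0.0
    for v in vertices:
        n1 = set(g1.get(v, ()))
        n2 = set(g2.get(v, ()))
        union = n1 | n2
        # A vertex with no out-links in either snapshot counts as unchanged.
        total += len(n1 & n2) / len(union) if union else 1.0
    return total / len(vertices)


if __name__ == "__main__":
    earlier = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
    later = {"a": {"b"}, "b": {"c"}, "c": set()}
    print(neighbourhood_similarity(earlier, later))
```

A score well below 1.0 between two crawls taken close together in time would be a hint that one of them is incomplete or corrupted, which is the kind of crawl anomaly the abstract is concerned with.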
Inferring borrower network in a microfinancing framework (KIVA)
2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823960
Aritra Ghosh, Jithin Vachery
Abstract: Microfinance institutions aim at offering financial services to people in the low-income category, who typically lack access to traditional banking systems. To date, more than 15 billion U.S. dollars have been infused into microfinancing, assisting more than 160 million people in developing countries. With the tremendous growth of the World Wide Web, a number of microfinance institutions have recently moved online. One such noble initiative is KIVA, a crowd-sourced online microfinance platform which connects borrowers (small entrepreneurs and individuals) to lenders through field partners. Of particular interest to such microfinancing institutions is the analysis of the network of borrowers, which can help them improve the percentage of loan requests fulfilled. KIVA provides a rich dataset capturing the lending activities on the website. In this paper, we analyze the data to find and extract the structure in the KIVA framework. We formulate a novel tripartite extension of SimRank using the network of lenders, loans and borrowers to capture the inherent pattern in the system. We also propose a multipartite extension of SimRank useful for real-world settings. Extensive experiments validate the effectiveness of our modeling and the proposed disambiguation scheme for borrowers.
Citations: 1
SADA: Secure approximate data aggregation in wireless sensor networks
2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823942
E. G. Prathima, T. Prakash, K. Venugopal, S. Iyengar, L. Patnaik
Abstract: Wireless Sensor Networks are susceptible to communication failures and security attacks due to the broadcast nature of communication. Multipath-based communication techniques were designed to address communication failure. This paper proposes Secure Approximate Data Aggregation (SADA), in which synopses are generated using a primitive polynomial and Message Authentication Codes (MACs) are transmitted along with the synopses to ensure integrity. SADA provides data freshness and integrity at a communication cost of O(1). Simulation results show that the SADA protocol incurs lower energy consumption and lower communication and computation costs compared to the state-of-the-art protocols [1] [2].
Citations: 4
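The abstract states that a MAC travels with each synopsis to protect integrity and that the protocol provides freshness, but it does not give the construction. The sketch below is therefore a generic illustration using HMAC-SHA256 from the Python standard library, not the SADA protocol itself. The shared key, the per-round nonce used for freshness, and the function names are all assumptions.

```python
import hashlib
import hmac


def tag_synopsis(key: bytes, nonce: bytes, synopsis: bytes) -> bytes:
    """Compute a MAC over a length-prefixed nonce followed by the synopsis."""
    # The length prefix keeps (nonce, synopsis) pairs from colliding after concatenation.
    message = len(nonce).to_bytes(2, "big") + nonce + synopsis
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_synopsis(key: bytes, nonce: bytes, synopsis: bytes, tag: bytes) -> bool:
    """Check the MAC in constant time; a stale nonce or altered synopsis fails."""
    return hmac.compare_digest(tag_synopsis(key, nonce, synopsis), tag)


if __name__ == "__main__":
    key = b"shared-secret-key"      # hypothetical key shared by node and base station
    nonce = b"round-0042"           # hypothetical per-round value announced by the base station
    synopsis = bytes([0b10110000])  # placeholder bit-vector synopsis
    tag = tag_synopsis(key, nonce, synopsis)
    print(verify_synopsis(key, nonce, synopsis, tag))
```

Because the tag has a fixed length regardless of how many readings were aggregated into the synopsis, a scheme of this shape adds a constant number of bytes per message.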
Comparative study of data mining clustering algorithms
2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823946
Iyer Aurobind Venkatkumar, Sanatkumar Jayantibhai Kondhol Shardaben
Abstract: In today's world, where we generate large amounts of data, we can harness the benefits of the hidden information, i.e. patterns or correlations, in these data. This information can be used in various constructive fields only if we are able to handle big data efficiently. One process that is used to extract and handle the hidden information is data mining. There are various techniques in data mining, namely clustering, prediction, classification, association, etc. Clustering divides a data set into groups of related items such that the groups have nothing in common. In prediction, as the name suggests, predictions are made from the available data set; it gives no guarantee of any kind and may predict rightly or wrongly. Classification assigns data to predefined classes using various mathematical models. Association discovers correlations hidden in large amounts of data; that is, within given transactions, a pattern is discovered based on the relationships between the items. In this paper we study one of the most widely used methods to handle big data, that is, data mining clustering algorithms. Here we have studied and made a comparative analysis of four classic clustering algorithms, namely K-means, BIRCH, DBSCAN and STING.
Citations: 24
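K-means is the first of the four algorithms the abstract compares. As a minimal sketch of the standard Lloyd iteration, and not code from the paper, the example below alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster. Points are assumed to be tuples of floats; the function names, the seeded random initialisation and the iteration cap are choices made for this illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: returns (centroids, clusters) after convergence or the cap."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = [
            # An empty cluster keeps its previous centroid.
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centres, groups = kmeans(data, k=2)
    print(sorted(centres))
```

K-means needs the number of clusters up front and favours compact, roughly spherical groups, which is the usual point of contrast with density-based methods such as DBSCAN.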
Student profiling to improve teaching and learning: A data mining approach
2016 International Conference on Data Science and Engineering (ICDSE) Pub Date : 2016-08-01 DOI: 10.1109/ICDSE.2016.7823947
Anand Desai, Nemil Shah, Madhuri Dhodi
Abstract: Data mining is a technology used in different disciplines to search for significant relationships among variables in large data sets. In this paper, we concentrate on the application of data mining in an educational environment. This study can be used to help teachers classify students' academic success, along with their determination as measured by a grit test, and thus modify their teaching for different groups of students. According to this classification, one can arrange remedial classes or extra tests for the students who need them. Students can also monitor their growth from semester to semester with the help of the resulting application.
Citations: 3