Handbook of Big Data Analytics. Volume 1: Methodologies: Latest Publications

A review of fog and edge computing with big data analytics

Handbook of Big Data Analytics. Volume 1: Methodologies. Pub Date: 2021-07-07. DOI: 10.1049/pbpc037f_ch8

C. Rajyalakshmi, K. R. Rao, Rajeswara Rao Ramisetty

Citations: 0

The impact of Big Data on databases

Handbook of Big Data Analytics. Volume 1: Methodologies. Pub Date: 2021-07-07. DOI: 10.1049/pbpc037f_ch1

Antonio Sarasa Cabezuelo

Citations: 0

Role of artificial intelligence and big data in accelerating accessibility for persons with disabilities

Handbook of Big Data Analytics. Volume 1: Methodologies. Pub Date: 2021-07-07. DOI: 10.1049/pbpc037f_ch10

Kundumani Srinivasan Kuppusamy

Citations: 0

Back Matter

Handbook of Big Data Analytics. Volume 1: Methodologies. Pub Date: 2021-07-07. DOI: 10.1049/pbpc037f_bm

V. Ravi, A. Cherukuri

Citations: 0

Toward real-time data processing: an advanced approach in big data analytics

Handbook of Big Data Analytics. Volume 1: Methodologies. Pub Date: 2021-07-07. DOI: 10.1049/pbpc037f_ch5

Shafqat Ul Ahsaan, Harleen Kaur, Sameena Naaz

Citations: 0

The role of data lake in big data analytics: recent developments and challenges

Handbook of Big Data Analytics. Volume 1: Methodologies. Pub Date: 2021-07-07. DOI: 10.1049/pbpc037f_ch3

T. R. Rao, Pabitra Mitra, A. Goswami

Citations: 0

Big data processing frameworks and architectures: a survey

Handbook of Big Data Analytics. Volume 1: Methodologies. Pub Date: 2021-07-07. DOI: 10.1049/pbpc037f_ch2

R. Chunduri, A. Cherukuri

Abstract: In recent times, there has been rapid growth in data generated from autonomous sources. Existing data processing techniques are not suited to these large volumes of complex data, which can be structured, semi-structured or unstructured. Such data is referred to as Big data because of its main characteristics: volume, variety, velocity, value and veracity. Extensive research on Big data is ongoing, and its primary focus is on processing massive amounts of data effectively and efficiently. However, researchers are paying little attention to how to store and analyze large volumes of data to obtain useful insights from them. In this chapter, the authors examine existing Big data processing frameworks such as MapReduce, Apache Spark, Storm and Flink, and discuss in detail the architectures of MapReduce and iterative MapReduce frameworks and the components of Apache Spark. Most widely used classical machine learning techniques are implemented on these frameworks in the Apache Mahout and Spark MLlib libraries, and these libraries need to be enhanced to support all existing machine learning techniques, such as formal concept analysis (FCA) and neural embedding. Taking FCA as an application, the authors provide scalable FCA algorithms using Big data processing frameworks such as MapReduce and Spark. Streaming data processing frameworks such as Apache Flink and Apache Storm are also examined. The authors also discuss in detail storage architectures such as the Hadoop Distributed File System (HDFS), Dynamo and Amazon S3 for processing large Big data applications. The survey concludes with a proposal for best practices related to the studied architectures and frameworks.

Citations: 0
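The survey above examines the MapReduce architecture. As a rough illustration only, the single-process Python sketch below mimics the map, shuffle and reduce phases of a MapReduce job, assuming word counting as the workload (the chapter does not prescribe one). The function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical and are not Hadoop's API; in a real cluster the framework, not user code, performs the shuffle and runs map and reduce tasks on different nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework would do
    between the map and reduce stages."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def run_job(records: List[str]) -> Dict[str, int]:
    """Chain the three phases over an in-memory list of input records."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["Spark and Flink", "spark and storm"]))
    # prints {'spark': 2, 'and': 2, 'flink': 1, 'storm': 1}
```

Keeping map and reduce as pure functions of their inputs is what lets a framework rerun a failed task on another node without changing the result.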

Fog computing framework for Big Data processing using cluster management in a resource-constraint environment

Handbook of Big Data Analytics. Volume 1: Methodologies. Pub Date: 2021-07-07. DOI: 10.1049/pbpc037f_ch9

Srinivasa Raju Rudraraju, N. Suryadevara, A. Negi

Citations: 0

Architectures of big data analytics: scaling out data mining algorithms using Hadoop–MapReduce and Spark

Handbook of Big Data Analytics. Volume 1: Methodologies. Pub Date: 2021-07-07. DOI: 10.1049/pbpc037f_ch7

Sheikh Kamaruddin, V. Ravi

Abstract: Many statistical and machine learning (ML) techniques have been successfully applied to small datasets during the past one and a half decades. However, in today's world, application domains such as healthcare, finance, bioinformatics, telecommunications and meteorology generate huge volumes of data daily, and all these massive datasets must be analyzed to discover hidden insights. With the advent of the big data analytics (BDA) paradigm, data mining (DM) techniques were modified and scaled out to suit distributed and parallel environments. This chapter reviews 249 articles, published between 2009 and 2019, that implemented DM techniques in a parallel, distributed manner in the Apache Hadoop MapReduce framework or the Apache Spark environment to solve various DM tasks. We present critical analyses of these papers and bring out some interesting insights. We found that methods such as Apriori, support vector machine (SVM), random forest (RF), K-means and many of their variants, along with many other approaches, have been adapted to parallel, distributed environments and produce scalable and effective insights. The review concludes with a discussion of open research areas and future directions that researchers and practitioners alike can explore further.

Citations: 0
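K-means is among the methods the review reports as scaled out on MapReduce and Spark. A common way to parallelize it, sketched here under the assumptions of in-memory partitions and Euclidean distance, is to let each map task assign its partition's points to the nearest centroid and emit per-cluster partial sums and counts, then let a reduce step merge those partials into new centroids. This is a generic illustration, not the method of any specific reviewed article, and `map_partition`, `reduce_centroids` and `kmeans` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
# Partial result of one map task: cluster index -> (coordinate sums, point count).
Partial = Dict[int, Tuple[List[float], int]]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_partition(points: List[Point], centroids: Sequence[Point]) -> Partial:
    """Map task with a local combiner: assign each point in one partition to its
    nearest centroid and emit per-cluster coordinate sums and counts."""
    partial: Partial = {}
    for p in points:
        k = nearest(p, centroids)
        sums, count = partial.get(k, ([0.0] * len(p), 0))
        partial[k] = ([s + x for s, x in zip(sums, p)], count + 1)
    return partial


def reduce_centroids(partials: List[Partial], centroids: Sequence[Point]) -> List[Point]:
    """Reduce step: merge partial sums from all partitions and recompute each
    centroid; a cluster that received no points keeps its previous centroid."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for partial in partials:
        for k, (s, c) in partial.items():
            sums[k] = [a + b for a, b in zip(sums[k], s)]
            counts[k] += c
    return [
        tuple(x / counts[k] for x in sums[k]) if counts[k] else tuple(centroids[k])
        for k in range(len(centroids))
    ]


def kmeans(partitions: List[List[Point]], centroids: Sequence[Point],
           iterations: int = 10) -> List[Point]:
    """Run a fixed number of map/reduce rounds over the partitioned data."""
    current = [tuple(c) for c in centroids]
    for _ in range(iterations):
        partials = [map_partition(part, current) for part in partitions]  # one map task per partition
        current = reduce_centroids(partials, current)
    return current
```

For example, `kmeans([[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]], [(0.0, 0.0), (10.0, 10.0)])` returns `[(0.0, 0.5), (10.0, 10.5)]`. Emitting partial sums instead of raw points acts as a combiner: the data shuffled per iteration grows with the number of clusters rather than the number of points.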

Query optimization strategies for big data

Handbook of Big Data Analytics. Volume 1: Methodologies. Pub Date: 2021-07-07. DOI: 10.1049/pbpc037f_ch4

Nagesh Bhattu Sristy, Prashanth Kadari, Harini Yadamreddy

Abstract: Query optimization for big data architectures such as MapReduce, Spark and Druid is challenging because of the number of algorithmic issues to be addressed. Conventional algorithmic design issues such as memory, CPU time and I/O cost must be analyzed together with additional parameters such as communication cost, and data-resident skew further complicates the analysis. This chapter studies communication cost reduction strategies for conventional workloads such as joins, spatial queries and graph queries. We review algorithms for multi-way joins using MapReduce. Multi-way θ-join algorithms address multi-way joins with inequality conditions; because the output of a θ-join is much larger than that of an equi-join, the multi-way θ-join poses further difficulties for the analysis. An analysis of the multi-way θ-join is presented on the basis of the sizes of the input and output sets as well as the communication cost. Data-resident skew plays a key role in all the scenarios discussed, and addressing skew in a general sense is discussed. Partitioning strategies that minimize the impact of skew on the load of computing nodes are also presented. Applying join strategies to spatial data has drawn the interest of researchers, and distributing a spatial join requires special care to deal with the spatial nature of the dataset. A controlled-replicate strategy is reviewed to solve the multi-way spatial join problem. Graph-based analytical queries such as triangle counting and subgraph enumeration are presented in the context of distributed processing. Because triangle counting is a primitive needed by many graph queries, it is analyzed, using an elegant distribution scheme, from the perspective of the skew it introduces. The subgraph enumeration problem is also presented using various partitioning schemes, with a brief analysis of their performance.

Citations: 0
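The chapter above analyzes distributed triangle counting from the perspective of skew, but the abstract does not spell out its distribution scheme. As an illustration under stated assumptions only, the sketch below uses a standard degree-ordering technique rather than the chapter's scheme. Each edge is oriented from its lower-ranked to its higher-ranked endpoint (rank = degree, ties broken by vertex id). As a result, every triangle is counted exactly once, at its lowest-ranked vertex, and high-degree hubs generate few wedges. The two MapReduce-style rounds are simulated sequentially in one process, and `count_triangles` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def count_triangles(edges: Iterable[Edge]) -> int:
    """Count triangles in an undirected graph given as an edge list."""
    # Normalise: drop self-loops and duplicate or reversed edges.
    edge_set: Set[Edge] = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    degree: Dict[int, int] = defaultdict(int)
    for u, v in edge_set:
        degree[u] += 1
        degree[v] += 1

    def rank(x: int) -> Tuple[int, int]:
        # Total order on vertices: by degree, ties broken by vertex id.
        return (degree[x], x)

    # Round 1, map side: orient each edge from its lower-ranked endpoint to its
    # higher-ranked one, so a high-degree hub ends up with few out-neighbours.
    out: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edge_set:
        low, high = (u, v) if rank(u) < rank(v) else (v, u)
        out[low].add(high)

    # Round 1, reduce side: each vertex emits the wedges formed by pairs of its
    # out-neighbours. Round 2: a wedge closes a triangle iff its ends are adjacent.
    triangles = 0
    for nbrs in out.values():
        for a, b in combinations(sorted(nbrs), 2):  # sorted, so a < b
            if (a, b) in edge_set:
                triangles += 1
    return triangles
```

For example, the complete graph on four vertices, `count_triangles([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])`, returns 4. Orienting edges toward higher-degree vertices bounds each vertex's out-degree, which limits the wedges any single reducer must handle. This is the kind of load skew the chapter is concerned with.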