Latest publications from Handbook of Big Data Analytics. Volume 1: Methodologies

A review of fog and edge computing with big data analytics
Handbook of Big Data Analytics. Volume 1: Methodologies Pub Date : 2021-07-07 DOI: 10.1049/pbpc037f_ch8
C. Rajyalakshmi, K. R. Rao, Rajeswara Rao Ramisetty
Abstract: In this review, we present and explore the cloud computing offloading strategies with fog and edge computing that have been accepted in recent years. They reflect a noticeable improvement in the collection, transmission and management of data in the field for computer consumers. This review also focuses on how various computing paradigms applied in fog and edge computing environments are used for realising recently emerging IoT applications and addressing cyber security threats.
Citations: 0
The impact of Big Data on databases
Handbook of Big Data Analytics. Volume 1: Methodologies Pub Date : 2021-07-07 DOI: 10.1049/pbpc037f_ch1
Antonio Sarasa Cabezuelo
Abstract: From the point of view of information management, the last decade has been characterized by an exponential generation of data. Any interaction carried out by digital means generates data. Some popular examples are social networks on the Internet, mobile device apps, commercial transactions through online banking, the history of a user's browsing through the network, and geolocation information generated by a user's mobile device. In general, all this information is stored by the companies or institutions with which the interaction is maintained (unless the user has expressly indicated that it cannot be stored).
Citations: 0
Role of artificial intelligence and big data in accelerating accessibility for persons with disabilities
Handbook of Big Data Analytics. Volume 1: Methodologies Pub Date : 2021-07-07 DOI: 10.1049/pbpc037f_ch10
Kundumani Srinivasan Kuppusamy
Abstract: Artificial intelligence (AI) and big data have recently moved from being niche tools to mainstream ones. These technological improvements have changed the manner in which software tools are designed and have provided unprecedented benefits to users. This article analyses the impact of both technologies through the lens of accessibility computing, a sub-domain of human-computer interaction. The rationales for incorporating accessibility for persons with disabilities in the digital ecosystem are illustrated. This article proposes a key term, 'perception porting', which aims to convert data suited to one sense into a form suited to another with the help of AI and big data. The specific tools and techniques available to assist persons with specific disabilities, such as smart vision, smart exoskeletons, captioning techniques and Internet of Things-based solutions, are explored.
Citations: 0
Back Matter
Handbook of Big Data Analytics. Volume 1: Methodologies Pub Date : 2021-07-07 DOI: 10.1049/pbpc037f_bm
V. Ravi, A. Cherukuri
Citations: 0
Toward real-time data processing: an advanced approach in big data analytics
Handbook of Big Data Analytics. Volume 1: Methodologies Pub Date : 2021-07-07 DOI: 10.1049/pbpc037f_ch5
Shafqat Ul Ahsaan, Harleen Kaur, Sameena Naaz
Abstract: Nowadays, a huge quantity of data is produced by multiple data sources. Existing tools and techniques are not capable of handling such voluminous data produced from a variety of sources. This continuous and varied generation of data requires advanced technologies for processing and storage, which is a big challenge for data scientists. Some research studies are well defined in the area of streaming in big data. Streaming data are real-time data, or data in motion, such as stock market data, sensor data, GPS data and Twitter data. In stream processing, the data are not stored in databases; instead, they are processed and analyzed on the fly to extract value as soon as they are generated. A number of streaming frameworks have been proposed to date for big data applications; they are used to collect, evaluate and process data that are generated and captured continuously. In this chapter, we provide an in-depth summary of various big data streaming approaches such as Apache Storm, Apache Hive and Apache Samza. We also present a comparative study of these streaming platforms.
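The abstract's description of stream processing (records analyzed on the fly rather than stored first) can be illustrated with a small sketch. This is not taken from the chapter and uses none of the frameworks it surveys; it is plain Python, and `sensor_readings` is a hypothetical stand-in for a real feed such as sensor or market data.

```python
from typing import Iterable, Iterator


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, one result per incoming value."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        # Only the two accumulators are kept; the record itself is not stored.
        yield total / count


def sensor_readings() -> Iterator[float]:
    """Hypothetical data source standing in for a live feed."""
    yield from (10.0, 20.0, 30.0, 40.0)


results = list(running_mean(sensor_readings()))
print(results)
```

Each result is available as soon as its input arrives, and memory use stays constant regardless of how long the stream runs, which is the property the abstract attributes to stream processing.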
Citations: 0
The role of data lake in big data analytics: recent developments and challenges
Handbook of Big Data Analytics. Volume 1: Methodologies Pub Date : 2021-07-07 DOI: 10.1049/pbpc037f_ch3
T. R. Rao, Pabitra Mitra, A. Goswami
Abstract: We explore the concept of a data lake (DL), big data fabric, DL architecture and the various layers of a DL. We also present the components of each layer that exists in a DL. We compare and contrast data warehouses and DLs with respect to some key characteristics. Moreover, we explore various commercial and open-source DLs with their strengths and limitations. We also discuss some key best practices for DLs. Further, we present two case studies of DLs: Lumada data lake (LDL) and Temenos data lake (TDL) for digital banking. Finally, we explore some of the crucial challenges faced in building DLs.
Citations: 0
Big data processing frameworks and architectures: a survey
Handbook of Big Data Analytics. Volume 1: Methodologies Pub Date : 2021-07-07 DOI: 10.1049/pbpc037f_ch2
R. Chunduri, A. Cherukuri
Abstract: In recent times, there has been rapid growth in data generated from autonomous sources. Existing data processing techniques are not suited to dealing with these large volumes of complex data, which can be structured, semi-structured or unstructured. Such data is referred to as Big data because of its main characteristics: volume, variety, velocity, value and veracity. Extensive research on Big data is ongoing, and the primary focus of this research is on processing massive amounts of data effectively and efficiently. However, researchers are paying little attention to how to store and analyze large volumes of data to get useful insights from it. In this chapter, the authors examine existing Big data processing frameworks such as MapReduce, Apache Spark, Storm and Flink. The architectures of MapReduce, iterative MapReduce frameworks and the components of Apache Spark are discussed in detail. Most of the widely used classical machine learning techniques are implemented on these Big data frameworks in the form of the Apache Mahout and Spark MLlib libraries, and these need to be enhanced to support all existing machine learning techniques, such as formal concept analysis (FCA) and neural embedding. The authors take FCA as an application and provide scalable FCA algorithms using Big data processing frameworks such as MapReduce and Spark. Streaming data processing frameworks such as Apache Flink and Apache Storm are also examined. The authors also discuss in detail storage architectures such as Hadoop Distributed File System (HDFS), Dynamo and Amazon S3 in the context of processing large Big data applications. The survey concludes with a proposal for best practices related to the studied architectures and frameworks.
Citations: 0
Fog computing framework for Big Data processing using cluster management in a resource-constraint environment
Handbook of Big Data Analytics. Volume 1: Methodologies Pub Date : 2021-07-07 DOI: 10.1049/pbpc037f_ch9
Srinivasa Raju Rudraraju, N. Suryadevara, A. Negi
Abstract: This article presents the implementation details of the distributed storage and processing of big datasets in a fog computing cluster environment. The implementation details of a fog computing framework using Apache Spark for big data applications in a resource-constrained environment are given. Results on Big Data processing, modeling and prediction in a resource-constrained fog computing framework are presented through the evaluation of case studies using an e-commerce customer dataset and bank loan credit risk datasets.
Citations: 0
Architectures of big data analytics: scaling out data mining algorithms using Hadoop–MapReduce and Spark
Handbook of Big Data Analytics. Volume 1: Methodologies Pub Date : 2021-07-07 DOI: 10.1049/pbpc037f_ch7
Sheikh Kamaruddin, V. Ravi
Abstract: Many statistical and machine learning (ML) techniques have been successfully applied to small datasets during the past decade and a half. However, in today's world, different application domains, viz., healthcare, finance, bioinformatics, telecommunications and meteorology, generate huge volumes of data on a daily basis. All these massive datasets have to be analyzed to discover hidden insights. With the advent of the big data analytics (BDA) paradigm, data mining (DM) techniques were modified and scaled out to adapt to distributed and parallel environments. This chapter reviews 249 articles that appeared between 2009 and 2019 and implemented different DM techniques in a parallel, distributed manner in the Apache Hadoop MapReduce framework or the Apache Spark environment for solving various DM tasks. We present some critical analyses of these papers and bring out some interesting insights. We have found that methods such as Apriori, support vector machine (SVM), random forest (RF), K-means and many of their variants, along with many other approaches, have been adapted to parallel, distributed environments and have produced scalable and effective insights. The review concludes with a discussion of some open areas of research and future directions, which can be explored further by researchers and practitioners alike.
Citations: 0
Query optimization strategies for big data
Handbook of Big Data Analytics. Volume 1: Methodologies Pub Date : 2021-07-07 DOI: 10.1049/pbpc037f_ch4
Nagesh Bhattu Sristy, Prashanth Kadari, Harini Yadamreddy
Abstract: Query optimization for big data architectures such as MapReduce, Spark and Druid is challenging because of the large number of algorithmic issues to be addressed. Conventional algorithmic design issues such as memory, CPU time and IO cost should be analyzed in the context of additional parameters such as communication cost. The issue of data resident skew further complicates the analysis. This chapter studies communication cost reduction strategies for conventional workloads such as joins, spatial queries and graph queries. We review the algorithms for multi-way join using MapReduce. Multi-way θ-join algorithms address the multi-way join with inequality conditions. Because the output of a θ-join is much larger than that of an equi-join, the multi-way θ-join poses further difficulties for the analysis. An analysis of the multi-way θ-join is presented on the basis of the sizes of the input sets and output sets as well as the communication cost. Data resident skew plays a key role in all the scenarios discussed. Addressing skew in a general sense is discussed, and partitioning strategies that minimize the impact of data skew on the load imbalance across computing nodes are presented. The application of join strategies to spatial data has attracted the interest of researchers, and distributing a spatial join requires special emphasis on the spatial nature of the dataset. A controlled replicate strategy is reviewed to solve the problem of multi-way spatial join. Graph-based analytical queries such as triangle counting and subgraph enumeration are presented in the context of distributed processing. As a primitive needed for many graph queries, triangle counting has been analyzed from the perspective of the skew it introduces, using an elegant distribution scheme. The subgraph enumeration problem is also presented using various partitioning schemes, with a brief analysis of their performance.
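The abstract's remark that a θ-join produces far more output than an equi-join can be seen in a small single-machine sketch. This is an illustration under assumed toy data, not the chapter's MapReduce algorithm; the helper `theta_join` and the relations `r` and `s` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def theta_join(
    left: Sequence[int],
    right: Sequence[int],
    predicate: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    """Nested-loop join: keep every pair of keys that satisfies the predicate."""
    return [(a, b) for a, b in product(left, right) if predicate(a, b)]


# Two toy relations, each reduced to a single join key column.
r = list(range(1, 7))
s = list(range(1, 7))

equi = theta_join(r, s, lambda a, b: a == b)      # equality condition
less_than = theta_join(r, s, lambda a, b: a < b)  # inequality condition

print(len(equi), len(less_than))
```

With n distinct keys on each side, the equality condition yields n pairs while the `<` condition yields n(n-1)/2, so the gap widens quadratically as the inputs grow. In a distributed setting that larger output also has to be shipped between nodes, which is why the abstract ties θ-join analysis to communication cost.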
Citations: 0