Big Data, Big Challenges

2014 IEEE International Conference on Semantic Computing. Pub Date: 2014-06-16. DOI: 10.1109/ICSC.2014.65

Wei Wang


Citations: 30

Abstract

Summary form only given. Big data analytics is the process of examining large volumes of data of many types (big data) to uncover hidden patterns, unknown correlations, and other useful information. Its revolutionary potential is now universally recognized. Data complexity, heterogeneity, scale, and timeliness make data analysis a clear bottleneck in many biomedical applications, owing to the complexity of the patterns and the lack of scalability of the underlying algorithms. Advanced machine learning and data mining algorithms are being developed to address one or more of these challenges. Typically, the complexity of potential patterns may grow exponentially with the complexity of the data, and so may the size of the pattern space. To avoid an exhaustive search through the pattern space, machine learning and data mining algorithms usually employ a greedy approach to search for a local optimum in the solution space, or a branch-and-bound approach to seek optimal solutions; consequently, they are often implemented as iterative or recursive procedures. To improve efficiency, these algorithms often exploit the dependencies between potential patterns to maximize in-memory computation and/or leverage special hardware (such as GPUs and FPGAs) for acceleration. These choices lead to strong data dependency, operation dependency, and hardware dependency, and sometimes to ad hoc solutions that cannot be generalized to a broader scope. In this talk, I will present some open challenges faced by data scientists in biomedical fields and the current approaches taken to tackle them.
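The trade-off between exhaustive, greedy, and branch-and-bound search described in the abstract can be illustrated with a minimal sketch. This is not code from the talk: the toy problem (choosing candidate patterns with a score and a cost under a budget), the names `PATTERNS` and `BUDGET`, and all function names are assumptions made purely for illustration.

```python
from itertools import combinations

# Hypothetical toy data: (name, score, cost) for each candidate pattern.
PATTERNS = [("p1", 60, 10), ("p2", 100, 20), ("p3", 120, 30)]
BUDGET = 50


def exhaustive(patterns, budget):
    """Check all 2^n subsets; feasible only for very small n."""
    best = 0
    for r in range(len(patterns) + 1):
        for subset in combinations(patterns, r):
            if sum(c for _, _, c in subset) <= budget:
                best = max(best, sum(s for _, s, _ in subset))
    return best


def greedy(patterns, budget):
    """Pick patterns by score/cost ratio; fast, but may miss the optimum."""
    total, remaining = 0, budget
    ranked = sorted(patterns, key=lambda p: p[1] / p[2], reverse=True)
    for _, score, cost in ranked:
        if cost <= remaining:
            total += score
            remaining -= cost
    return total


def branch_and_bound(patterns, budget):
    """Depth-first search that prunes branches whose optimistic bound
    cannot beat the best solution found so far."""
    items = sorted(patterns, key=lambda p: p[1] / p[2], reverse=True)
    best = 0

    def bound(i, score, remaining):
        # Optimistic estimate: fill the remaining budget in ratio order,
        # allowing a fraction of the first pattern that does not fit.
        for _, s, c in items[i:]:
            if c <= remaining:
                score += s
                remaining -= c
            else:
                return score + s * remaining / c
        return score

    def visit(i, score, remaining):
        nonlocal best
        best = max(best, score)
        if i == len(items) or bound(i, score, remaining) <= best:
            return  # leaf reached, or branch cannot improve on best
        _, s, c = items[i]
        if c <= remaining:
            visit(i + 1, score + s, remaining - c)  # include pattern i
        visit(i + 1, score, remaining)  # exclude pattern i

    visit(0, 0, budget)
    return best


if __name__ == "__main__":
    print(exhaustive(PATTERNS, BUDGET))        # 220
    print(greedy(PATTERNS, BUDGET))            # 160
    print(branch_and_bound(PATTERNS, BUDGET))  # 220
```

On this toy input the greedy pass settles on a score of 160, while branch-and-bound recovers the optimum of 220 found by exhaustive search without visiting every subset. The recursive structure of `visit` also reflects the abstract's remark that such algorithms tend to be implemented as iterative or recursive procedures.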

