Querying Big Data

International Conference on Computer Systems and Technologies. Pub Date: 2012-06-22. DOI: 10.1145/2383276.2383278

Boris Novikov, N. Vassilieva, A. Yarygina


Citations: 15

Abstract



The term "Big Data" has become a buzzword and is widely used in both research and industry. Typically, the concept of big data implies a variety of information sources and the velocity of complex analytical processing, rather than just a huge and growing volume of data. Variety, velocity, and volume all create new research challenges, as nearly all techniques and tools commonly used in data processing have to be reconsidered. The variety and uncertainty of big data require a mixture of exact and similarity search, as well as grouping of complex objects based on different attributes. High-level declarative query languages are important in this context because of their expressiveness and potential for optimization.

In this talk we are mostly interested in an algebraic layer for complex query processing, which resides between the user interface (most likely graphical) and the execution engine in a layered system architecture. We analyze the applicability of existing models and query languages. We describe a systematic approach to similarity handling of complex objects, simultaneous application of different similarity measures and querying paradigms, complex searching and querying, and combined semi-structured and unstructured search. We introduce adaptive abstract operations based on the concept of fuzzy sets, which are needed to support uniform handling of different kinds of similarity processing. To ensure an efficient implementation, approximate algorithms with controlled quality are required, enabling a trade-off between quality and performance for timely similarity processing. Uniform and adaptive operations enable high-level declarative definition of complex queries and provide options for optimization.
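The abstract does not specify the fuzzy-set operations themselves, so the following is only a minimal sketch of how exact and similarity predicates might be handled uniformly. It assumes the standard min/max fuzzy intersection and union, a word-overlap (Jaccard) similarity, and hypothetical names (`select`, `fuzzy_and`, `fuzzy_or`, `top_k`, the `papers` data); none of these come from the paper.

```python
from typing import Callable, Dict, List, Tuple

# A fuzzy set maps an object id to a membership degree in [0, 1].
FuzzySet = Dict[str, float]


def select(objects: Dict[str, dict], score: Callable[[dict], float]) -> FuzzySet:
    """Score every object and keep those with non-zero membership."""
    result: FuzzySet = {}
    for oid, obj in objects.items():
        degree = max(0.0, min(1.0, score(obj)))  # clamp to [0, 1]
        if degree > 0.0:
            result[oid] = degree
    return result


def fuzzy_and(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Fuzzy intersection (assumed here: minimum of memberships)."""
    return {k: min(a[k], b[k]) for k in a.keys() & b.keys()}


def fuzzy_or(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Fuzzy union (assumed here: maximum of memberships)."""
    return {k: max(a.get(k, 0.0), b.get(k, 0.0)) for k in a.keys() | b.keys()}


def top_k(s: FuzzySet, k: int) -> List[Tuple[str, float]]:
    """Return the k best-matching objects, ties broken by id."""
    return sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def jaccard(text: str, query: str) -> float:
    """Word-overlap similarity between two strings."""
    a, b = set(text.split()), set(query.split())
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical sample data, for illustration only.
papers = {
    "p1": {"year": 2012, "title": "querying big data"},
    "p2": {"year": 2012, "title": "big data storage"},
    "p3": {"year": 2010, "title": "querying big data"},
}

# An exact predicate is a fuzzy set whose memberships are only 0 or 1.
exact = select(papers, lambda p: 1.0 if p["year"] == 2012 else 0.0)
# A similarity predicate yields graded memberships.
similar = select(papers, lambda p: jaccard(p["title"], "querying big data"))

both = fuzzy_and(exact, similar)
either = fuzzy_or(exact, similar)
best = top_k(both, 1)
```

Because the exact predicate is just a fuzzy set with 0/1 memberships, the same `fuzzy_and` and `fuzzy_or` operators combine it with similarity results, which is one possible reading of "uniform handling" in the abstract.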


Source journal

International Conference on Computer Systems and Technologies

Self-citation rate

0.00%

Publication count