2014 IEEE 30th International Conference on Data Engineering最新文献_第3页

Head, modifier, and constraint detection in short texts 短文本中的标题、修饰语和约束检测

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816658

Zhongyuan Wang, Haixun Wang, Zhirui Hu

{"title":"Head, modifier, and constraint detection in short texts","authors":"Zhongyuan Wang, Haixun Wang, Zhirui Hu","doi":"10.1109/ICDE.2014.6816658","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816658","url":null,"abstract":"Head and modifier detection is an important problem for applications that handle short texts such as search queries, ads keywords, titles, captions, etc. In many cases, short texts such as search queries do not follow grammar rules, and existing approaches for head and modifier detection are coarse-grained, domain specific, and/or require labeling of large amounts of training data. In this paper, we introduce a semantic approach for head and modifier detection. We first obtain a large number of instance level head-modifier pairs from search log. Then, we develop a conceptualization mechanism to generalize the instance level pairs to concept level. Finally, we derive weighted concept patterns that are concise, accurate, and have strong generalization power in head and modifier detection. Furthermore, we identify a subset of modifiers that we call constraints. Constraints are usually specific and not negligible as far as the intent of the short text is concerned, while non-constraint modifiers are more subjective. The mechanism we developed has been used in production for search relevance and ads matching. We use extensive experiment results to demonstrate the effectiveness of our approach.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123793063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 23

Adaptive parallel compressed event matching 自适应并行压缩事件匹配

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816665

Mohammad Sadoghi, H. Jacobsen

{"title":"Adaptive parallel compressed event matching","authors":"Mohammad Sadoghi, H. Jacobsen","doi":"10.1109/ICDE.2014.6816665","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816665","url":null,"abstract":"The efficient processing of large collections of patterns expressed as Boolean expressions over event streams plays a central role in major data intensive applications ranging from user-centric processing and personalization to real-time data analysis. On the one hand, emerging user-centric applications, including computational advertising and selective information dissemination, demand determining and presenting to an end-user the relevant content as it is published. On the other hand, applications in real-time data analysis, including push-based multi-query optimization, computational finance and intrusion detection, demand meeting stringent subsecond processing requirements and providing high-frequency event processing. We achieve these event processing requirements by exploiting the shift towards multi-core architectures by proposing novel adaptive parallel compressed event matching algorithm (A-PCM) and online event stream re-ordering technique (OSR) that unleash an unprecedented degree of parallelism amenable for highly parallel event processing. In our comprehensive evaluation, we demonstrate the efficiency of our proposed techniques. We show that the adaptive parallel compressed event matching algorithm can sustain an event rate of up to 233,863 events/second while state-of-the-art sequential event matching algorithms sustains only 36 events/second when processing up to five million Boolean expressions.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"358 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122741362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

Engine independence for logical analytic flows 逻辑分析流的引擎独立性

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816723

P. Jovanovic, A. Simitsis, K. Wilkinson

{"title":"Engine independence for logical analytic flows","authors":"P. Jovanovic, A. Simitsis, K. Wilkinson","doi":"10.1109/ICDE.2014.6816723","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816723","url":null,"abstract":"A complex analytic flow in a modern enterprise may perform multiple, logically independent, tasks where each task uses a different processing engine. We term these multi-engine flows hybrid flows. Using multiple processing engines has advantages such as rapid deployment, better performance, lower cost, and so on. However, as the number and variety of these engines grows, developing and maintaining hybrid flows is a significant challenge because they are specified at a physical level and, so are hard to design and may break as the infrastructure evolves. We address this problem by enabling flow design at a logical level and automatic translation to physical flows. There are three main challenges. First, we describe how flows can be represented at a logical level, abstracting away details of any underlying processing engine. Second, we show how a physical flow, expressed in a programming language or some design GUI, can be imported and converted to a logical flow. In particular, we show how a hybrid flow comprising subflows in different languages can be imported and composed as a single, logical flow for subsequent manipulation. Third, we describe how a logical flow is translated into one or more physical flows for execution by the processing engines. The paper concludes with experimental results and example transformations that demonstrate the correctness and utility of our system.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"561 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134324599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 30

GQBE: Querying knowledge graphs by example entity tuples GQBE:通过示例实体元组查询知识图

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816753

Nandish Jayaram, Mahesh Gupta, Arijit Khan, Chengkai Li, Xifeng Yan, R. Elmasri

引用次数: 32

HOPE: Iterative and interactive database partitioning for OLTP workloads HOPE:用于OLTP工作负载的迭代和交互式数据库分区

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816759

Yu Cao, X. Guo, Baoyao Zhou, S. Todd

引用次数: 1

Large-scale frequent subgraph mining in MapReduce MapReduce中的大规模频繁子图挖掘

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816705

Wenqing Lin, Xiaokui Xiao, Gabriel Ghinita

{"title":"Large-scale frequent subgraph mining in MapReduce","authors":"Wenqing Lin, Xiaokui Xiao, Gabriel Ghinita","doi":"10.1109/ICDE.2014.6816705","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816705","url":null,"abstract":"Mining frequent subgraphs from a large collection of graph objects is an important problem in several application domains such as bio-informatics, social networks, computer vision, etc. The main challenge in subgraph mining is efficiency, as (i) testing for graph isomorphisms is computationally intensive, and (ii) the cardinality of the graph collection to be mined may be very large. We propose a two-step filter-and-refinement approach that is suitable to massive parallelization within the scalable MapReduce computing model. We partition the collection of graphs among worker nodes, and each worker applies the filter step to determine a set of candidate subgraphs that are locally frequent in its partition. The union of all such graphs is the input to the refinement step, where each candidate is checked against all partitions and only the globally frequent graphs are retained. We devise a statistical threshold mechanism that allows us to predict which subgraphs have a high chance to become globally frequent, and thus reduce the computational overhead in the refinement step. We also propose effective strategies to avoid redundant computation in each round when searching for candidate graphs, as well as a lightweight graph compression mechanism to reduce the communication cost between machines. Extensive experimental evaluation results on several real-world large graph datasets show that the proposed approach clearly outperforms the existing state-of-the-art and provides a practical solution to the problem of frequent subgraph mining for massive collections of graphs.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130892796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 85

Stock trade volume prediction with Yahoo Finance user browsing behavior 股票交易量预测与雅虎财经用户浏览行为

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816733

Ilaria Bordino, N. Kourtellis, N. Laptev, Youssef Billawala

引用次数: 26

Efficient support of XQuery Full Text in SQL/XML enabled RDBMS 在启用SQL/XML的RDBMS中有效支持XQuery全文

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816729

Z. Liu, Ying Lu, Hui J. Chang

{"title":"Efficient support of XQuery Full Text in SQL/XML enabled RDBMS","authors":"Z. Liu, Ying Lu, Hui J. Chang","doi":"10.1109/ICDE.2014.6816729","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816729","url":null,"abstract":"There has been more than decade of efforts of supporting storage, query and update XML documents in RDBMS. XML enabled RDBMS supports SQL/XML standard that defines XMLType as a SQL data type and allows XQuery/XPath embedded in XMLQuery(), XMLExists() and XMLTABLE() in SQL. In XML enabled RDBMS, both relational data and XML documents can be managed in one system and queried using SQL/XML language. However, the use case of management of document centric XML is not well-addressed due to the lacking of full text query constructs in XQuery. Recently, XQuery Full Text (XQFT) becomes the W3C recommendation. In this paper, we show how XQFT can be supported efficiently in SQL/XML for full text search of XML documents managed by XML enabled RDBMS, such as Oracle XMLDB. We present architecture of a new XML Full Text Index, XQuery compile time and run time enhancements to efficiently support XQFT in SQL/XML. We present our design rationale on how to exploit Information Retrieval (IR) techniques for XQFT support in RDBMS. The new XML Full Text Index can index common XML physical storage forms: such as text XML, binary XML, relational decomposition of the XML. Although our work is built within Oracle XMLDB, all of the presented principles and techniques in this paper are valuable enough to RDBMS industry that needs to effectively and efficiently support of XQFT over persisted XML documents.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"66 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126069349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

A tunable compression framework for bitmap indices 位图索引的可调压缩框架

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816675

Gheorghi Guzun, G. Canahuate, David Chiu, Jason Sawin

{"title":"A tunable compression framework for bitmap indices","authors":"Gheorghi Guzun, G. Canahuate, David Chiu, Jason Sawin","doi":"10.1109/ICDE.2014.6816675","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816675","url":null,"abstract":"Bitmap indices are widely used for large read-only repositories in data warehouses and scientific databases. Their binary representation allows for the use of bitwise operations and specialized run-length compression techniques. Due to a trade-off between compression and query efficiency, bitmap compression schemes are aligned using a fixed encoding length size (typically the word length) to avoid explicit decompression during query time. In general, smaller encoding lengths provide better compression, but require more decoding during query execution. However, when the difference in size is considerable, it is possible for smaller encodings to also provide better execution time. We posit that a tailored encoding length for each bit vector will provide better performance than a one-size-fits-all approach. We present a framework that optimizes compression and query efficiency by allowing bitmaps to be compressed using variable encoding lengths while still maintaining alignment to avoid explicit decompression. Efficient algorithms are introduced to process queries over bitmaps compressed using different encoding lengths. An input parameter controls the aggressiveness of the compression providing the user with the ability to tune the tradeoff between space and query time. Our empirical study shows this approach achieves significant improvements in terms of both query time and compression ratio for synthetic and real data sets. Compared to 32-bit WAH, VAL-WAH produces up to 1.8× smaller bitmaps and achieves query times that are 30% faster.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128742308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 59

A hybrid machine-crowdsourcing system for matching web tables 用于匹配web表的混合机器-众包系统

2014 IEEE 30th International Conference on Data Engineering Pub Date : 2014-05-19 DOI: 10.1109/ICDE.2014.6816716

Ju Fan, Meiyu Lu, B. Ooi, W. Tan, Meihui Zhang

{"title":"A hybrid machine-crowdsourcing system for matching web tables","authors":"Ju Fan, Meiyu Lu, B. Ooi, W. Tan, Meihui Zhang","doi":"10.1109/ICDE.2014.6816716","DOIUrl":"https://doi.org/10.1109/ICDE.2014.6816716","url":null,"abstract":"The Web is teeming with rich structured information in the form of HTML tables, which provides us with the opportunity to build a knowledge repository by integrating these tables. An essential problem of web data integration is to discover semantic correspondences between web table columns, and schema matching is a popular means to determine the semantic correspondences. However, conventional schema matching techniques are not always effective for web table matching due to the incompleteness in web tables. In this paper, we propose a two-pronged approach for web table matching that effectively addresses the above difficulties. First, we propose a concept-based approach that maps each column of a web table to the best concept, in a well-developed knowledge base, that represents it. This approach overcomes the problem that sometimes values of two web table columns may be disjoint, even though the columns are related, due to incompleteness in the column values. Second, we develop a hybrid machine-crowdsourcing framework that leverages human intelligence to discern the concepts for “difficult” columns. Our overall framework assigns the most “beneficial” column-to-concept matching tasks to the crowd under a given budget and utilizes the crowdsourcing result to help our algorithm infer the best matches for the rest of the columns. We validate the effectiveness of our framework through an extensive experimental study over two real-world web table data sets. The results show that our two-pronged approach outperforms existing schema matching techniques at only a low cost for crowdsourcing.","PeriodicalId":159130,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128429698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 110