Proceedings of the 2017 ACM on Conference on Information and Knowledge Management: Latest Articles, Page 7

HyPerInsight: Data Exploration Deep Inside HyPer

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133167

N. Hubig, Linnea Passing, Maximilian E. Schüle, Dimitri Vorona, A. Kemper, Thomas Neumann

Abstract: Nowadays we are drowning in data of many varieties. For all these mixed types and categories of data there exist even more analysis approaches, often implemented as one-off hand-written solutions. We propose to extend HyPer, a main-memory database system, into a uniform data agent platform that follows the one-system-fits-all approach to solving a wide variety of data analysis problems. We achieve this by applying a flexible operator concept to a set of important data exploration algorithms. With that, HyPer answers analytical questions using clustering, classification, association rule mining and graph mining, in addition to standard HTAP (Hybrid Transaction and Analytical Processing) workloads, on the same database state. This makes it possible to address the full variety and volume of HTAP extended for data exploration (HTAPx), and requires only knowledge of previously introduced SQL extensions that are automatically optimized by the database's standard optimizer. In this demo we focus on the benefits and flexibility created by using the SQL extensions for several well-known mining workloads. In our interactive web interface for this project, named HyPerInsight, we demonstrate how HyPer outperforms the best open-source competitor, Apache Spark, in common use cases in social media, geo-data, recommender systems and several others.

Citations: 13

SemFacet: Making Hard Faceted Search Easier

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133192

E. Kharlamov, Luca Giacomelli, Evgeny Sherkhonov, B. C. Grau, Egor V. Kostylev, Ian Horrocks

Citations: 20

A Matrix-Vector Recurrent Unit Model for Capturing Compositional Semantics in Phrase Embeddings

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3132984

Rui Wang, Wei Liu, C. McDonald

Abstract: The meaning of a multi-word phrase depends not only on the meaning of its constituent words but also on the rules for composing them, which together give the so-called compositional semantics. However, many deep learning models for learning compositional semantics target specific NLP tasks such as sentiment classification. Consequently, the word embeddings encode the lexical semantics, while the weights of the networks are optimised for the classification task. Such models have no mechanism to explicitly encode the compositional rules, and hence they are insufficient for capturing the semantics of phrases. We present a novel recurrent computational mechanism that specifically learns compositionality by encoding the compositional rule of each word into a matrix. The network uses a recurrent architecture to capture the order of words in phrases of various lengths without requiring extra preprocessing such as part-of-speech tagging. The model is thoroughly evaluated on both supervised and unsupervised NLP tasks, including phrase similarity, noun-modifier questions, sentiment distribution prediction, and domain-specific term identification. We demonstrate that our model consistently outperforms the LSTM and CNN deep learning models, simple algebraic compositions, and other popular baselines on different datasets.

Citations: 3

Boolean Matrix Decomposition by Formal Concept Sampling

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133054

P. Osicka, Martin Trnecka

Citations: 2

BoostVHT: Boosting Distributed Streaming Decision Trees

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3132974

Theodore Vasiloudis, F. Beligianni, G. D. F. Morales

Citations: 8

Learning Knowledge Embeddings by Combining Limit-based Scoring Loss

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3132939

Xiaofei Zhou, Qiannan Zhu, Ping Liu, Li Guo

Abstract: In knowledge graph embedding models, the margin-based ranking loss is the common loss function used to encourage discrimination between golden triplets and incorrect triplets, and it has proved effective in many translation-based models for knowledge graph embedding. However, we find that this loss function cannot ensure that the scores of correct triplets are low enough to fulfill the translation. In this paper, we present a limit-based scoring loss that provides lower scores for golden triplets, and then extend two basic translation models, TransE and TransH, to TransE-RS and TransH-RS respectively by combining the limit-based scoring loss with the margin-based ranking loss. Both of the presented models have low parameter complexity, which benefits application to large-scale graphs. In experiments, we evaluate our models on two typical tasks, triplet classification and link prediction, and also analyze the scoring distributions of positive and negative triplets under different models. Experimental results show that the introduced limit-based scoring loss effectively improves the capacity of knowledge graph embedding.

Citations: 37
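The combination of losses described in the abstract above can be illustrated with a minimal sketch. It assumes a TransE-style score (the L2 distance between h + r and t, lower is better) and one plausible form of the combined objective: a hinge on the margin between a golden and a corrupted triplet, plus a weighted hinge that penalizes a golden triplet whose score exceeds a limit. The function names and the parameters `margin`, `limit` and `weight` are hypothetical and chosen for illustration; the listing does not state the exact formulation used in TransE-RS.

```python
import math


def transe_score(h, r, t):
    """TransE-style score: L2 distance between h + r and t (lower is better)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def combined_loss(pos, neg, margin=1.0, limit=0.5, weight=1.0):
    """Margin-based ranking loss plus a limit-based scoring term.

    pos and neg are (h, r, t) tuples of equal-length vectors.
    margin, limit and weight are illustrative hyperparameters (assumptions).
    """
    f_pos = transe_score(*pos)
    f_neg = transe_score(*neg)
    # Ranking term: the golden triplet should score lower than the
    # corrupted one by at least `margin`.
    ranking = max(0.0, margin + f_pos - f_neg)
    # Limit term: the golden triplet's own score should not exceed `limit`.
    limit_term = max(0.0, f_pos - limit)
    return ranking + weight * limit_term


# The ranking term alone is satisfied here (score 2 vs. score 5 with
# margin 1), yet the golden triplet's score is still above the limit.
golden = ([0.0, 0.0], [1.0, 0.0], [1.0, 2.0])
corrupted = ([0.0, 0.0], [1.0, 0.0], [4.0, 4.0])
loss = combined_loss(golden, corrupted)
```

In this toy case the ranking hinge contributes nothing, so the whole loss comes from the limit term. That is the situation the abstract points to: a margin can be met while the golden triplet's score remains high.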

Maintaining Densest Subsets Efficiently in Evolving Hypergraphs

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3132907

Shuguang Hu, Xiaowei Wu, T-H. Hubert Chan

Abstract: In this paper we study the densest subgraph problem, which plays a key role in many graph mining applications. The goal of the problem is to find a subset of nodes that induces a graph with maximum average degree. The problem has been extensively studied in the past few decades under a variety of different settings, and several exact and approximation algorithms have been proposed. However, as a normal graph can only model objects with pairwise relationships, the densest subgraph problem fails to identify communities under relationships that involve more than 2 objects, e.g., in a network connecting authors by publications. In this work we consider the densest subgraph problem in hypergraphs, which generalizes the problem to a wider class of networks in which edges may have different cardinalities and contain more than 2 nodes. We present two exact algorithms and a near-linear time r-approximation algorithm for the problem, where r is the maximum cardinality of an edge in the hypergraph. We also consider the dynamic version of the problem, in which an adversary can insert or delete an edge from the hypergraph in each round and the goal is to efficiently maintain an approximation of the densest subgraph. We present two dynamic approximation algorithms in this paper with amortized polog update time, for any ε > 0. For the case when there are only insertions, the approximation ratio we maintain is r(1+ε), while for the fully dynamic case, the ratio is r^2(1+ε). Extensive experiments are performed on large real datasets to validate the effectiveness and efficiency of our algorithms.

Citations: 36
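To make the static problem in the abstract above concrete, here is a minimal sketch of greedy peeling on a hypergraph. It assumes density is measured as the number of hyperedges fully contained in a node set divided by the size of that set. It uses the classic peeling heuristic (repeatedly drop a minimum-degree node and remember the densest intermediate set), which is not necessarily the algorithm from the paper. The function names are hypothetical, and this simple version recomputes degrees in every round, so it is not near-linear time.

```python
def density(nodes, edges):
    """Hyperedges fully inside `nodes`, divided by the number of nodes."""
    if not nodes:
        return 0.0
    inside = sum(1 for e in edges if e <= nodes)
    return inside / len(nodes)


def greedy_peel(raw_edges):
    """Greedy peeling: remove a minimum-degree node each round and
    return the densest node set seen along the way.

    Node labels are assumed to be mutually comparable (used for
    deterministic tie-breaking).
    """
    edges = [frozenset(e) for e in raw_edges]
    nodes = set().union(*edges) if edges else set()
    best_nodes, best_density = set(nodes), density(nodes, edges)
    while len(nodes) > 1:
        # Only hyperedges whose nodes all survive still count.
        alive = [e for e in edges if e <= nodes]
        degree = {v: 0 for v in nodes}
        for e in alive:
            for v in e:
                degree[v] += 1
        victim = min(sorted(nodes), key=lambda v: degree[v])
        nodes = nodes - {victim}
        d = density(nodes, edges)
        if d > best_density:
            best_nodes, best_density = set(nodes), d
    return best_nodes, best_density


# Toy hypergraph: one 3-node hyperedge, three pairwise edges among
# nodes 1-3, and a pendant edge to node 4.
toy_edges = [(1, 2, 3), (1, 2), (2, 3), (1, 3), (3, 4)]
best_nodes, best_density = greedy_peel(toy_edges)
```

On the toy input the pendant node 4 is peeled first, which raises the density from 5/4 to 4/3. Every later removal lowers it, so the set {1, 2, 3} is returned.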

A Neural Collaborative Filtering Model with Interaction-based Neighborhood

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133083

Ting Bai, Ji-Rong Wen, Jun Zhang, Wayne Xin Zhao

Citations: 93

Content Recommendation by Noise Contrastive Transfer Learning of Feature Representation

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3132855

Yiyang Li, Guanyu Tao, Weinan Zhang, Yong Yu, Jun Wang

Abstract: Personalized recommendation has proved effective as a content discovery tool for many online news publishers. As fresh news articles frequently enter the system while old ones quickly fade away, building a consistent and coherent feature representation over the ever-changing article pool is fundamental to recommendation performance. However, learning a good feature representation is challenging, especially for small publishers that normally have fewer than 10,000 articles each year. In this paper, we consider transferring knowledge from a larger text corpus. In our proposed solution, an effective article recommendation engine can be established with a small number of target publisher articles by transferring knowledge from a large corpus of text with a different distribution. Specifically, we leverage noise contrastive estimation techniques to learn the word conditional distribution given the context words, where the noise conditional distribution is pre-trained on the large corpus. Our solution has been deployed in a commercial recommendation service. Large-scale online A/B testing on two commercial publishers demonstrates up to a 9.97% relative overall performance gain of our proposed model on the recommendation click-through rate metric over the non-transfer-learning baselines.

Citations: 4
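The noise contrastive estimation idea mentioned in the abstract above can be sketched with the standard NCE binary objective: each observed word is contrasted against k words drawn from a noise distribution, and the model is trained to tell the two apart. The sketch below assumes that the conditional probabilities under the model and under the noise distribution are already available as plain numbers. The function name and all values are hypothetical, and this is not the deployed system's implementation. In the setting of the abstract, the noise probabilities would come from a distribution pre-trained on the large corpus.

```python
import math


def nce_loss(target, noise_samples):
    """Standard NCE loss for one observed word and k noise words.

    target: (p_model, p_noise) for the observed word given its context.
    noise_samples: list of (p_model, p_noise) pairs, one per sampled
    noise word; k is the length of this list.
    """
    k = len(noise_samples)
    p, q = target
    # Probability that the observed word came from the data, not the noise.
    loss = -math.log(p / (p + k * q))
    for p_i, q_i in noise_samples:
        # Probability that a noise word is correctly attributed to the noise.
        loss -= math.log(k * q_i / (p_i + k * q_i))
    return loss


# A model that cannot tell data from noise (equal probabilities) ...
uninformed = nce_loss((0.2, 0.2), [(0.1, 0.1)])
# ... versus one that rates the observed word high and the noise word low.
informed = nce_loss((0.9, 0.1), [(0.01, 0.1)])
```

With equal model and noise probabilities, each term reduces to log 2, so the uninformed loss is 2 log 2. The informed model is penalized less, which is the pressure NCE training relies on.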

A Robust Named-Entity Recognition System Using Syllable Bigram Embedding with Eojeol Prefix Information

Proceedings of the 2017 ACM on Conference on Information and Knowledge Management Pub Date : 2017-11-06 DOI: 10.1145/3132847.3133105

Sunjae Kwon, Youngjoong Ko, Jungyun Seo

Citations: 4