N. Hubig, Linnea Passing, Maximilian E. Schüle, Dimitri Vorona, A. Kemper, Thomas Neumann
{"title":"HyPerInsight: Data Exploration Deep Inside HyPer","authors":"N. Hubig, Linnea Passing, Maximilian E. Schüle, Dimitri Vorona, A. Kemper, Thomas Neumann","doi":"10.1145/3132847.3133167","DOIUrl":"https://doi.org/10.1145/3132847.3133167","url":null,"abstract":"Nowadays we are drowning in data of various varieties. For all these mixed types and categories of data there exist even more different analysis approaches, often done in single hand-written solutions. We propose to extend HyPer, a main memory database system to a uniform data agent platform following the one system fits all approach for solving a wide variety of data analysis problems. We achieve this by applying a flexible operator concept to a set of various important data exploration algorithms. With that, HyPer solves analytical questions using clustering, classification, association rule mining and graph mining besides standard HTAP (Hybrid Transaction and Analytical Processing) workloads on the same database state. It enables to approach the full variety and volume of HTAP extended for data exploration (HTAPx), and only needs knowledge of already introduced SQL extensions that are automatically optimized by the database's standard optimizer. In this demo we will focus on the benefits and flexibility we create by using the SQL extensions for several well-known mining workloads. In our interactive webinterface for this project named HyPerInsight we demonstrate how HyPer outperforms the best open source competitor Apache Spark in common use cases in social media, geo-data, recommender systems and several other.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72678469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Kharlamov, Luca Giacomelli, Evgeny Sherkhonov, B. C. Grau, Egor V. Kostylev, Ian Horrocks
{"title":"SemFacet: Making Hard Faceted Search Easier","authors":"E. Kharlamov, Luca Giacomelli, Evgeny Sherkhonov, B. C. Grau, Egor V. Kostylev, Ian Horrocks","doi":"10.1145/3132847.3133192","DOIUrl":"https://doi.org/10.1145/3132847.3133192","url":null,"abstract":"Faceted search is a prominent search paradigm that became the standard in many Web applications and has also been recently proposed as a suitable paradigm for exploring and querying RDF graphs. One of the main challenges that hampers usability of faceted search systems especially in the RDF context is information overload, that is, when the size of faceted interfaces becomes comparable to the size of the data over which the search is performed. In this demo we present (an extension of) our faceted search system SemFacet and focus on features that address the information overload: ranking, aggregation, and reachability. The demo attendees will be able to try our system on an RDF graph that models online shopping over a catalogs with up to millions of products.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"80 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75110066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Matrix-Vector Recurrent Unit Model for Capturing Compositional Semantics in Phrase Embeddings","authors":"Rui Wang, Wei Liu, C. McDonald","doi":"10.1145/3132847.3132984","DOIUrl":"https://doi.org/10.1145/3132847.3132984","url":null,"abstract":"The meaning of a multi-word phrase not only depends on the meaning of its constituent words, but also the rules of composing them to give the so-called compositional semantic. However, many deep learning models for learning compositional semantics target specific NLP tasks such as sentiment classification. Consequently, the word embeddings encode the lexical semantics, the weights of the networks are optimised for the classification task. Such models have no mechanisms to explicitly encode the compositional rules, and hence they are insufficient in capturing the semantics of phrases. We present a novel recurrent computational mechanism that specifically learns the compositionality by encoding the compositional rule of each word into a matrix. The network uses a recurrent architecture to capture the order of words for phrases with various lengths without requiring extra preprocessing such as part-of-speech tagging. The model is thoroughly evaluated on both supervised and unsupervised NLP tasks including phrase similarity, noun-modifier questions, sentiment distribution prediction, and domain specific term identification tasks. We demonstrate that our model consistently outperforms the LSTM and CNN deep learning models, simple algebraic compositions, and other popular baselines on different datasets.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74656988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boolean Matrix Decomposition by Formal Concept Sampling","authors":"P. Osicka, Martin Trnecka","doi":"10.1145/3132847.3133054","DOIUrl":"https://doi.org/10.1145/3132847.3133054","url":null,"abstract":"Finding interesting patterns is a classical problem in data mining. Boolean matrix decomposition is nowadays a standard tool that can find a set of patterns-also called factors-in Boolean data that explain the data well. We describe and experimentally evaluate a probabilistic algorithm for Boolean matrix decomposition problem. The algorithm is derived from GreCon algorithm which uses formal concepts-maximal rectangles or tiles-as factors in order to find a decomposition. We change the core of GreCon by substituting a sampling procedure for a deterministic computation of suitable formal concepts. This allows us to alleviate the greedy nature of GreCon, creates a possibility to bypass some of the its pitfalls and to preserve its features, e.g. an ability to explain the entire data.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"29 11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77515913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Theodore Vasiloudis, F. Beligianni, G. D. F. Morales
{"title":"BoostVHT: Boosting Distributed Streaming Decision Trees","authors":"Theodore Vasiloudis, F. Beligianni, G. D. F. Morales","doi":"10.1145/3132847.3132974","DOIUrl":"https://doi.org/10.1145/3132847.3132974","url":null,"abstract":"Online boosting improves the accuracy of classifiers for unbounded streams of data by chaining them into an ensemble. Due to its sequential nature, boosting has proven hard to parallelize, even more so in the online setting. This paper introduces BoostVHT, a technique to parallelize online boosting algorithms. Our proposal leverages a recently-developed model-parallel learning algorithm for streaming decision trees as a base learner. This design allows to neatly separate the model boosting from its training. As a result, BoostVHT provides a flexible learning framework which can employ any existing online boosting algorithm, while at the same time it can leverage the computing power of modern parallel and distributed cluster environments. We implement our technique on Apache SAMOA, an open-source platform for mining big data streams that can be run on several distributed execution engines, and demonstrate order of magnitude speedups compared to the state-of-the-art.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79077418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Knowledge Embeddings by Combining Limit-based Scoring Loss","authors":"Xiaofei Zhou, Qiannan Zhu, Ping Liu, Li Guo","doi":"10.1145/3132847.3132939","DOIUrl":"https://doi.org/10.1145/3132847.3132939","url":null,"abstract":"In knowledge graph embedding models, the margin-based ranking loss as the common loss function is usually used to encourage discrimination between golden triplets and incorrect triplets, which has proved effective in many translation-based models for knowledge graph embedding. However, we find that the loss function cannot ensure the fact that the scoring of correct triplets must be low enough to fulfill the translation. In this paper, we present a limit-based scoring loss to provide lower scoring of a golden triplet, and then to extend two basic translation models TransE and TransH, separately to TransE-RS and TransH-RS by combining limit-based scoring loss with margin-based ranking loss. Both the presented models have low complexities of parameters benefiting for application on large scale graphs. In experiments, we evaluate our models on two typical tasks including triplet classification and link prediction, and also analyze the scoring distributions of positive and negative triplets by different models. Experimental results show that the introduced limit-based scoring loss is effective to improve the capacities of knowledge graph embedding.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81828128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maintaining Densest Subsets Efficiently in Evolving Hypergraphs","authors":"Shuguang Hu, Xiaowei Wu, T-H. Hubert Chan","doi":"10.1145/3132847.3132907","DOIUrl":"https://doi.org/10.1145/3132847.3132907","url":null,"abstract":"In this paper we study the densest subgraph problem, which plays a key role in many graph mining applications. The goal of the problem is to find a subset of nodes that induces a graph with maximum average degree. The problem has been extensively studied in the past few decades under a variety of different settings. Several exact and approximation algorithms were proposed. However, as normal graph can only model objects with pairwise relationships, the densest subgraph problem fails in identifying communities under relationships that involve more than 2 objects, e.g., in a network connecting authors by publications. We consider in this work the densest subgraph problem in hypergraphs, which generalizes the problem to a wider class of networks in which edges might have different cardinalities and contain more than 2 nodes. We present two exact algorithms and a near-linear time r-approximation algorithm for the problem, where r is the maximum cardinality of an edge in the hypergraph. We also consider the dynamic version of the problem, in which an adversary can insert or delete an edge from the hypergraph in each round and the goal is to maintain efficiently an approximation of the densest subgraph. We present two dynamic approximation algorithms in this paper with amortized polog update time, for any ε > 0. For the case when there are only insertions, the approximation ratio we maintain is r(1+ε), while for the fully dynamic case, the ratio is r2(1+ε). Extensive experiments are performed on large real datasets to validate the effectiveness and efficiency of our algorithms.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81361861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Neural Collaborative Filtering Model with Interaction-based Neighborhood","authors":"Ting Bai, Ji-Rong Wen, Jun Zhang, Wayne Xin Zhao","doi":"10.1145/3132847.3133083","DOIUrl":"https://doi.org/10.1145/3132847.3133083","url":null,"abstract":"Recently, deep neural networks have been widely applied to recommender systems. A representative work is to utilize deep learning for modeling complex user-item interactions. However, similar to traditional latent factor models by factorizing user-item interactions, they tend to be ineffective to capture localized information. Localized information, such as neighborhood, is important to recommender systems in complementing the user-item interaction data. Based on this consideration, we propose a novel Neighborhood-based Neural Collaborative Filtering model (NNCF). To the best of our knowledge, it is the first time that the neighborhood information is integrated into the neural collaborative filtering methods. Extensive experiments on three real-world datasets demonstrate the effectiveness of our model for the implicit recommendation task.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90321861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yiyang Li, Guanyu Tao, Weinan Zhang, Yong Yu, Jun Wang
{"title":"Content Recommendation by Noise Contrastive Transfer Learning of Feature Representation","authors":"Yiyang Li, Guanyu Tao, Weinan Zhang, Yong Yu, Jun Wang","doi":"10.1145/3132847.3132855","DOIUrl":"https://doi.org/10.1145/3132847.3132855","url":null,"abstract":"Personalized recommendation has been proved effective as a content discovery tool for many online news publishers. As fresh news articles are frequently coming to the system while the old ones are fading away quickly, building a consistent and coherent feature representation over the ever-changing articles pool is fundamental to the performance of the recommendation. However, learning a good feature representation is challenging, especially for some small publishers that have normally fewer than 10,000 articles each year. In this paper, we consider to transfer knowledge from a larger text corpus. In our proposed solution, an effective article recommendation engine can be established with a small number of target publisher articles by transferring knowledge from a large corpus of text with a different distribution. Specifically, we leverage noise contrastive estimation techniques to learn the word conditional distribution given the context words, where the noise conditional distribution is pre-trained from the large corpus. Our solution has been deployed in a commercial recommendation service. The large-scale online A/B testing on two commercial publishers demonstrates up to 9.97% relative overall performance gain of our proposed model on the recommendation click-though rate metric over the non-transfer learning baselines.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82199554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Named-Entity Recognition System Using Syllable Bigram Embedding with Eojeol Prefix Information","authors":"Sunjae Kwon, Youngjoong Ko, Jungyun Seo","doi":"10.1145/3132847.3133105","DOIUrl":"https://doi.org/10.1145/3132847.3133105","url":null,"abstract":"Korean named-entity recognition (NER) systems have been developed mainly on the morphological-level, and they are commonly based on a pipeline framework that identifies named-entities (NEs) following the morphological analysis. However, this framework can mean that the performance of NER systems is degraded, because errors from the morphological analysis propagate into NER systems. This paper proposes a novel syllable-level NER system, which does not require a morphological analysis and can achieve a similar or better performance compared with the morphological-level NER systems. In addition, because the proposed system does not require a morphological analysis step, its processing speed is about 1.9 times faster than those of the previous morphological-level NER systems.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"71 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86285267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}