N. Hubig, Linnea Passing, Maximilian E. Schüle, Dimitri Vorona, A. Kemper, Thomas Neumann
{"title":"HyPerInsight: Data Exploration Deep Inside HyPer","authors":"N. Hubig, Linnea Passing, Maximilian E. Schüle, Dimitri Vorona, A. Kemper, Thomas Neumann","doi":"10.1145/3132847.3133167","DOIUrl":"https://doi.org/10.1145/3132847.3133167","url":null,"abstract":"Nowadays we are drowning in data of various varieties. For all these mixed types and categories of data there exist even more different analysis approaches, often done in single hand-written solutions. We propose to extend HyPer, a main memory database system to a uniform data agent platform following the one system fits all approach for solving a wide variety of data analysis problems. We achieve this by applying a flexible operator concept to a set of various important data exploration algorithms. With that, HyPer solves analytical questions using clustering, classification, association rule mining and graph mining besides standard HTAP (Hybrid Transaction and Analytical Processing) workloads on the same database state. It enables to approach the full variety and volume of HTAP extended for data exploration (HTAPx), and only needs knowledge of already introduced SQL extensions that are automatically optimized by the database's standard optimizer. In this demo we will focus on the benefits and flexibility we create by using the SQL extensions for several well-known mining workloads. In our interactive webinterface for this project named HyPerInsight we demonstrate how HyPer outperforms the best open source competitor Apache Spark in common use cases in social media, geo-data, recommender systems and several other.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72678469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boolean Matrix Decomposition by Formal Concept Sampling","authors":"P. Osicka, Martin Trnecka","doi":"10.1145/3132847.3133054","DOIUrl":"https://doi.org/10.1145/3132847.3133054","url":null,"abstract":"Finding interesting patterns is a classical problem in data mining. Boolean matrix decomposition is nowadays a standard tool that can find a set of patterns-also called factors-in Boolean data that explain the data well. We describe and experimentally evaluate a probabilistic algorithm for Boolean matrix decomposition problem. The algorithm is derived from GreCon algorithm which uses formal concepts-maximal rectangles or tiles-as factors in order to find a decomposition. We change the core of GreCon by substituting a sampling procedure for a deterministic computation of suitable formal concepts. This allows us to alleviate the greedy nature of GreCon, creates a possibility to bypass some of the its pitfalls and to preserve its features, e.g. an ability to explain the entire data.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"29 11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77515913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Enhanced Topic Modeling Approach to Multiple Stance Identification","authors":"Junjie Lin, W. Mao, Yuhao Zhang","doi":"10.1145/3132847.3133145","DOIUrl":"https://doi.org/10.1145/3132847.3133145","url":null,"abstract":"People often publish online texts to express their stances, which reflect the essential viewpoints they stand. Stance identification has been an important research topic in text analysis and facilitates many applications in business, public security and government decision making. Previous work on stance identification solely focuses on classifying the supportive or unsupportive attitude towards a certain topic/entity. The other important type of stance identification, multiple stance identification, was largely ignored in previous research. In contrast, multiple stance identification focuses on identifying different standpoints of multiple parties involved in online texts. In this paper, we address the problem of recognizing distinct standpoints implied in textual data. As people are inclined to discuss the topics favorable to their standpoints, topics thus can provide distinguishable information of different standpoints. We propose a topic-based method for standpoint identification. To acquire more distinguishable topics, we further enhance topic model by adding constraints on document-topic distributions. We finally conduct experimental studies on two real datasets to verify the effectiveness of our approach to multiple stance identification.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"46 29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78785308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Database Performance Inefficiencies in Real-world Web Applications","authors":"Cong Yan, Alvin Cheung, Junwen Yang, Shan Lu","doi":"10.1145/3132847.3132954","DOIUrl":"https://doi.org/10.1145/3132847.3132954","url":null,"abstract":"Many modern database-backed web applications are built upon Object Relational Mapping (ORM) frameworks. While such frame- works ease application development by abstracting persistent data as objects, such convenience comes with a performance cost. In this paper, we studied 27 real-world open-source applications built on top of the popular Ruby on Rails ORM framework, with the goal to understand the database-related performance inefficiencies in these applications. We discovered a number of inefficiencies rang- ing from physical design issues to how queries are expressed in the application code. We applied static program analysis to identify and measure how prevalent these issues are, then suggested techniques to alleviate these issues and measured the potential performance gain as a result. These techniques significantly reduce database query time (up to 91%) and the webpage response time (up to 98%). Our study provides guidance to the design of future database engines and ORM frameworks to support database application that are performant yet without sacrificing programmability.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"84 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76799483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junxing Zhu, Jiawei Zhang, Lifang He, Quanyuan Wu, Bin Zhou, Chenwei Zhang, Philip S. Yu
{"title":"Broad Learning based Multi-Source Collaborative Recommendation","authors":"Junxing Zhu, Jiawei Zhang, Lifang He, Quanyuan Wu, Bin Zhou, Chenwei Zhang, Philip S. Yu","doi":"10.1145/3132847.3132976","DOIUrl":"https://doi.org/10.1145/3132847.3132976","url":null,"abstract":"Anchor links connect information entities, such as entities of movies or products, across networks from different sources, and thus information in these networks can be transferred directly via anchor links. Therefore, anchor links have great value to many cross-network applications, such as cross-network social link prediction and cross-network recommendation. In this paper, we focus on studying the recommendation problem that can provide ratings of items or services. To address the problem, we propose a Cross-network Collaborative Matrix Factorization (CCMF) recommendation framework based on broad learning setting, which can effectively integrate multi-source information and alleviate the sparse information problem in each individual network. Based on item anchor links CCMF can fuse item similarity information and item latent information across networks from different sources. And different from most of the traditional works, CCMF can make multi-source recommendation tasks collaborate together via the information transfer based on the broad learning setting. During the transfer process, a novel cross-network similarity transfer method is applied to keep the consistency of item similarities between two different networks, and a domain adaptation matrix is used to overcome the domain difference problem. We conduct experiments to compare the proposed CCMF method with both classic and state-of-the-art recommendation techniques. The experimental results illustrate that CCMF outperforms other methods in different experimental circumstances, and has great advantages on dealing with different data sparse problems.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85643842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting Records from the Web Using a Signal Processing Approach","authors":"R. P. Velloso, C. Dorneles","doi":"10.1145/3132847.3132875","DOIUrl":"https://doi.org/10.1145/3132847.3132875","url":null,"abstract":"Extracting records from web pages enables a number of important applications and has immense value due to the amount and diversity of available information that can be extracted. This problem, although vastly studied, remains open because it is not a trivial one. Due to the scale of data, a feasible approach must be both automatic and efficient (and of course effective). We present here a novel approach, fully automatic and computationally efficient, using signal processing techniques to detect regularities and patterns in the structure of web pages. Our approach segments the web page, detects the data regions within it, identifies the records boundaries and aligns the records. Results show high f-score and linearithmic time complexity behaviour.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85967110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antonia Saravanou, I. Katakis, G. Valkanas, V. Kalogeraki, D. Gunopulos
{"title":"Revealing the Hidden Links in Content Networks: An Application to Event Discovery","authors":"Antonia Saravanou, I. Katakis, G. Valkanas, V. Kalogeraki, D. Gunopulos","doi":"10.1145/3132847.3133148","DOIUrl":"https://doi.org/10.1145/3132847.3133148","url":null,"abstract":"Social networks have become the de facto online resource for people to share, comment on and be informed about events pertinent to their interests and livelihood, ranging from road traffic or an illness to concerts and earthquakes, to economics and politics. This has been the driving force behind research endeavors that analyse such data. In this paper, we focus on how Content Networks can help us identify events effectively. Content Networks incorporate both structural and content-related information of a social network in a unified way, at the same time, bringing together two disparate lines of research: graph-based and content-based event discovery in social media. We model interactions of two types of nodes, users and content, and introduce an algorithm that builds heterogeneous, dynamic graphs, in addition to revealing content links in the network's structure. By linking similar content nodes and tracking connected components over time, we can effectively identify different types of events. Our evaluation on social media streaming data suggests that our approach outperforms state-of-the-art techniques, while showcasing the significance of hidden links to the quality of the results.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84993552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge-based Question Answering by Jointly Generating, Copying and Paraphrasing","authors":"Shuguang Zhu, Xiang Cheng, Sen Su, S. Lang","doi":"10.1145/3132847.3133064","DOIUrl":"https://doi.org/10.1145/3132847.3133064","url":null,"abstract":"With the development of large-scale knowledge bases, people are building systems which give simple answers to questions based on consolidate facts. In this paper, we focus on simple questions, which ask about only a subject and relation in the knowledge base. Observing that certain parts of a question usually overlap with names of its corresponding subject and relation in the knowledge base, we argue that a question is formed by a mixture of copying and generation. To model that, we propose a sequence-to-sequence (seq2seq) architecture which encodes a candidate subject-relation pair and decodes it into the given question, where the decoding probability is used to select the best candidate. In our decoder, the copying mode points the subject or relation and duplicates its name, while the generating mode summarizes the meaning of the subject-relation pair and produces a word to smooth the question. Realizing that although sometimes a subject or relation is pointed, different names or keywords might be used, we also incorporate a paraphrasing mode to supplement the copying mode using an automatically mined lexicon. Extensive experiments on the largest dataset exhibit our better performance compared with the state-of-the-art methods.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82542172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sergey Paramonov, Samuel Kolb, Tias Guns, L. D. Raedt
{"title":"TaCLe: Learning Constraints in Tabular Data","authors":"Sergey Paramonov, Samuel Kolb, Tias Guns, L. D. Raedt","doi":"10.1145/3132847.3133193","DOIUrl":"https://doi.org/10.1145/3132847.3133193","url":null,"abstract":"Spreadsheet data is widely used today by many different people and across industries. However, writing, maintaining and identifying good formulae for spreadsheets can be time consuming and error-prone. To address this issue we have introduced the TaCLe system (Tabular Constraint Learner). The system tackles an inverse learning problem: given a plain comma separated file, it reconstructs the spreadsheet formulae that hold in the tables. Two important considerations are the number of cells and constraints to check, and how to deal with multiple formulae for the same cell. Our system reasons over entire rows and columns and has an intuitive user interface for interacting with the learned constraints and data. It can be seen as an intelligent assistance tool for discovering formulae from data. As a result, the user obtains a spreadsheet that can automatically recompute dependent cells when updating or adding data.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88037067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When Deep Learning Meets Transfer Learning","authors":"Qiang Yang","doi":"10.1145/3132847.3137175","DOIUrl":"https://doi.org/10.1145/3132847.3137175","url":null,"abstract":"Deep learning has achieved great success as evidenced by many practical applications and contests. However, deep learning developed so far also has some inherent limitations. In particular, deep learning is not yet very adaptable to different related domains and cannot handle small data. In this talk, I will give an overview of how transfer learning can help alleviate these problems. In particular, I will present some recent progress on integrating deep learning and transfer learning together and show some interesting applications in sentiment analysis, image processing and urban computing.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"305 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91456292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}