E. Kharlamov, Ognjen Savkovic, Guohui Xiao, R. Peñaloza, G. Mehdi, M. Roshchin, Ian Horrocks
{"title":"Semantic Rules for Machine Diagnostics: Execution and Management","authors":"E. Kharlamov, Ognjen Savkovic, Guohui Xiao, R. Peñaloza, G. Mehdi, M. Roshchin, Ian Horrocks","doi":"10.1145/3132847.3133159","DOIUrl":"https://doi.org/10.1145/3132847.3133159","url":null,"abstract":"Rule-based diagnostics of equipment is an important task in industry. In this paper we present how semantic technologies can enhance diagnostics. In particular, we present our semantic rule language sigRL that is inspired by the real diagnostic languages used in Siemens. SigRL allows to write compact yet powerful diagnostic programs by relying on a high level data independent vocabulary, diagnostic ontologies, and queries over these ontologies. We study computational complexity of SigRL: execution of diagnostic programs, provenance computation, as well as automatic verification of redundancy and inconsistency in diagnostic programs.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"81 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77122661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huandong Wang, Chen Gao, Yong Li, Zhi-Li Zhang, Depeng Jin
{"title":"From Fingerprint to Footprint: Revealing Physical World Privacy Leakage by Cyberspace Cookie Logs","authors":"Huandong Wang, Chen Gao, Yong Li, Zhi-Li Zhang, Depeng Jin","doi":"10.1145/3132847.3132998","DOIUrl":"https://doi.org/10.1145/3132847.3132998","url":null,"abstract":"It is well-known that online services resort to various cookies to track users through users' online service identifiers (IDs) - in other words, when users access online services, various \"fingerprints\" are left behind in the cyberspace. As they roam around in the physical world while accessing online services via mobile devices, users also leave a series of \"footprints\" -- i.e., hints about their physical locations - in the physical world. This poses a potent new threat to user privacy: one can potentially correlate the \"fingerprints\" left by the users in the cyberspace with \"footprints\" left in the physical world to infer and reveal leakage of user physical world privacy, such as frequent user locations or mobility trajectories in the physical world - we refer to this problem as user physical world privacy leakage via user cyberspace privacy leakage. In this paper we address the following fundamental question: what kind - and how much - of user physical world privacy might be leaked if we could get hold of such diverse network datasets even without any physical location information. In order to conduct an in-depth investigation of these questions, we utilize the network data collected via a DPI system at the routers within one of the largest Internet operator in Shanghai, China over a duration of one month. We decompose the fundamental question into the three problems: i) linkage of various online user IDs belonging to the same person via mobility pattern mining; ii) physical location classification via aggregate user mobility patterns over time; and iii) tracking user physical mobility. By developing novel and effective methods for solving each of these problems, we demonstrate that the question of user physical world privacy leakage via user cyberspace privacy leakage is not hypothetical, but indeed poses a real potent threat to user privacy.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"137 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76772906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Siyuan Wu, Leong Hou U, S. Bhowmick, Wolfgang Gatterbauer
{"title":"Conflict of Interest Declaration and Detection System in Heterogeneous Networks","authors":"Siyuan Wu, Leong Hou U, S. Bhowmick, Wolfgang Gatterbauer","doi":"10.1145/3132847.3133134","DOIUrl":"https://doi.org/10.1145/3132847.3133134","url":null,"abstract":"Peer review is the most critical process in evaluating an article to be accepted for publication in an academic venue. When assigning a reviewer to evaluate an article, the assignment should be aware of conflicts of interest (COIs) such that the reviews are fair to everyone. However, existing conference management systems simply ask reviewers and authors to declare their explicit COIs through a plain search user interface guided by some simple conflict rules. We argue that such declaration system is not enough to discover all latent COI cases. In this work, we study a graphical declaration system that visualizes the relationships of authors and reviewers based on a heterogeneous co-authorship network. With the help of the declarations, we attempt to detect the latent COIs automatically based on the meta-paths of a heterogeneous network.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"301 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79725385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Module Classification in Mixed-initiative Conversational Agent System","authors":"Sia Xin Yun Suzanna, A. Li","doi":"10.1145/3132847.3133185","DOIUrl":"https://doi.org/10.1145/3132847.3133185","url":null,"abstract":"Our operational context is a task-oriented dialog system where no single module satisfactorily addresses the range of conversational queries from humans. Such systems must be equipped with a range of technologies to address semantic, factual, task-oriented, open domain conversations using rule-based, semantic-web, traditional machine learning and deep learning. This raises two key challenges. First, the modules need to be managed and selected appropriately. Second, the complexity of troubleshooting on such systems is high. We address these challenges with a mixed-initiative model that controls conversational logic through hierarchical classification. We also developed an interface to increase interpretability for operators and to aggregate module performance.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79934227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Algorithms for Pareto Optimal Group-based Skyline","authors":"Wenhui Yu, Zheng Qin, Jinfei Liu, Li Xiong, Xu Chen, Huidi Zhang","doi":"10.1145/3132847.3132950","DOIUrl":"https://doi.org/10.1145/3132847.3132950","url":null,"abstract":"Skyline, aiming at finding a Pareto optimal subset of points in a multi-dimensional dataset, has gained great interest due to its extensive use for multi-criteria analysis and decision making. Skyline consists of all points that are not dominated by, or not worse than other points. It is a candidate set of optimal solution, which depends on a specific evaluation criterion for optimum. However, conventional skyline queries, which return individual points, are inadequate in group querying case since optimal combinations are required. To address this gap, we study the skyline computation in group case and propose fast methods to find the group-based skyline (G-skyline), which contains Pareto optimal groups. For computing the front k skyline layers, we lay out an efficient approach that does the search concurrently on each dimension and investigates each point in subspace. After that, we present a novel structure to construct the G-skyline with a queue of combinations of the first-layer points. Experimental results show that our algorithms are several orders of magnitude faster than the previous work.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86917147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linking News across Multiple Streams for Timeliness Analysis","authors":"I. Mele, Seyed Ali Bahrainian, F. Crestani","doi":"10.1145/3132847.3132988","DOIUrl":"https://doi.org/10.1145/3132847.3132988","url":null,"abstract":"Linking multiple news streams based on the reported events and analyzing the streams' temporal publishing patterns are two very important tasks for information analysis, discovering newsworthy stories, studying the event evolution, and detecting untrustworthy sources of information. In this paper, we propose techniques for cross-linking news streams based on the reported events with the purpose of analyzing the temporal dependencies among streams. Our research tackles two main issues: (1) how news streams are connected as reporting an event or the evolution of the same event and (2) how timely the newswires report related events using different publishing platforms. Our approach is based on dynamic topic modeling for detecting and tracking events over the timeline and on clustering news according to the events. We leverage the event-based clustering to link news across different streams and present two scoring functions for ranking the streams based on their timeliness in publishing news about a specific event.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83671755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sebastian Kruse, David Hahn, Marius Walter, Felix Naumann
{"title":"Metacrate: Organize and Analyze Millions of Data Profiles","authors":"Sebastian Kruse, David Hahn, Marius Walter, Felix Naumann","doi":"10.1145/3132847.3133180","DOIUrl":"https://doi.org/10.1145/3132847.3133180","url":null,"abstract":"Databases are one of the great success stories in IT. However, they have been continuously increasing in complexity, hampering operation, maintenance, and upgrades. To face this complexity, sophisticated methods for schema summarization, data cleaning, information integration, and many more have been devised that usually rely on data profiles, such as data statistics, signatures, and integrity constraints. Such data profiles are often extracted by automatic algorithms, which entails various problems: The profiles can be unfiltered and huge in volume; different profile types require different complex data structures; and the various profile types are not integrated with each other. We introduce Metacrate, a system to store, organize, and analyze data profiles of relational databases, thereby following the proven design of databases. In particular, we (i) propose a logical and a physical data model to store all kinds of data profiles in a scalable fashion; (ii) describe an analytics layer to query, integrate, and analyze the profiles efficiently; and (iii) implement on top a library of established algorithms to serve use cases, such as schema discovery, database refactoring, and data cleaning.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87698284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing Betweenness Centrality in B-hypergraphs","authors":"Kwang Hee Lee, Myoung-Ho Kim","doi":"10.1145/3132847.3133093","DOIUrl":"https://doi.org/10.1145/3132847.3133093","url":null,"abstract":"The directed hypergraph (especially B-hypergraph) has hyperedges that represent relations of a set of source nodes to a single target node. Author-cited networks and cellular signaling pathways can be modeled as a B-hypergraph. In this paper every source node of a hyperedge in the shortest path p in a B-hypergraph is considered a participant of p. We propose a betweenness centrality in the B-hypergraph that measures the number of shortest paths in which a node participates. The algorithm for computing the approximated betweenness centrality scores is also proposed. Through various performance experiments such as attack robustness and reachability tests, we show that our proposed betweenness centrality is a more appropriate measure in real-world B-hypergraph applications than ordinary betweenness centrality.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90637452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AliMe Assist: An Intelligent Assistant for Creating an Innovative E-commerce Experience","authors":"Feng-Lin Li, Minghui Qiu, Haiqing Chen, Xiongwei Wang, Xing Gao, Jun Huang, Juwei Ren, Zhongzhou Zhao, Weipeng Zhao, Lei Wang, Guwei Jin, Wei Chu","doi":"10.1145/3132847.3133169","DOIUrl":"https://doi.org/10.1145/3132847.3133169","url":null,"abstract":"We present AliMe Assist, an intelligent assistant designed for creating an innovative online shopping experience in E-commerce. Based on question answering (QA), AliMe Assist offers assistance service, customer service, and chatting service. It is able to take voice and text input, incorporate context to QA, and support multi-round interaction. Currently, it serves millions of customer questions per day and is able to address 85% of them. In this paper, we demonstrate the system, present the underlying techniques, and share our experience in dealing with real-world QA in the E-commerce field.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86025610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"pm-SCAN: an I/O Efficient Structural Clustering Algorithm for Large-scale Graphs","authors":"J. Seo, Myoung-Ho Kim","doi":"10.1145/3132847.3133121","DOIUrl":"https://doi.org/10.1145/3132847.3133121","url":null,"abstract":"Most existing algorithms for graph clustering, including SCAN, are not designed to cope with large volumes of data that cannot fit in main memory. When there is not enough memory, those algorithms will incur thrashing, i.e. result in huge I/O costs. We propose an I/O-efficient algorithm for structural clustering, pm-SCAN. The main idea of our scheme is to partition a large graph into several subgraphs that can fit into main memory. We first find clusters in each subgraph, and then merge them to produce final clustering of the input graph. Experimental results show that while other existing algorithms are not scalable to the graph size, our proposed method produces scalable performance for limited memory space.","PeriodicalId":20449,"journal":{"name":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2017-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88641395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}