Collaborative Similarity Search Across Multi-party Repositories

Proceedings of the 19th International Conference on Distributed Computing and Networking. Pub Date: 2018-01-04. DOI: 10.1145/3154273.3154352

Malek Athamnah, Anis Alazzawe, K. Kant


Citations: 2


Collaborative Similarity Search Across Multi-party Repositories

The expanding role of online data collection and analytics from a variety of sources in the operation of emerging cyber and cyber-physical systems raises two crucial issues: (a) collaboration across multiple parties that generate and own parts of the data and grant others only limited access rights, and (b) the need to efficiently identify suitable patterns in the data in order to drive decision making. In this paper, we examine such scenarios under the assumption that all collected data is organized in the form of a database and that the relevant patterns are those that concern similarities across the entities represented by the data. An entity of interest is either a physical or logical item with multiple attributes (e.g., a shipped product with price and size as attributes, or traffic sensors measuring the volume of traffic and weather conditions at intersections). We assume that all data regarding the entities is maintained in a standard relational form so that queries on it can be described precisely. Similarities are then considered in terms of attribute values. In some cases, the attributes of the entities may themselves be partitioned across parties and thus stored on different nodes. We consider queries in this environment that must comply with access rules across parties and that seek entities similar to a given entity in terms of their attributes. We propose efficient methods for finding similar entities across multiple attributes when the similarity threshold may vary across searches. Through extensive experimentation, we show that our mechanism is significantly more efficient than a direct search through the entire dataset.
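To make the problem setting concrete, the following is a minimal sketch of the direct-search baseline that the abstract compares against, not the authors' proposed mechanism. It assumes numeric attributes, an absolute-difference similarity test with a per-attribute threshold, and a simple per-attribute read permission; the names `Party`, `readable_by`, and `similar_entities` are hypothetical and do not come from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Party:
    """One data owner holding a vertical slice of the entity attributes."""
    name: str
    columns: dict[str, dict[int, float]]  # attribute -> {entity_id: value}
    readable_by: dict[str, set[str]]      # attribute -> other parties allowed to read it

    def read(self, requester: str, attribute: str) -> dict[int, float]:
        # Enforce the (assumed) access rule before releasing a column.
        allowed = self.readable_by.get(attribute, set())
        if requester != self.name and requester not in allowed:
            raise PermissionError(
                f"{requester} may not read {attribute} from {self.name}"
            )
        return self.columns[attribute]


def similar_entities(requester, parties, target_id, thresholds):
    """Direct search: scan each requested column in full and keep the entities
    whose value lies within that attribute's threshold of the target's value."""
    result = None
    for attribute, eps in thresholds.items():
        owner = next(p for p in parties if attribute in p.columns)
        column = owner.read(requester, attribute)
        target_value = column[target_id]
        matches = {
            eid for eid, value in column.items()
            if eid != target_id and abs(value - target_value) <= eps
        }
        # An entity must be similar on every requested attribute.
        result = matches if result is None else result & matches
    return result if result is not None else set()


# Hypothetical data: price is held by one party, size by another.
shipper = Party("shipper", {"price": {1: 10.0, 2: 11.0, 3: 25.0}},
                {"price": {"warehouse"}})
warehouse = Party("warehouse", {"size": {1: 5.0, 2: 5.5, 3: 5.2}},
                  {"size": set()})

print(similar_entities("warehouse", [shipper, warehouse], 1,
                       {"price": 2.0, "size": 1.0}))  # → {2}
```

Because the thresholds are passed per query, the same data can be searched with a looser or tighter notion of similarity each time. The full scan of every column on every query is the cost that the abstract says the proposed methods avoid.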
