Yusuke Kosaka, Shu Murakami, Thomas Laurent, Kento Goto, Motomichi Toyama
RTA: A Framework for the Integration of Local and Relational Open Data
Proceedings of the 21st International Database Engineering & Applications Symposium, published 2017-07-12
DOI: 10.1145/3105831.3105852 (https://doi.org/10.1145/3105831.3105852)
Abstract
There are currently massive amounts of public data, also referred to as open data, such as stock price data or weather data. However, such data is distributed in a variety of ways, such as downloadable CSV or XML files, or through API calls to web services. Each data source thus requires a specific workflow, making it a burden for users to process and use this data. This barrier to use diminishes the openness of the data. We thus propose the Remote Table Access (RTA) system, a simple and safe architecture for publishing relational data, i.e. giving open read-only access to it, and for easily integrating it with the user's local data. RTA enables users to query relational open data and their own local data seamlessly through a single SQL query. To allow this, we designed a three-party architecture featuring a client-side application, an optional server-side module, and a "Public Table Library" (PTL). The client-side application processes the RTA query and fetches the necessary data; the server-side system acts as an agent between the remote database and the client, offering added security as well as scalability in terms of connections; and the PTL lists all the published data and stores its access information. We implemented an early prototype of this architecture as a proof of concept. We validated it against two datasets, including data from the TPC-C benchmark, and make it available. Our results show the feasibility of RTA and a possibly significant reduction in query processing time, mainly because condition pushing and semijoin reduce the transmission volume.
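The abstract attributes the speed-up to condition pushing and semijoin, which both shrink the data sent from the remote side. The sketch below illustrates that general idea only; it is not the RTA prototype. Two in-memory SQLite databases stand in for the remote published table and the user's local data, and the table names (`stock_price`, `portfolio`), the sample rows, and all function names are hypothetical.

```python
import sqlite3


def make_remote():
    """Stand-in for a published remote table (hypothetical schema and data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE stock_price (ticker TEXT, day TEXT, price REAL)")
    rows = [
        ("AAA", "2017-07-10", 10.0), ("AAA", "2017-07-11", 11.0),
        ("BBB", "2017-07-10", 20.0), ("BBB", "2017-07-11", 21.0),
        ("CCC", "2017-07-10", 30.0), ("CCC", "2017-07-11", 31.0),
    ]
    db.executemany("INSERT INTO stock_price VALUES (?, ?, ?)", rows)
    return db


def make_local():
    """Stand-in for the user's local database (hypothetical schema and data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE portfolio (ticker TEXT, shares INTEGER)")
    db.executemany("INSERT INTO portfolio VALUES (?, ?)", [("AAA", 5), ("CCC", 2)])
    return db


def fetch_naive(remote):
    """Baseline: transfer the whole remote table and filter it locally."""
    return remote.execute("SELECT ticker, day, price FROM stock_price").fetchall()


def fetch_reduced(remote, local, day):
    """Transfer only rows that can contribute to the final join result."""
    # Semijoin: send the distinct local join keys to the remote side.
    keys = [r[0] for r in local.execute("SELECT DISTINCT ticker FROM portfolio")]
    if not keys:
        return []
    placeholders = ",".join("?" * len(keys))
    # Condition pushing: the selection on `day` is evaluated remotely.
    sql = (
        "SELECT ticker, day, price FROM stock_price "
        f"WHERE day = ? AND ticker IN ({placeholders})"
    )
    return remote.execute(sql, [day, *keys]).fetchall()


def join_locally(local, fetched, day):
    """Load the fetched rows into a temp table and join with local data."""
    local.execute("DROP TABLE IF EXISTS temp.fetched")
    local.execute("CREATE TEMP TABLE fetched (ticker TEXT, day TEXT, price REAL)")
    local.executemany("INSERT INTO fetched VALUES (?, ?, ?)", fetched)
    return local.execute(
        "SELECT p.ticker, p.shares * f.price FROM portfolio p "
        "JOIN fetched f ON p.ticker = f.ticker "
        "WHERE f.day = ? ORDER BY p.ticker",
        (day,),
    ).fetchall()


if __name__ == "__main__":
    remote, local = make_remote(), make_local()
    naive = fetch_naive(remote)
    reduced = fetch_reduced(remote, local, "2017-07-11")
    print("rows transferred:", len(naive), "vs", len(reduced))
    print(join_locally(local, reduced, "2017-07-11"))
```

Both fetch strategies produce the same join result; the reduced one simply moves fewer rows, which is the kind of saving in transmission volume the abstract refers to. A real deployment would also have to weigh the cost of shipping the key list when the local table is large.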