{"title":"Defining Similarity Spaces for Large-Scale Image Retrieval Through Scientific Workflows","authors":"Luis Fernando Milano Oliveira, D. S. Kaster","doi":"10.1145/3105831.3105863","DOIUrl":"https://doi.org/10.1145/3105831.3105863","url":null,"abstract":"Content-Based Image Retrieval (CBIR) employs visual features from images to search for and retrieve data. Systems based on this concept depend on a similarity space instance definition, but achieving an ideal instance is a very complex process that depends on domain knowledge. At the same time, domain experts are often unable to interact fully with such systems because of technical barriers. In this paper, we propose an architecture, based on scientific workflows, which allows users with no prior programming experience to build processes on images, creating Similarity Spaces and evaluating them when running similarity queries. Through this architecture, they can use domain expertise to improve image retrieval in a coordinated, auditable and reproducible manner, while being able to process very large image collections. We describe a prototype system and carry out experiments evaluating its performance in various scenarios. 
The current implementation supports both similarity space definition and querying workflows, achieving suitable speedups as the number of machines increases.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123118264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IoT: Imminent ownership Threat","authors":"B. Desai","doi":"10.1145/3105831.3105843","DOIUrl":"https://doi.org/10.1145/3105831.3105843","url":null,"abstract":"The Internet of Things (IoT) is the current trend of connecting all types of devices to the Internet so that these devices can be controlled remotely from anywhere. This allows for convenience, efficiency and the benefit of collecting data from these devices. However, as has been pointed out, there is an imminent threat to privacy, security and personal control, including the threat to real ownership. Concerns ought to be raised, not only with respect to matters of privacy and security of personal information, but also in light of the trend whereby devices and appliances, including software, are not owned but rented, with the real owner making changes at their convenience and the rental being a constant cost.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128645251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Temporal Abstraction for Health Diagnosis Prediction using Deep Recurrent Networks","authors":"Alireza Manashty, Janet V. Light-Thompson","doi":"10.1145/3105831.3105858","DOIUrl":"https://doi.org/10.1145/3105831.3105858","url":null,"abstract":"Temporal health data, whether from electronic health records or from nursing home care units, usually consist of multivariate sparse temporal data that differ from a regular time series. Conventional neural network models cannot be applied to such data; recurrent neural networks (RNNs), such as those with long short-term memory (LSTM) cells, are used to model time series. However, long-term, variable-length sparse temporal data are not suitable for efficient learning with RNN models. This research presents a novel pattern extraction technique for diagnosis prediction using deep recurrent neural networks. To predict diagnoses from such data, a window-based data abstraction technique called intensity temporal sequence (ITS) is proposed and tested. ITS represents long-term sparse temporal data as a fixed-length sequence suitable for training deep recurrent networks. To evaluate the method against other techniques, such as recent temporal patterns (RTP), a pattern simulator and an anomaly injection method are developed to generate 100,000 patient records with 10 possible diseases over 10,000 units of time. The results indicate that ITS performs slightly better than RTP in terms of accuracy when using techniques other than LSTM. 
However, only ITS is suitable for training LSTM models, which perform better in terms of accuracy.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121387255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Querying Efficiency of the PLWAH Bitmap Algorithm","authors":"Benjamin Taufen, Jason Sawin, David Chiu","doi":"10.1145/3105831.3105868","DOIUrl":"https://doi.org/10.1145/3105831.3105868","url":null,"abstract":"Bitmap indices are commonly used for accessing large, read-only data. A bitmap is a simplified model of the underlying data in secondary storage. Its coarse representation enables the use of fast CPU operations to answer common database queries. Additionally, bitmaps are very compressible. Several known compression algorithms allow the compressed form of a bitmap to be queried directly; one of these is Position List Word-Aligned Hybrid (PLWAH). PLWAH is a modified hybrid run-length encoding scheme that can achieve better compression than traditional schemes such as Word-Aligned Hybrid (WAH). This improved compression introduces an increased query processing cost, which we address in this paper. We present a technique that uses metadata to allow PLWAH's query algorithm to exploit logical short-circuiting opportunities, reducing the cost of certain queries. In our empirical study, we found that our approach achieved an average speedup of 1.41x over PLWAH for real scientific data sets. For specific queries, our approach realized speedups as high as 8000x.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130888989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WFinger: a joint-decoder for very short Tardos fingerprinting codes","authors":"Bettina Fazzinga, S. Flesca, F. Furfaro, E. Masciari","doi":"10.1145/3105831.3105860","DOIUrl":"https://doi.org/10.1145/3105831.3105860","url":null,"abstract":"","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132685365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extraction of Frequent Patterns Based on Users' Interests from Semantic Trajectories with Photographs","authors":"Yoshiaki Takimoto, Kento Sugiura, Y. Ishikawa","doi":"10.1145/3105831.3105870","DOIUrl":"https://doi.org/10.1145/3105831.3105870","url":null,"abstract":"With the popularization of location-based social networking (LBSN), semantic trajectories, i.e., trajectories enriched with additional information such as photographs and text, are becoming increasingly common, and there is a growing need to utilize them. We consider frequent pattern extraction applicable to the analysis of semantic trajectories and the extraction of regions of interest (ROIs). In this research, we propose SimDBSCAN, an extension of the density-based clustering algorithm DBSCAN that considers both the spatial density and the similarity of points in order to capture users' interests. Since SimDBSCAN identifies as ROIs neighboring points that reflect interest in the same object, it can detect not only known ROIs such as tourist sites but also unknown ones. In this paper, we describe the SimDBSCAN algorithm and present experimental results using photographs collected from Flickr. The experiments show that useful ROIs and patterns can be extracted by the proposed method.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127410583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Differentially Private Approach for Querying RDF Data of Social Networks","authors":"Roney Reis de C. e Silva, Bruno de C. Leal, Felipe T. Brito, V. Vidal, Javam C. Machado","doi":"10.1145/3105831.3105838","DOIUrl":"https://doi.org/10.1145/3105831.3105838","url":null,"abstract":"As the amount of collected social network information in RDF format grows, the development of solutions for the privacy of individuals, their attributes and their relationships with others becomes an important subject of study. However, data privacy solutions are not well suited to this specific type of data, mainly because they usually do not consider relationships between individuals, which are crucial to semantic data and social networks. Differential privacy is one of the most suitable techniques for statistical queries and, although it has been extensively studied, there is still much research to be done in this context. This paper presents two main contributions for privacy-preserving statistical queries containing sensitive information about relationships between individuals. The first is a complete approach to applying ϵ-differential privacy to RDF data, and the second is an index-like data structure that efficiently computes the parameters for the differential privacy mechanism: the query's actual value and the data sensitivity for the given query. We conclude by evaluating our contributions on three real social network datasets, presenting a utility analysis for different values of ϵ. 
We also show the performance benefit of our index-like data structure for sensitivity calculation.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117226655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid automatic vehicle manufacturer recognition using Random forest","authors":"J. Sedlák, L. Popelínský","doi":"10.1145/3105831.3105869","DOIUrl":"https://doi.org/10.1145/3105831.3105869","url":null,"abstract":"This paper studies the applicability of machine learning methods to identifying individual vehicle attributes from camera images captured in real environments. We focus on vehicle manufacturer recognition. Classification based on the vehicle's front mask makes it possible to identify even vehicles without a manufacturer's logo. The algorithm has been evaluated on 2988 samples collected directly from cameras in a real environment. The random forest algorithm achieved the best classification results. The accuracy for the two most frequent manufacturers, Skoda and Volkswagen, was 97.21% and 98.10%, respectively. The method is also fast enough for real-time use, even on low-cost devices such as mobile phones or single-board computers like the Raspberry Pi. A functional implementation of this method has been successfully deployed in a real-world environment.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124047345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RTA: A Framework for the Integration of Local and Relational Open Data","authors":"Yusuke Kosaka, Shu Murakami, Thomas Laurent, Kento Goto, Motomichi Toyama","doi":"10.1145/3105831.3105852","DOIUrl":"https://doi.org/10.1145/3105831.3105852","url":null,"abstract":"There are currently massive amounts of public data, also referred to as open data, such as stock price or weather data. However, such data is distributed in a variety of ways, such as downloadable files like CSV or XML files, or through API calls to web services. Each data source thus requires a specific workflow, making it a burden for users to process and use this data. This barrier to use diminishes the openness of this data. We thus propose the Remote Table Access (RTA) system, a simple and safe architecture for publishing relational data (i.e., giving open read-only access to it) and easily integrating it with the user's local data. RTA enables the user to query relational open data and their own local data seamlessly through a single SQL query. To allow this, we designed a three-party architecture featuring a client-side application, an optional server-side module and a \"Public Table Library\" (PTL). The client-side application processes the RTA query and fetches the necessary data; the server-side module acts as an agent between the remote database and the client, offering added security as well as scalability in terms of connections; and the PTL lists all the published data and stores its access information. We implemented an early prototype of this architecture as a proof of concept. We validated it against two datasets, including data from the TPC-C benchmark, and made it available. 
Our results show the feasibility of RTA and a potentially significant reduction in query processing time, mainly because of the reduced transmission volume achieved by condition pushing and semijoin.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129496256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis using spectral clustering to predict Internet gaming behaviours","authors":"M. Rupert, Nazir S. Hawi","doi":"10.1145/3105831.3105867","DOIUrl":"https://doi.org/10.1145/3105831.3105867","url":null,"abstract":"As computers become more powerful and sophisticated data visualization software becomes available, the capability to extract knowledge from data for the benefit of individuals and society is imperative. Contemporary research has barely started to harness the power of data science techniques to process and analyze data and to communicate results. Indeed, research studies from various disciplines have relied heavily on traditional statistical methods such as correlations and regressions. In this study, we propose a novel approach using clustering analysis to tackle the problem of the impact of Internet gaming. Interest in Internet gaming is intensifying as it spreads widely among students and is believed to have detrimental effects on their academic performance and sleep habits. We aim to identify patterns related to Internet gamers by using spectral clustering to determine the structure among Internet gaming, academic achievement, and sleep habits. 
The results show three distinct clusters and strong associations between Internet gaming disorder, on the one hand, and declines in both sleep hours and academic performance, on the other.","PeriodicalId":319729,"journal":{"name":"Proceedings of the 21st International Database Engineering & Applications Symposium","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127264290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}