{"title":"Distribution-Driven, Embedded Synthetic Data Generation System and Tool for RDBMS","authors":"Joseph W. Hu, Ivan T. Bowman, A. Nica, Anil K. Goel","doi":"10.1109/ICDEW.2019.00-25","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-25","url":null,"abstract":"Many self-managing relational database management systems (RDBMS) need to programmatically generate synthetic data to train machine learning models. This paper proposes the concept of shadow database and a framework to derive shadow database from production database that matches distribution properties of source data. Moreover, we have designed and implemented an embedded synthetic data generation tool that takes data distribution profile as input and generates a shadow database according to histograms of source data. The distribution profile is passed into the tool either through an export-import mechanism or as a JSON string. The shadow database can scale to be larger or smaller than the original database and serve as a testbed to train learning models. Unlike most other data generation tools, our tool is implemented as SQL procedures that can be embedded in the underlying RDBMS.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127641287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blockchain Enabled Distributed Data Management - A Vision","authors":"Furqan Baig, Fusheng Wang","doi":"10.1109/ICDEW.2019.00-39","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-39","url":null,"abstract":"Blockchain has gained much attention in recent academic and research works not only in crypto-currency but also in many other fields such as supply chain, health, storage etc. The application of blockchain in data management domain, however, is mostly geared towards the aspect of security and immutability. In this paper we propose integrating blockchain with distributed data management and study some open challenges and assumptions in doing so. We claim that, from data management perspective, blockchain's ability to handle unequal participants is more important than security and immutability. Finally, we propose possible ideas to integrate blockchain into distributed data transaction and management workflows to design a globally consistent data store ensuring availability guarantees along with support for unifying heterogeneous data backends.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132862037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Online User Purchase Behavior Based on Browsing History","authors":"Yunghui Chu, Hui-Kuo Yang, Wen-Chih Peng","doi":"10.1109/ICDEW.2019.00-13","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-13","url":null,"abstract":"Recently, people tend to purchase through websites. This change allows e-commerce sites to collect user behavior data from web logs. E-commerce marketing teams usually use such data to design subsequent promotional campaigns that drive more traffic and convert visitors into paying customers. In this paper we consider a special kind of e-commerce company that sells products with similar properties, usually at a high price. For such companies, recommendation becomes less important than predicting which items (if any) will be bought. We want to discover potential buyers and deliver ads or even coupons to them, expecting them to become real buyers. In this paper, we model buying behavior from click records with patterns extracted using a feature engineering approach. Our solution models two kinds of browsing behavior, namely hesitant and impulsive. In the model, we define interaction features from click-streams that uncover users' purchase intention from their interaction with the product pages, such as how long the user stays on a page, and then build a model that predicts users' preference. Experimental results on a real dataset from an e-commerce company demonstrate the effectiveness of the proposed method. The approaches in our work can be used to model user purchasing intent and applied to e-commerce sites which sell high-end products.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121193056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AutoCache: Employing Machine Learning to Automate Caching in Distributed File Systems","authors":"H. Herodotou","doi":"10.1109/ICDEW.2019.00-21","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-21","url":null,"abstract":"The use of computational platforms such as Hadoop and Spark is growing rapidly as a successful paradigm for processing large-scale data residing in distributed file systems like HDFS. Increasing memory sizes have recently led to the introduction of caching and in-memory file systems. However, these systems lack any automated caching mechanisms for storing data in memory. This paper presents AutoCache, a caching framework that automates the decisions for when and which files to store in, or remove from, the cache for increasing system performance. The decisions are based on machine learning models that track and predict file access patterns from evolving data processing workloads. Our evaluation using real-world workload traces from a Facebook production cluster compares our approach with several other policies and showcases significant benefits in terms of both workload performance and cluster efficiency.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129318398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Auto-Scaling Existing Transactional Databases with Strong Consistency","authors":"M. Georgiou, Aristodemos Paphitis, Michael Sirivianos, H. Herodotou","doi":"10.1109/ICDEW.2019.00-26","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-26","url":null,"abstract":"Existing relational database systems often suffer from rapid increases or significant variability of transactional workloads but lack support for scalability or elasticity. Database replication has been employed to scale workload performance but past approaches make various performance versus consistency tradeoffs and typically lack the mechanisms and policies for dynamically adding and removing replicas. This paper presents Hihooi, a replication-based middleware system that is able to achieve scalability, strong consistency, and elasticity for existing transactional databases. These features are enabled by (i) a novel replication algorithm for propagating database modifications asynchronously and consistently to all replicas at high speeds, and (ii) a new routing algorithm for directing incoming transactions to consistent replicas. Our experimental evaluation validates the high scalability and elasticity benefits offered by Hihooi, which form the key ingredients towards a truly auto-scaling system.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125350496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Distributed Multi-model Learning on Apache Spark for Model-Based Recommender","authors":"Anas Alzogbi, Polina Koleva, G. Lausen","doi":"10.1109/ICDEW.2019.00-12","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-12","url":null,"abstract":"Model-based approaches for Content-based Filtering (CBF) recommendation have the potential of generating representative user models owing to their ability to learn from user actions. However, the need for training an individual model for each user leads to a scalability issue and brings a high computational cost that contributes to the limited adaptation of model-based approaches as efficient CBF recommenders. This is particularly relevant for production systems where the recommender is expected to serve a large number of users. In this work, we address the efficiency issue of model-based CBF recommender systems and present a new approach for distributed multi-model learning based on Apache Spark. We use Ranking SVM as the underlying recommendation algorithm and present a distributed implementation that allows efficient training of multiple models in parallel using a collection of machines. We demonstrate the efficiency of our approach on a real-world dataset from citeulike and show that our approach can reduce the cost of multi-model learning without affecting the prediction accuracy.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130217506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards self-managing cloud-scale computing platforms: Experiences and challenges","authors":"Jingren Zhou","doi":"10.1109/ICDEW.2019.00-24","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-24","url":null,"abstract":"Summary form only given, as follows. The complete presentation was not made available for publication as part of the conference proceedings. More and more companies heavily rely on massive data analysis of many kinds to understand data insights and drive business decisions. To support this ever-increasing need, big data computing platforms have grown to an unprecedented scale, way beyond human manageability. In this talk, I'll share our experiences at Alibaba to enable our big data platforms to configure, optimize, monitor, and protect themselves automatically, including automatic version testing and deployment control, system health monitoring and alert, automatic physical design/data placement/storage optimization, etc. I'll also outline some outstanding research and engineering challenges.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126654203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recommendation of Indian Cuisine Recipes Based on Ingredients","authors":"Nilesh Nilesh, Madhu Kumari, Pritom Hazarika, Vishal Raman","doi":"10.1109/ICDEW.2019.00-28","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-28","url":null,"abstract":"There are many varieties of Indian cuisine that use the same ingredients. In India, traditional cuisines come in wide varieties due to locally available spices, herbs, vegetables, and fruits. In this paper, we propose a method that recommends Indian cuisine recipes on the basis of available ingredients and liked cuisine. For this work, we used web scraping to build a collection of recipe varieties and then applied a content-based machine learning approach to recommend the recipes. This system recommends Indian cuisines based on ingredients.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129773303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Group Recommendation Approach Based on Neural Network Collaborative Filtering","authors":"J. Du, Lin Li, Peng Gu, Qing Xie","doi":"10.1109/ICDEW.2019.00-18","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-18","url":null,"abstract":"At present, the most popular recommendation algorithms belong to the class of latent factor models (LFM). Compared with traditional user-based and item-based collaborative filtering methods, the latent factor model effectively improves recommendation accuracy. In recent years, deep neural networks have succeeded in many research fields, such as computer vision, speech recognition, and natural language processing. However, there are few studies combining recommendation systems and deep neural networks, especially for group recommendation. Some academic studies have adopted deep learning methods, but they mainly use them to process auxiliary information, such as acoustic features of sounds and semantic analysis of texts, while the inner product is still used to deal with latent features of users and items. In this paper, we first obtain the nonlinear interaction of latent feature vectors between users and items through a multi-layer perceptron (MLP), and use the combination of LFM and MLP to achieve collaborative filtering recommendation between users and items. Secondly, based on the individual's recommendation score, a fusion strategy based on Nash equilibrium is designed to ensure the average satisfaction of the group users. Our experiments are conducted on the Track 1 of KDD CUP 2012 public data set, taking the root mean square error (RMSE) as the evaluation index. The experiment compares the traditional LFM optimization model, the MLP model and the LFM-MLP hybrid model in individual recommendation, and compares the strategy proposed in this paper with the three traditional single group strategies: most pleasure, average, and least misery. The experimental results show that the proposed method can effectively improve the accuracy of group recommendation.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133656192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Technical Mechanics of a Trans-Border Waste Flow Tracking Solution Based on Blockchain Technology","authors":"Dominik Schmelz, K. Pinter, S. Strobl, Lei Zhu, Phillip Niemeier, T. Grechenig","doi":"10.1109/ICDEW.2019.00-38","DOIUrl":"https://doi.org/10.1109/ICDEW.2019.00-38","url":null,"abstract":"When it comes to waste management and waste tracking, the main objectives are to prevent and reuse as much waste as possible. In the last decade, there has been a strong shift from the disposal of waste to recycling thereof. Within the EU, member states are required to establish waste prevention programs through the \"Waste Framework EU Directive\", which attempts to turn the EU into a recycling society. Similar endeavors also exist in other countries and regions. With globalization, the disposal of waste became a lucrative business case, often without transparency whether the waste has been disposed of properly and according to regulations or directives installed by each country. Waste flow management is the basis of sustainable waste prevention and recycling. Short-term solutions and financial profit accelerated malpractice in waste disposal and recycling. The currently used systems cannot track waste across borders in a transparent, but data-protected and tamper-proof way. Therefore, solutions to address these concerns have to be found and implemented. Blockchain enables new approaches, especially in ecosystems with distrust in every participating stakeholder, which is the case in waste flow tracking. We introduce and discuss a novel solution for waste tracking, on the basis of blockchain technology and smart contracts, that fulfills the aforementioned requirements.","PeriodicalId":186190,"journal":{"name":"2019 IEEE 35th International Conference on Data Engineering Workshops (ICDEW)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129352291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}