{"title":"Revealing reliable information from taxi traces: from raw data to information discovery","authors":"A. Keskinarkaus, Ekaterina Gilman, Lauri Lovén, S. Tamminen, M. Hippi, G. Xiong, F. Zhu, T. Seppanen, J. Riekki, S. Pirttikangas","doi":"10.1109/icdew55742.2022.00011","DOIUrl":"https://doi.org/10.1109/icdew55742.2022.00011","url":null,"abstract":"In this paper we present procedures for processing raw data collected with moving vehicles and for fusing this data with digital map data. The goal is to gain a better understanding of city traffic via quantitative research on collected taxi data in relation to digital map properties. Map attributes are provided by Digiroad, a database of the Finnish road and street network. We define methods to clean up data that has been collected with taxis equipped with on-board vehicle tracking devices in real customer service situations. Consequently, the driving behavior may be inconsistent, and sensor data can be limited and contain errors. We explain the procedures for preparing the data: filtering the most obvious errors from the data set, map-matching moving object data, and fetching map attributes along the routes of the moving vehicles. The fetched properties, as well as other measurement data, are used for deriving statistics and illustrations to study driving behavior in downtown Oulu, Finland.","PeriodicalId":429378,"journal":{"name":"2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129721257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FoodPrint: Computing Carbon Footprint of Recipes","authors":"Pulkit Piplani, Pritish Gulati, Shayna Malik, Shashwat Goyal, M. Gurbaxani, Ganesh Bagler","doi":"10.1109/icdew55742.2022.00020","DOIUrl":"https://doi.org/10.1109/icdew55742.2022.00020","url":null,"abstract":"Carbon footprint (CF) is an estimate of the total amount of greenhouse gases emitted by a product, thus assessing the contribution to climate change from the product or service provided [1]. Carbon footprint has acquired center stage in environmental debate as there has been increasing pressure on manufacturers to declare their product's impact on the environment. Calculating the CF of a recipe is a complex task because natural processes are inherently variable. In this study, we calculate the CF of more than 67,000 recipes from RecipeDB [2]. This study aims to facilitate the design of an effective consumer communication strategy through software developed for computing CF for a diverse set of recipes. It also handles ingredients in a recipe that are not available in the dataset, up to a certain threshold, by assigning them an average CF value.","PeriodicalId":429378,"journal":{"name":"2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117305921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ratatouille: A tool for Novel Recipe Generation","authors":"Mansi Goel, Pallab Chakraborty, Vijay Ponnaganti, Minnet Khan, Sritanaya Tatipamala, Aakanksha Saini, Ganesh Bagler","doi":"10.48550/arXiv.2206.08267","DOIUrl":"https://doi.org/10.48550/arXiv.2206.08267","url":null,"abstract":"Due to the availability of a large number of cooking recipes online, there is growing interest in using them as data to create novel recipes. Novel Recipe Generation is a problem in the field of Natural Language Processing in which the main interest is to generate realistic, novel cooking recipes. To come up with such novel recipes, we trained various Deep Learning models such as LSTMs and GPT-2 on a large amount of recipe data. We present Ratatouille (https://cosylab.iiitd.edu.in/ratatouille2/), a web-based application to generate novel recipes.","PeriodicalId":429378,"journal":{"name":"2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129908194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smarter Warehouse","authors":"N. Laptev, Wenbo Tao, C. Komurlu, Jason Xu, Deke Sun, T. Lux, Luo Mi","doi":"10.1109/icdew55742.2022.00005","DOIUrl":"https://doi.org/10.1109/icdew55742.2022.00005","url":null,"abstract":"Warehouse users often have to make too many decisions about their queries, pipelines, workflows and data to optimize the resources they use as well as the quality and the availability of their data. For example, whether to use Spark or Presto, how to best partition their data or what hyper-parameters to tune to resolve various query or pipeline problems. Furthermore, warehouse users are often unaware of big performance opportunities around data skew, multi-query optimization, query materialization and more. In this paper we describe the Smarter Warehouse initiative that aims to automate or simplify many of these optimization decisions. Our long term vision is for a large portion of the Smarter Warehouse optimizations to be seamlessly incorporated into the compute and I/O layers of the stack, leading to a simpler warehouse user experience and large amounts of resource savings.","PeriodicalId":429378,"journal":{"name":"2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)","volume":"227 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123031984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Sensory Textures with Rheological Characteristics from Recipe Sharing Sites","authors":"Hiroshi Uehara, D. Mochihashi","doi":"10.1109/icdew55742.2022.00019","DOIUrl":"https://doi.org/10.1109/icdew55742.2022.00019","url":null,"abstract":"This study estimates the texture of gel-related dishes based on both data from recipe sharing sites and research results from food science. Since most recipes are not accompanied by sufficient information about the textures they produce, we propose a method to estimate texture characteristics for each recipe by applying a joint topic model that bridges sensory texture terms in a recipe sharing site with the corresponding quantitative textures from food science research. The results show that the estimated texture terms for dishes are consistent with rheology, the quantitative textures provided by related food science research.","PeriodicalId":429378,"journal":{"name":"2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115456780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trustworthy Distributed Intelligence for Smart Cities","authors":"Xiaoli Liu, S. Tamminen, S. Tarkoma, Xiang Su","doi":"10.1109/icdew55742.2022.00013","DOIUrl":"https://doi.org/10.1109/icdew55742.2022.00013","url":null,"abstract":"The future of smart cities has been significantly impacted by the Internet of Things (IoT) and distributed intelligence: large volumes of data are collected from massive numbers of heterogeneous devices, and distributed intelligence brings storage, computing, and Artificial Intelligence (AI) functionality close to the end devices where data are generated, enabling novel services and applications. However, AI-empowered systems face many challenges due to the inscrutability of complex AI models, which weakens the trust of users. This paper provides a general understanding of the underlying concepts and challenges in trustworthy distributed intelligence. A district heating network use case is presented to explore the proposed concepts, technologies, and challenges for enabling trustworthy distributed intelligence for smart cities.","PeriodicalId":429378,"journal":{"name":"2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125644437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object Detection in Indian Food Platters using Transfer Learning with YOLOv4","authors":"Deepanshu Pandey, Purva Parmar, Gauri Toshniwal, Mansi Goel, Vishesh Agrawal, Shivangi Dhiman, Lavanya Gupta, Ganesh Bagler","doi":"10.48550/arXiv.2205.04841","DOIUrl":"https://doi.org/10.48550/arXiv.2205.04841","url":null,"abstract":"Object detection is a well-known problem in computer vision. Despite this, its use for traditional Indian food dishes has been limited. In particular, recognizing Indian food dishes present in a single photo is challenging for three reasons: (1) lack of annotated Indian food datasets, (2) non-distinct boundaries between the dishes, and (3) high intra-class variation. We address these issues by providing a comprehensively labelled Indian food dataset, IndianFood10, which contains 10 food classes that appear frequently in a staple Indian meal, and by using transfer learning with the YOLOv4 object detector model. Our model achieves an overall mAP score of 91.8% and an F1-score of 0.90 on our 10-class dataset. We also provide an extension of our 10-class dataset, IndianFood20, which contains 10 more traditional Indian food classes.","PeriodicalId":429378,"journal":{"name":"2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115630259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ingredient-based Forecast of Sold Dish Portions in Campus Canteen Kitchens","authors":"Lucas Woltmann, Jonathan Drechsel, Claudio Hartmann, Wolfgang Lehner","doi":"10.1109/icdew55742.2022.00023","DOIUrl":"https://doi.org/10.1109/icdew55742.2022.00023","url":null,"abstract":"In the catering industry, one major challenge is the unknown short-term demand for dish portions. Customers want to avoid queuing and desire their favorite dish according to their preferences. Meeting these demands is important for the industry but predicting future sales is a challenging task. Often, the predictions are derived manually and automated approaches are rarely applied in practice. This paper presents an ML-based forecast model using a set of derived features to predict shares and absolute numbers of dish portions per day. In particular, these features include text-based extractions of ingredients, calendar effects to model time dependencies, and favorite features to model customers' preferences. As the detailed real world evaluation shows, our approach achieves a relative model error of 15% for the prediction of dishes. Furthermore, we discuss the influence of beneficial features and assess their influence on the overall prediction quality.","PeriodicalId":429378,"journal":{"name":"2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114679083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secured Data Management and Infrastructures in Smart Aquaculture","authors":"Dylan Koh, Gui-Shan Tan, Eldrick Xie, Linshan Tiong, M. Khoo, C. Yap","doi":"10.1109/icdew55742.2022.00012","DOIUrl":"https://doi.org/10.1109/icdew55742.2022.00012","url":null,"abstract":"To support the UN Sustainable Development Goals (UNSDGs), as well as Singapore's “30 by 30” vision (to meet 30% of food supply nutrition by 2030 through local produce), this project implemented a Secured Smart Fish Feeder system through data acquisition, data pre-processing, and IoT wireless technology (LoRa and MQTT) together with cellular solutions. To address power management, distance and coverage, and data size issues, a new data application frame and transmission method were created. At the same time, an appropriate cipher (AES/ASCON) was used to secure the connections. Finally, the system is designed with a data management frame and dashboarding to support the overall command and control of the system.","PeriodicalId":429378,"journal":{"name":"2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128661306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GitSchemas: A Dataset for Automating Relational Data Preparation Tasks","authors":"Till Döhmen, Madelon Hulsebos, C. Beecks, Sebastian Schelter","doi":"10.1109/icdew55742.2022.00016","DOIUrl":"https://doi.org/10.1109/icdew55742.2022.00016","url":null,"abstract":"The preparation of relational data for machine learning (ML) has largely remained a manual, labor-intensive process, while automated machine learning has made great strides in recent years. Long-standing challenges, such as reliable foreign key detection, still pose a major hurdle towards more automation of data integration and preparation tasks. We created a new dataset aimed at increasing the level of automation of data preparation tasks for relational data. The dataset, called GITSCHEMAS, consists of schema metadata for almost 50k real-world databases, collected from public GitHub repositories. To our knowledge, this is the largest dataset of its kind, containing approximately 300k table names, 2M column names including data types, and 100k real (not semantically inferred) foreign key relationships. In this paper, we describe how GITSCHEMAS was created, and provide key insights into the dataset. Furthermore, we show how GITSCHEMAS can be used to find relevant tables for data augmentation in an AutoML setting.","PeriodicalId":429378,"journal":{"name":"2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130987461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}