{"title":"The DevOps Paradigm with Cloud Data Analytics for Green Business Applications","authors":"Michael J. Pawlish, A. Varde","doi":"10.1145/3229329.3229334","DOIUrl":"https://doi.org/10.1145/3229329.3229334","url":null,"abstract":"This paper reviews the emergence of the DevOps (development and operations) paradigm in the industry and the influence it has along with cloud based data management and analytics in the greening of business applications. It considers the geoscience domain as an example discussing usefulness in a GIS (geographic information system). Similar claims can be applied to other domains. Investigating the emergence of DevOps technologies and examining the dramatic shift in IT towards cloud and hybrid models for data analytics, the paper paints a picture of systems that have the ability to green their impact on society. It also addresses concerns from a privacy and security perspective and concludes with open issues for further research.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"2 1","pages":"51-59"},"PeriodicalIF":0.0,"publicationDate":"2018-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88947066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution: A Position Paper.","authors":"Gang Luo","doi":"10.1145/3166054.3166057","DOIUrl":"https://doi.org/10.1145/3166054.3166057","url":null,"abstract":"For user-friendliness, many software systems offer progress indicators for long-duration tasks. A typical progress indicator continuously estimates the remaining task execution time as well as the portion of the task that has been finished. Building a machine learning model often takes a long time, but no existing machine learning software supplies a non-trivial progress indicator. Similarly, running a data mining algorithm often takes a long time, but no existing data mining software provides a nontrivial progress indicator. In this article, we consider the problem of offering progress indicators for machine learning model building and data mining algorithm execution. We discuss the goals and challenges intrinsic to this problem. Then we describe an initial framework for implementing such progress indicators and two advanced, potential uses of them, with the goal of inspiring future research on this topic.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"19 2","pages":"13-24"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3166054.3166057","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35586696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Measurement and Prediction of Web Content Utility: A Review","authors":"Yuan Yao, Hanghang Tong, Feng Xu, Jian Lu","doi":"10.1145/3166054.3166056","DOIUrl":"https://doi.org/10.1145/3166054.3166056","url":null,"abstract":"Nowadays, various types and large amounts of content are available on the Web. Characterizing the Web content and predicting its inherent usefulness become important problems that may benefit many applications such as information filtering and content recommendation. In this article, we present a brief review of the existing measurements and the corresponding prediction methods for Web content utility. Specifically, we focus on three closely related and widely studied tasks, i.e., content popularity prediction, content quality prediction, and scientific article impact prediction. While reviewing the existing work in each of the above three tasks, we mainly aim to answer the following two fundamental questions: how to measure the Web content utility, and how to make the predictions under the measurement. We find that while the three tasks are closely related, they bear subtle differences in terms of prediction urgency, feature extraction, and algorithm design. After that, we discuss some future directions in measuring and predicting Web content utility.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"28 1","pages":"1-12"},"PeriodicalIF":0.0,"publicationDate":"2017-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81337634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delve: A Dataset-Driven Scholarly Search and Analysis System","authors":"Uchenna Akujuobi, Xiangliang Zhang","doi":"10.1145/3166054.3166059","DOIUrl":"https://doi.org/10.1145/3166054.3166059","url":null,"abstract":"Research and experimentation in various scientific fields are based on the observation, analysis and benchmarking on datasets. The advancement of research and development has thus strengthened the importance of dataset access. However, without enough knowledge of relevant datasets, researchers usually have to go through a process which we term \"manual dataset retrieval\". With the accelerated rate of scholarly publications, manually finding the relevant dataset for a given research area based on its usage or popularity is becoming increasingly difficult and tedious. In this paper, we present Delve, a web-based dataset retrieval and document analysis system. Unlike traditional academic search engines and dataset repositories, Delve is dataset driven and provides a medium for dataset retrieval based on the suitability or usage in a given field. It also visualizes dataset and document citation relationships, and enables users to analyze a scientific document by uploading its full PDF. In this paper, we first discuss the reasons why the scientific community needs a system like Delve. We then proceed to introduce its internal design and explain how Delve works and how it is beneficial to researchers of all levels.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"31 1","pages":"36-46"},"PeriodicalIF":0.0,"publicationDate":"2017-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75291029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent Disaster Response via Social Media Analysis A Survey","authors":"Tahora H. Nazer, G. Xue, Yusheng Ji, Huan Liu","doi":"10.1145/3137597.3137602","DOIUrl":"https://doi.org/10.1145/3137597.3137602","url":null,"abstract":"The success of a disaster relief and response process is largely dependent on timely and accurate information regarding the status of the disaster, the surrounding environment, and the affected people. This information is primarily provided by first responders on-site and can be enhanced by the firsthand reports posted in real-time on social media. Many tools and methods have been developed to automate disaster relief by extracting, analyzing, and visualizing actionable information from social media. However, these methods are not well integrated in the relief and response processes and the relation between the two requires exposition for further advancement. In this survey, we review the new frontier of intelligent disaster relief and response using social media, show stages of disasters which are reflected on social media, establish a connection between proposed methods based on social media and relief efforts by first responders, and outline pressing challenges and future research directions.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"29 1","pages":"46-59"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88711113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaborative Filtering for Binary, Positive-only Data","authors":"Koen Verstrepen, Kanishka Bhaduri, B. Cule, Bart Goethals","doi":"10.1145/3137597.3137599","DOIUrl":"https://doi.org/10.1145/3137597.3137599","url":null,"abstract":"Traditional collaborative filtering assumes the availability of explicit ratings of users for items. However, in many cases these ratings are not available and only binary, positive-only data is available. Binary, positive-only data is typically associated with implicit feedback such as items bought, videos watched, ads clicked on, etc. However, it can also be the results of explicit feedback such as likes on social networking sites. Because binary, positive-only data contains no negative information, it needs to be treated differently than rating data. As a result of the growing relevance of this problem setting, the number of publications in this field increases rapidly. In this survey, we provide an overview of the existing work from an innovative perspective that allows us to emphasize surprising commonalities and key differences.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1 1","pages":"1-21"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88590796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Common Pitfalls in Training and Evaluating Recommender Systems","authors":"Hung-Hsuan Chen, Chu-An Chung, Hsin-Chien Huang, Wen Tsui","doi":"10.1145/3137597.3137601","DOIUrl":"https://doi.org/10.1145/3137597.3137601","url":null,"abstract":"This paper formally presents four common pitfalls in training and evaluating recommendation algorithms for information systems. Specifically, we show that it could be problematic to separate the server logs into training and test data for model generation and model evaluation if the training and the test data are selected improperly. In addition, we show that click through rate -- a common metric to measure and compare the performance of different recommendation algorithms -- may not be a good measurement of profitability, i.e., the income a recommendation module brings to a website. Moreover, we demonstrate that evaluating recommendation revenue may not be as straightforward a task as it first looks. Unfortunately, these pitfalls appeared in many previous studies on recommender systems and information systems. We explicitly explain these problems and propose methods to address them. We conducted experiments to support our claims. Finally, we review previous papers and competitions that may suffer from these problems.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"4 1","pages":"37-45"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82522365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised Learning Techniques in Mobile Device Apps for Androids","authors":"Priyanka Basavaraju, A. Varde","doi":"10.1145/3068777.3068782","DOIUrl":"https://doi.org/10.1145/3068777.3068782","url":null,"abstract":"Mobile devices have become an integral part of our daily lives. Most people carry smartphones today almost everywhere; and have other mobile devices such as tablets, often more convenient than full-fledged laptops for work transit, short trips etc. This has led to the development of apps for mobile devices, easy to download and access anywhere anytime. An important field improving human experiences on mobile devices is machine learning. This constitutes techniques involving acquisition of knowledge, skills and understanding by machines from examples, guidance, experience or reflection to learn analogous to humans. Among learning paradigms herein, supervised learning comprises situations where labeled training samples are provided to administer the process, making it more regulated, similar to human instructors providing such examples with notions of correctness to guide human learners. Supervised learning techniques are useful in designing mobile apps as they entail guided examples capturing specific human needs and their reasoning in activities, e.g., classification. This paper gives a comprehensive review of a few useful supervised learning approaches along with their implementation in mobile apps, focusing on Androids as they constitute over 50% of the global smartphone market. It includes description of the approaches and portrays interesting Android apps deploying them, addressing classification and regression problems. We discuss the contributions and critiques of the apps and also present open issues with the potential for further research in related areas. This paper is expected to be useful to students, researchers and developers in mobile computing, human computer interaction, data mining and machine learning.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"48 1","pages":"18-29"},"PeriodicalIF":0.0,"publicationDate":"2017-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79086647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Science for Food, Energy and Water: A Workshop Report","authors":"N. Abe, Yiqun Xie, S. Shekhar, C. Apté, Vipin Kumar, M. Tuinstra, Ranga Raju Vatsavai","doi":"10.1145/3068777.3068779","DOIUrl":"https://doi.org/10.1145/3068777.3068779","url":null,"abstract":"At the 22nd ACM SIGKDD conference on Knowledge and Data Discovery (KDD), a workshop on Data Science for Food, Energy and Water (DSFEW) was held to foster an interdisciplinary community intersecting data science and societally important domains of food, energy and water. The workshop included keynotes, panel discussion, presentations and posters, and introduced the emerging area of DSFEW to the ACM SIGKDD audience, and triggered interdisciplinary idea-sharing in DSFEW research. The workshop website is sites.google.com/site/2016dsfew.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"41 1","pages":"1-4"},"PeriodicalIF":0.0,"publicationDate":"2017-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77122132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User Identity Linkage across Online Social Networks: A Review","authors":"Kai Shu, Suhang Wang, Jiliang Tang, R. Zafarani, Huan Liu","doi":"10.1145/3068777.3068781","DOIUrl":"https://doi.org/10.1145/3068777.3068781","url":null,"abstract":"The increasing popularity and diversity of social media sites has encouraged more and more people to participate on multiple online social networks to enjoy their services. Each user may create a user identity, which can include profile, content, or network information, to represent his or her unique public figure in every social network. Thus, a fundamental question arises -- can we link user identities across online social networks? User identity linkage across online social networks is an emerging task in social media and has attracted increasing attention in recent years. Advancements in user identity linkage could potentially impact various domains such as recommendation and link prediction. Due to the unique characteristics of social network data, this problem faces tremendous challenges. To tackle these challenges, recent approaches generally consist of (1) extracting features and (2) constructing predictive models from a variety of perspectives. In this paper, we review key achievements of user identity linkage across online social networks including state-of-the-art algorithms, evaluation metrics, and representative datasets. We also discuss related research areas, open problems, and future research directions for user identity linkage across online social networks.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"42 1","pages":"5-17"},"PeriodicalIF":0.0,"publicationDate":"2017-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91112234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}