{"title":"Modeling Non-deterministic Human Behaviors in Discrete Food Choices","authors":"Andrew Starnes, Anton Dereventsov, E. S. Blazek, Folasade Phillips","doi":"10.1109/ICDMW58026.2022.00131","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00131","url":null,"abstract":"We establish a non-deterministic model that predicts a user's food preferences from their demographic information. Our simulator is based on the NHANES dataset and domain expert knowledge in the form of established behavioral studies. Our model can be used to generate an arbitrary number of synthetic data points that are similar in distribution to the original dataset and align with behavioral science expectations. Such a simulator can be used in a variety of machine learning tasks and especially in applications requiring human behavior prediction.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133009751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Distributed Algorithms for Minimum Spanning Tree in Dense Graphs","authors":"M. Bateni, Morteza Monemizadeh, Kees Voorintholt","doi":"10.1109/ICDMW58026.2022.00106","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00106","url":null,"abstract":"In recent years, the Massively Parallel Computation (MPC) model capturing the MapReduce framework has become the de facto standard model for large-scale data analysis, given the ubiquity of efficient and affordable cloud implementations. In this model, an input of size $m$ is initially distributed among $t$ machines, each with a local space of size $s$. Computation proceeds in synchronous rounds in which each machine performs arbitrary local computation on its data and then sends messages to other machines. In this paper, we study the Minimum Spanning Tree (MST) problem for dense graphs in the MPC model. We say a graph $G(V, E)$ is relatively dense if $m=\\Theta(n^{1+c})$ where $n=\\vert V\\vert$ is the number of vertices, $m=\\vert E\\vert$ is the number of edges in this graph, and $0 < c\\leq 1$. We develop the first work- and space-efficient MPC algorithm that with high probability computes an MST of $G$ using $\\lceil\\log\\frac{c}{\\epsilon}\\rceil+1$ rounds of communication. As an MPC algorithm, our algorithm uses $t=O(n^{c-\\epsilon})$ machines each one having local storage of size $s=O(n^{1+\\epsilon})$ for any $0 < \\epsilon\\leq c$. 
Indeed, not only is this algorithm very simple and easy to implement, it also simultaneously achieves optimal total work, per-machine space, and number of rounds.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131828476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepDive: Deep Latent Factor Model for Enhancing Diversity in Recommender Systems","authors":"Kriti Kumar, A. Majumdar, M. Chandra","doi":"10.1109/ICDMW58026.2022.00031","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00031","url":null,"abstract":"Most collaborative filtering techniques concentrate on increasing the accuracy of business-to-customer recommender systems. Emphasis on accuracy alone leads to repetitive recommendations based on the user's past preferences; such predictions pose a problem from both the business and the user's perspective as they fail to recommend niche items and maintain the user's interest. Incorporating diversity in recommendations can overcome these issues. Most prior studies include diversity by randomizing the item-set predicted by the collaborative filtering technique. These techniques do not have control over the accuracy vs. diversity trade-off; one needs to be mindful that a drastic loss in accuracy is not acceptable from the recommender system. Our work proposes a deep latent factor model with a diversity cost/penalty that allows us to control the trade-off between diversity and accuracy. 
Experimental results obtained with the MovieLens dataset demonstrate the superior performance of our proposed method in providing relevant, novel, and diverse recommendations compared to state-of-the-art techniques; with a slight drop in accuracy, our proposed method provides an improvement in different established measures of diversity.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131830047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TPE-AutoClust: A Tree-based Pipline Ensemble Framework for Automated Clustering","authors":"Radwa El Shawi, S. Sakr","doi":"10.1109/ICDMW58026.2022.00149","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00149","url":null,"abstract":"Novel technologies in automated machine learning ease the complexity of building well-performing machine learning pipelines. However, these are usually restricted to supervised learning tasks such as classification and regression, while unsupervised learning, particularly clustering, remains a largely unexplored problem due to the ambiguity involved when evaluating the clustering solutions. Motivated by this shortcoming, in this paper, we introduce TPE-AutoClust, a genetic programming-based automated machine learning framework for clustering. TPE-AutoClust optimizes a series of feature preprocessors and machine learning models to optimize the performance on an unsupervised clustering task. TPE-AutoClust consists of three main phases: a meta-learning phase, an optimization phase, and a clustering ensemble construction phase. The meta-learning phase suggests some instantiations of pipelines that are likely to perform well on a new dataset. These pipelines are used to warm-start the optimization phase that adopts a multi-objective optimization technique to select pipelines based on the Pareto front of the trade-off between the pipeline length and performance. The ensemble construction phase develops a collaborative mechanism based on a clustering ensemble to combine optimized pipelines based on different internal cluster validity indices and construct a well-performing solution for a new dataset. The proposed framework is based on scikit-learn with 4 preprocessors and 6 clustering algorithms. Extensive experiments are conducted on 27 real and synthetic benchmark datasets to validate the superiority of TPE-AutoClust. 
The results show that TPE-AutoClust outperforms the state-of-the-art techniques for building automated clustering solutions.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134143316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAISON - Multimodal AI-based Sensor platform for Older Individuals","authors":"A. Abedi, Faranak Dayyani, Charlene H. Chu, Shehroz S. Khan","doi":"10.1109/ICDMW58026.2022.00040","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00040","url":null,"abstract":"The global population is aging, creating a need for tools that enable older adults' greater independence and ability to age at home, as well as assist healthcare workers. This objective can be achieved by building predictive models that assist healthcare workers in monitoring and analyzing older adults' behavioral, functional, and psychological data. To develop such models, a large amount of multimodal sensor data is typically required. In this paper, we propose MAISON, a scalable cloud-based platform of commercially available smart devices capable of collecting desired multimodal sensor data from older adults and patients living in their own homes. The MAISON platform is novel due to its ability to collect a greater variety of data modalities than the existing platforms, as well as its new features that result in seamless data collection and ease of use for older adults who may not be digitally literate. We demonstrated the feasibility of the MAISON platform with two older adults discharged home from a large rehabilitation center. The results indicate that the MAISON platform was able to collect and store sensor data in a cloud without functional glitches or performance degradation. This paper also discusses the challenges faced during the development of the platform and data collection in the homes of the older adults. 
MAISON is a novel platform designed to collect multimodal data and facilitate the development of predictive models for detecting key health indicators, including social isolation, depression, and functional decline, and is feasible to use with older adults in the community.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134090460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised DeepView: Global Uncertainty Visualization for High Dimensional Data","authors":"Carina Newen, Emmanuel Müller","doi":"10.1109/ICDMW58026.2022.00086","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00086","url":null,"abstract":"In recent years, more and more visualization methods for explanations of artificial intelligence have been proposed that focus on untangling black box models for single instances of the data set. While the focus often lies on supervised learning algorithms, the study of uncertainty estimations in the unsupervised domain for high-dimensional data sets in the explainability domain has been neglected so far. As a result, existing visualization methods struggle to visualize global uncertainty patterns over whole datasets. We propose Unsupervised DeepView, the first global uncertainty visualization method for high dimensional data based on a novel unsupervised proxy for local uncertainties. In this paper, we exploit the mathematical notion of local intrinsic dimensionality as a measure of local data complexity. As a label-agnostic measure of model uncertainty in unsupervised machine learning, it shows two highly desirable features: It can be used for global structure visualization as well as for the detection of local adversarials. 
In our empirical evaluation, we demonstrate its ability both in visualizations and quantitative analysis for unsupervised models on multiple datasets.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132889585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-purpose Recommender Platform using Perceiver IO","authors":"Ali Cevahir, Kentaro Kanada","doi":"10.1109/ICDMW58026.2022.00126","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00126","url":null,"abstract":"Web services usually require many different types of recommender systems using large amounts of user log and content data, in order to provide personalized content to their customers. Different recommenders may share the same customer-base or cross-use models/data. It is challenging to design different models for each recommendation task. In this work, we propose a general-purpose framework for various recommendation tasks based on the Perceiver IO model. Perceiver IO is a general machine learning architecture based on transformer-style attention modules, which helps eliminate feature engineering for various tasks. Different types of recommenders can be developed with minimal modifications and models can be transferred among different tasks. Our experiments with a variety of recommendation scenarios confirm that our framework is able to handle those tasks while achieving state-of-the-art accuracy.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116079730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Recommendation System Framework to Generalize AutoRec and Neural Collaborative Filtering","authors":"Ramin Raziperchikolaei, Young-joo Chung","doi":"10.1109/ICDMW58026.2022.00151","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00151","url":null,"abstract":"AutoRec and neural collaborative filtering (NCF) are two widely used neural network-based frameworks in the recommendation system literature. In this paper, we show that these two apparently very different frameworks have a lot in common. We propose a general neural network-based framework, which gives us flexibility in choosing elements in the input sources, prediction functions, etc. Then, we show that AutoRec and NCF are special forms of our generalized framework. In our experimental results, first, we compare different variants of NCF and AutoRec. Then, we indicate that it is necessary to use our general framework since there is no specific structure that performs well in all datasets. Finally, we show that by choosing the right elements, our framework outperforms the state-of-the-art methods with complicated structures.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125822967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Joins over Big Data Streams: Actual and Future Research Trends","authors":"A. Cuzzocrea","doi":"10.1109/ICDMW58026.2022.00132","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00132","url":null,"abstract":"Joins are at the basis of a plethora of big data analytics tools over massive big data streams. Developed in the context of static data sets, joins have attracted tremendous interest in the context of streaming data sets, due to their versatility in a wide range of application settings, ranging from environmental networks to logistics systems, from smart city applications to healthcare systems, from energy management systems to prognostic tools, and so forth. Joins over big data streams have traditionally attracted the attention of a growing part of the database and data mining community, then landing in the wider big data community. Following these considerations, this paper proposes a critical review of actual and future trends in the context of scalable joins over big data streams.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125963153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Mobility Driven Modeling of an Infectious Disease","authors":"Ismael Villanueva-Miranda, M. Hossain, Monika Akbar","doi":"10.1109/ICDMW58026.2022.00155","DOIUrl":"https://doi.org/10.1109/ICDMW58026.2022.00155","url":null,"abstract":"In conventional disease models, disease properties are dominant parameters (e.g., infection rate, incubation period). As seen in the recent literature on infectious diseases, human behavior - particularly mobility - plays a crucial role in spreading diseases. This paper proposes an epidemiological model named SEIRD+m that considers human mobility instead of modeling disease properties alone. SEIRD+m relies on the core deterministic epidemic model SEIR (Susceptible, Exposed, Infected, and Recovered), adds a new compartment D - Dead, and enhances each SEIRD component with human mobility information (such as time, location, and movements) retrieved from cell-phone data collected by SafeGraph. We demonstrate a way to reduce the number of infections and deaths due to COVID-19 by restricting mobility on specific Census Block Groups (CBGs) detected as COVID-19 hotspots. A case study in this paper shows that a 50% reduction in mobility could help reduce the number of infections and deaths by significant percentages in different population groups based on race, income, and age.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123548756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}