{"title":"Classes versus Communities: Outlier Detection and Removal in Tabular Datasets via Social Network Analysis (ClaCO)","authors":"Serkan Üçer, Tansel Özyer, R. Alhajj","doi":"10.1109/ASONAM55673.2022.10068694","DOIUrl":"https://doi.org/10.1109/ASONAM55673.2022.10068694","url":null,"abstract":"In this research, we introduce a model to detect inconsistent and anomalous samples in labeled tabular datasets, which are frequently used in machine learning classification tasks. Our model, abbreviated as ClaCO (Classes vs. Communities: SNA for Outlier Detection), first converts labeled tabular data into an attributed, labeled, undirected network graph. After enriching the graph, it analyzes the edge structure of the individual egonets in terms of class and community membership by introducing a new SNA metric named the Consistency Score of a Node (CSoN). Through an exhaustive analysis of a node's ego network, CSoN measures the consistency of the node by examining how similar its immediate neighbors are in terms of shared class and/or shared community membership. To demonstrate the effectiveness of ClaCO, we employed it as a subsidiary method for detecting anomalous samples in the training split of a traditional ML classification task. Using this consistency score, the set of nodes with the lowest CSoN is flagged as outliers and removed from the training dataset, and the remaining data are fed into the ML model to compare classification performance against the ‘whole’ dataset and against competing outlier detection methods. We show that this outlier detection model is effective: it improves classification performance relative to both the whole dataset and the datasets reduced by competing outlier detection methods, across several well-known real-life and synthetic datasets.","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124028485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faster Greedy Optimization of Resistance-based Graph Robustness","authors":"Maria Predari, R. Kooij, Henning Meyerhenke","doi":"10.1109/ASONAM55673.2022.10068613","DOIUrl":"https://doi.org/10.1109/ASONAM55673.2022.10068613","url":null,"abstract":"The total effective resistance, also called the Kirchhoff index, provides a robustness measure for a graph $G$. We consider the optimization problem of adding $k$ new edges to $G$ such that the resulting graph has minimal total effective resistance (i.e., is most robust). The total effective resistance and the effective resistances between nodes can be computed using the pseudoinverse of the graph Laplacian. The pseudoinverse may be computed explicitly via pseudoinversion; however, this takes cubic time in practice and quadratic space. We instead exploit combinatorial and algebraic connections to speed up gain computations in established generic greedy heuristics. Moreover, we leverage existing randomized techniques to boost the performance of our approaches by introducing a sub-sampling step. Our different graph- and matrix-based approaches are significantly faster than the state-of-the-art greedy algorithm, while their quality remains reasonably high and is often quite close to that of the greedy algorithm. Our experiments show that we can now process large graphs for which the state-of-the-art greedy approach was previously infeasible. As far as we know, we are the first to process graphs with $100K+$ nodes on the order of minutes.","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121359365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Machine Learning Approach to Identify Toxic Language in the Online Space","authors":"Lisa Kaati, A. Shrestha, N. Akrami","doi":"10.1109/ASONAM55673.2022.10068619","DOIUrl":"https://doi.org/10.1109/ASONAM55673.2022.10068619","url":null,"abstract":"In this study, we trained three machine learning models to detect toxic language on social media. These models were trained on data from diverse sources to ensure that they have a broad understanding of toxic language. Next, we evaluated the performance of our models on a dataset sampled from a large number of diverse online forums. The test dataset was annotated by three independent annotators. We also compared the performance of our models with Perspective API, a toxic language detection model created by Jigsaw and Google's Counter Abuse Technology team. The results showed that our classification models performed well on data from the domains they were trained on (F1 = 0.91, 0.91, and 0.84 for RoBERTa, BERT, and SVM, respectively), but their performance decreased when they were tested on annotated data from new domains (F1 = 0.80, 0.61, 0.49, and 0.77 for RoBERTa, BERT, SVM, and Perspective API, respectively). Finally, we used the best-performing model on the test data (RoBERTa, ROC = 0.86) to examine the frequency (proportion) of toxic language in 21 diverse forums. These analyses showed that moderated forums for general discussion (e.g., Alternate history) had much lower proportions of toxic language than those with minimal moderation (e.g., 8Kun). While our results highlight the complexity of detecting toxic language, they also show that model performance can be improved by using a diverse dataset when building new models. We conclude by discussing the implications of our findings and some directions for future research.","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128523453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IKEA: Unsupervised domain-specific keyword-expansion","authors":"Joobin Gharibshah, Jakapun Tachaiya, Arman Irani, E. Papalexakis, M. Faloutsos","doi":"10.1109/ASONAM55673.2022.10068656","DOIUrl":"https://doi.org/10.1109/ASONAM55673.2022.10068656","url":null,"abstract":"How can we expand an initial set of keywords with a target domain in mind? One possible application is to use the expanded set of words to search for specific information within the domain of interest. Here, we focus on online forums, specifically security forums. We propose IKEA, an iterative embedding-based approach to expand a set of keywords with a domain in mind. The novelty of our approach is threefold: (a) we use two similarity expansions, in the word-word and post-post spaces; (b) we use an iterative approach in each of these expansions; and (c) we provide a flexible ranking of the identified words to meet user needs. We evaluate our method with data from three security forums spanning five years of activity and with the widely used Fire benchmark. IKEA outperforms previous solutions by identifying more relevant keywords: it achieves more than 0.82 MAP and 0.85 NDCG across a wide range of initial keyword sets. We see our approach as an essential building block in developing methods for harnessing the wealth of information available in online forums.","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130001663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Depression and Anxiety on Reddit: a Multi-task Learning Approach","authors":"Shailik Sarkar, Abdulaziz Alhamadani, Lulwah Alkulaib, Chang-Tien Lu","doi":"10.1109/ASONAM55673.2022.10068655","DOIUrl":"https://doi.org/10.1109/ASONAM55673.2022.10068655","url":null,"abstract":"One of the strongest indicators of a mental health crisis is how people interact with each other or express themselves. Hence, social media is an ideal source for extracting user-level information about the language used to express personal feelings. In the wake of the ever-increasing mental health crisis in the United States, it is imperative to analyze the general well-being of a population and investigate how their public social media posts can be used to detect different underlying mental health conditions. To that end, we collect posts from subreddits related to different mental health topics in order to detect the type of each post and the nature of the mental health issues correlated with it; detecting these issues indicates the mental health conditions connected to each post. To achieve this, we develop a multi-task learning model that, for each post, leverages the latent embedding spaces of both words and topics for prediction, with a message-passing mechanism that enables information sharing across related tasks. We train the model through an active learning approach in order to tackle the lack of standardized fine-grained labeled data for this specific task.","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130847519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the Impact of Culture in Assessing Helpfulness of Online Reviews","authors":"Khaled Alanezi, Nuha Albadi, Omar Hammad, Maram Kurdi, Shivakant Mishra","doi":"10.1109/ASONAM55673.2022.10068664","DOIUrl":"https://doi.org/10.1109/ASONAM55673.2022.10068664","url":null,"abstract":"Online reviews have become essential for users to make informed decisions in everyday tasks ranging from planning summer vacations to purchasing groceries and making financial investments. A key problem in using online reviews is their overabundance, which overwhelms users. As a result, recommendation systems that assess the helpfulness of reviews are being developed. This paper argues that cultural background is an important feature that impacts the nature of a review written by a user, and that it must be considered as a feature in assessing the helpfulness of online reviews. The paper provides an in-depth study of differences in online reviews written by users from different cultural backgrounds and of how incorporating culture as a feature can lead to better review helpfulness recommendations. In particular, we analyze online reviews originating from two distinct cultural spheres, namely Arabic and Western cultures, for two different products: hotels and books. Our analysis demonstrates that the nature of reviews differs based on users' cultural backgrounds and that this difference varies with the specific product being reviewed. Finally, we developed six different review helpfulness recommendation models, which demonstrate that taking culture into account leads to better recommendations.","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129321133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social Network Analysis on Interpretable Compressed Sparse Networks","authors":"Connor C. J. Hryhoruk, C. Leung","doi":"10.1109/ASONAM55673.2022.10068716","DOIUrl":"https://doi.org/10.1109/ASONAM55673.2022.10068716","url":null,"abstract":"Big data are everywhere. The World Wide Web is one example of such big data. It has become a vast platform for data production and consumption, where threads of data evolve from multiple devices, through different human interactions, across worldwide locations, and under divergent distributed settings. Embedded in these big web data is implicit, previously unknown, and potentially useful information and knowledge that awaits discovery. This calls for web intelligence solutions, which make good use of data science and data mining (especially web mining or social network mining) to discover useful knowledge and important information from the web. As a web mining task, web structure mining aims to examine incoming and outgoing links on web pages and recommend frequently referenced web pages to web surfers. As another web mining task, web usage mining aims to examine web surfers' patterns and recommend frequently visited pages to them. While the web is huge, the connections among web pages may be sparse. In other words, although the number of vertices (i.e., web pages) on the web is huge, the number of directed edges (i.e., incoming and outgoing hyperlinks between web pages) may be small. This leads to a sparse web. In this paper, we present a solution for interpretable mining of frequent patterns from the sparse web. In particular, we represent web structure and usage information by bitmaps that capture connections to web pages. Due to the sparsity of the web, we compress the bitmaps and use them in mining influential patterns (e.g., popular web pages). For explainability of the mining process, we ensure that the compressed bitmaps are interpretable. Evaluation on real-life web data demonstrates the effectiveness, interpretability, and practicality of our solution for interpretable mining of influential patterns from the sparse web.","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122996405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Customer Lifetime Value Prediction with K-means Clustering and XGBoost","authors":"Marius Myburg, S. Berman","doi":"10.1109/ASONAM55673.2022.10068602","DOIUrl":"https://doi.org/10.1109/ASONAM55673.2022.10068602","url":null,"abstract":"Customer lifetime value (CLV) is the revenue expected from a customer over a given time period. CLV customer segmentation is used in marketing, resource management, and business strategy. In practice, it is customer segmentation rather than revenue, and a specific timeframe rather than entire lifetimes, that is of interest. A long-standing method of CLV segmentation uses a variant of the RFM model, an approach based on the Recency, Frequency, and Monetary value of past purchases. RFM is popular due to its simplicity and understandability, but it is not without pitfalls. In this work, XGBoost and K-means clustering were used to address problems with the RFM approach: determining the relative weightings of the three variables, choosing a CLV segmentation method, and predicting future CLV segments from current data. The system was able to predict CLV, loyalty, and marketability segments with 77-78% accuracy for the immediate future and 74-75% accuracy for the longer term. Experimentation also showed that using RFM alone is sufficient, as augmenting the features with additional purchase data did not improve results.","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126900659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention Mechanism indicating Item Novelty for Sequential Recommendation","authors":"Li-Chia Wang, Hao-Shang Ma, Jen-Wei Huang","doi":"10.1109/ASONAM55673.2022.10068599","DOIUrl":"https://doi.org/10.1109/ASONAM55673.2022.10068599","url":null,"abstract":"Most sequential recommendation systems, including those that employ a variety of features and state-of-the-art network models, tend to favor items that are the most popular or most relevant to the user's historical behavior. Recommendations made under these conditions tend to be repetitive; i.e., many options that might interest users are entirely disregarded. This paper presents a novel algorithm that assigns a novelty score to potential recommendation items. We also present an architecture for incorporating this functionality into existing recommendation systems. In experiments, the proposed NASM system outperformed state-of-the-art sequential recommender systems, verifying that including a novelty score can indeed improve recommendation performance.","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120994487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FOSINT-SI 2022 Symposium Organizing Committee","authors":"R. Alhajj","doi":"10.1109/asonam.2014.6921537","DOIUrl":"https://doi.org/10.1109/asonam.2014.6921537","url":null,"abstract":"","PeriodicalId":423113,"journal":{"name":"2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126569257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}