{"title":"Adaptive Noisy Data Augmentation for Regularized Construction of Undirected Graphical Models","authors":"Yinan Li, Fang Liu, Xiao Liu","doi":"10.1109/DSAA53316.2021.9564128","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564128","url":null,"abstract":"We develop the AdaPtive Noise Augmentation (PANDA) technique to regularize the estimation of undirected graphical models. PANDA iteratively optimizes the objective function given adaptively augmented data to achieve regularization on model parameters. The augmented noisy data is designed to deliver various regularization effects on single graph estimation as well as simultaneous construction of multiple graphs, including but not limited to $l_{\\gamma}$ for $\\gamma\\in[0,2]$, elastic net, SCAD, group lasso, and adaptive lasso in single graph estimations; and the joint group lasso and the joint fused ridge regularizations for multiple graph estimation. PANDA can be seamlessly implemented in practice in software that implements generalized linear models and users do not have to employ ad-hoc optimizers to minimize regularized loss functions for graph construction. We show the non-inferiority of PANDA in various types of graph estimation in simulated data, benchmarked against some common graph estimation methods. 
We also apply PANDA to an autism spectrum disorder dataset to construct a graph with mixed node types and to a lung cancer microarray data set to simultaneously construct four protein networks, demonstrating the effectiveness of PANDA in constructing practically interpretable and meaningful graphical models.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133945893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multilayer Network Perspective on Customer Segmentation Through Cashless Payment Data","authors":"Alessia Galdeman, Cheick Tidiane Ba, Matteo Zignani, S. Gaito","doi":"10.1109/DSAA53316.2021.9564187","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564187","url":null,"abstract":"Customer segmentation is a central problem in different business processes. In the last few years, it is also becoming important for banking and financial institutions given the ever-growing volume of cashless payments. When dealing with customer segmentation with transactional data, the clustering approach is widely used. In this work, we propose a different modeling approach for customer segmentation based on a graph-based representation. Specifically, we reformulate customer segmentation as a community detection problem on a similarity multi-layer network, where each layer depends on a specific cashless payment method. We introduce a vector-based representation of the cardholders' spending patterns, namely the purchase profile, to build the similarity multi-layer network. The profiles capture how customers allocate their spending capacity among merchant categories through different payment systems. From purchase profiles, we evaluate the similarity of the cardholders in terms of consumption allocation and we infer different similarity graphs based on credit and debit card payments. Different segmentation strategies based on multi-layer community detection methods have been evaluated on a large-scale dataset of credit and debit card transactions of a banking group. Since one of the main goals is verifying the feasibility of graph-based approaches for customer segmentation, we discuss the outcomes of the methods in terms of explainability of the resulting segments. Specifically, methods based on random walks, such as Infomap, return more stable and insightful results than modularity-based ones, in different settings. 
To sum up, we experiment with community detection algorithms to cope with the customer segmentation problem starting from a large set of credit and debit card transactions. The outcome of the solutions may support recently developed methods for bank risk assessment based on clients' behavior or targeted applications for cashless payment management.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132178057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Use of Class Activation Maps in Remote Sensing: the case of Illegal Landfills","authors":"Rocio Nahime Torres, P. Fraternali, Andrea Biscontini","doi":"10.1109/DSAA53316.2021.9564243","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564243","url":null,"abstract":"Remote sensing image scene classification consists of classifying images of the Earth's surface into semantic scene categories based on the ground objects and their spatial arrangement. Finding the objects within a scene is not trivial, because they can appear in different sizes and mutual positions. An open issue in scene classification with CNNs is understanding whether the network prediction relies on the clues that human Earth Observation experts consider. A suitable approach for investigating the inference process of neural models relies on Class Activation Maps (CAMs), which emphasize the areas of an image contributing the most to the classification. This work evaluates CAMs for different CNN methods in terms of their capacity to identify the objects that determine the classification of scenes for illegal landfill detection. Quantitative and qualitative analyses show that ECA-Net performs consistently across all metrics, making it the most promising approach for obtaining CNNs that focus on the most relevant points with a higher IoU. 
The illustrated analysis is a step towards the computer-aided study of the variations of scene elements positioning and spatial relations that constitute hints of the presence of illegal waste dumps and opens the way to the application of weakly supervised techniques for training detectors of illegal landfills in large scale remote sensing image repositories.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127685273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting Cancer Treatments from Clinical Text written in Spanish: A Deep Learning Approach","authors":"O. S. Pabón, Alberto Blázquez-Herranz, M. Torrente, A. R. González, M. Provencio, Ernestina Menasalvas Ruiz","doi":"10.1109/DSAA53316.2021.9564137","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564137","url":null,"abstract":"Extracting accurate information about cancer patients' treatments is crucial to support clinical research and treatment planning and to improve clinical care outcomes. However, treatment information resides in unstructured clinical text, making the task of data structuring especially challenging. Although several approaches have been proposed to extract treatments from clinical text, most of these proposals have focused on the English language. In this paper, we propose a deep learning-based approach to extract cancer treatments from clinical text written in Spanish. This approach uses a Bidirectional Long Short-Term Memory (BiLSTM) neural network with a CRF layer to perform Named Entity Recognition. An annotated corpus of clinical text written about lung cancer patients is used to train the BiLSTM-based model. Tests show an F1-score of 90%, suggesting the feasibility of our approach to extract cancer treatments from clinical narratives.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117004347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overfitting Measurement of Deep Neural Networks Using No Data","authors":"Satoru Watanabe, H. Yamana","doi":"10.1109/DSAA53316.2021.9564119","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564119","url":null,"abstract":"Overfitting reduces the generalizability of deep neural networks (DNNs). Overfitting is generally detected by comparing the accuracies and losses of training and validation data; however, the detection method requires vast amounts of training data and is not always effective for forthcoming data due to the heterogeneity between training and forthcoming data. The dropout technique has been employed to prevent DNNs from overfitting, where the neurons in DNNs are invalidated randomly during their training. It has been hypothesized that this technique prevents DNNs from overfitting by restraining the co-adaptions among neurons. This hypothesis implies that overfitting of a DNN is a result of the co-adaptions among neurons and can be detected by investigating the inner representation of DNNs. Thus, we propose a method to detect overfitting of DNNs using no training and test data. The proposed method measures the degree of co-adaptions among neurons using persistent homology (PH). The proposed PH-based overfitting measure (PHOM) method constructs clique complexes on DNNs using the trained parameters of DNNs, and the one-dimensional PH investigates the co-adaptions among neurons. Thus, PHOM requires no training and test data to measure overfitting. We applied PHOM to convolutional neural networks trained for the classification problems of the CIFAR-10, SVHN, and Tiny ImageNet data sets. 
The experimental results demonstrate that PHOM reveals the degree of overfitting of DNNs to the training data, which suggests that PHOM enables us to filter overfitted DNNs without requiring the training and test data.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123242423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Treatment Patterns Validation in Lung Cancer Patients","authors":"Arturo Redondo, Belén Ríos-Sánchez, G. Vigueras, B. Otero, R. López, M. Torrente, Ernestina Menasalvas Ruiz, M. Provencio, A. R. González","doi":"10.1109/DSAA53316.2021.9564176","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564176","url":null,"abstract":"Lung cancer is the leading cause of cancer death. Of the cancer cases estimated for 2021, more than 230,000 new cases are expected to be lung cancer, with more than 131,000 estimated deaths. Improving survival rates and the patient's quality of life both depend in part on a common element: treatments. Collective knowledge about cancer treatment recommendations is typically included in clinical guidelines, intended to optimize patient care and assist clinicians in lung cancer treatment. These guidelines define a set of treatment paths, where recommendations depend on aspects of the cancer and the individual features of a specific patient. Although oncologists are expected to follow clinical guidelines, the inter- and intra-patient variability in response to the possible treatment combinations makes it necessary to personalize treatment patterns in certain cases. Additionally, clinical guidelines are not frequently updated with new findings or lack a consistent methodology when they are frequently updated. For that reason, analysing patterns in patients treated both following the standard of care and outside it would make it possible to validate clinical guidelines and identify potential new treatment recommendations. In this work, we have analysed whether actual treatments prescribed to lung cancer patients follow clinical guidelines or not. Using a machine learning method that outputs association rules (Apriori), we identify patterns based on cancer stage. 
These preliminary results show that the treatment patterns found mostly match clinical guideline recommendations, validating the information included in the consulted guidelines.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115382497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarially Generating Rank-Constrained Graphs","authors":"William Shiao, E. Papalexakis","doi":"10.1109/DSAA53316.2021.9564202","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564202","url":null,"abstract":"Graph generation is a task that has been explored with a wide variety of methods. Recently, several papers have applied Generative Adversarial Networks (GANs) to this task, but most of these methods result in graphs of full or unknown rank. Many real-world graphs have low rank, which roughly translates to the number of communities in that graph. Furthermore, it has been shown that taking the low rank approximation of a graph can defend against adversarial attacks. This suggests that testing models against graphs of different rank may be useful. However, current methods provide no way to control the rank of generated graphs. In this paper, we propose two variants of BRGAN: GAN architectures that generate synthetic graphs which, in addition to having realistic graph features, also have bounded rank. Our first variant, BRGAN-A, generates synthetic graphs competitive with state-of-the-art models, with rank equal to or lower than the desired rank. Our second variant, BRGAN-B, generates graphs of almost exactly the desired rank, but produces less realistic results. 
We also propose a novel rank penalty term on the generator, which allows us to control this realism-rank tradeoff.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115449718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Phase Multi-armed Bandit for Online Recommendation","authors":"Cairong Yan, Haixia Han, Zijian Wang, Yanting Zhang","doi":"10.1109/DSAA53316.2021.9564225","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564225","url":null,"abstract":"Personalized online recommendations strive to adapt their services to individual users by making use of both item and user information. Despite recent progress, the issue of balancing exploitation-exploration (EE) [1] remains challenging. In this paper, we model the personalized online recommendation of e-commerce as a two-phase multi-armed bandit problem. This is the first time that “big arm” and “small arm” are introduced into multi-armed bandit (MAB), and a two-stage strategy is adopted to provide target users with the most suitable recommendation list. In the first phase, MAB is used to obtain an item subset that users may be interested in from a large number of items. We use item categories as arms, instead of the individual items used in existing related models, to control the arm scale and reduce computational complexity. In the second phase, we directly use the items generated in the first phase as arms of MAB and obtain rewards through fine-grained implicit feedback from users. Empirical studies on three real-world datasets show that our proposed method TPBandit performs better than state-of-the-art bandit-based recommendation methods in several evaluation metrics such as Precision, Recall, and Hit Ratio. 
Moreover, the two-phase method improves the recommendation performance by nearly 50% compared to the one-phase method in the best case.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130990074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequential Dependency Enhanced Graph Neural Networks for Session-based Recommendations","authors":"Wei Guo, Shoujin Wang, Wenpeng Lu, Hao Wu, Qian Zhang, Zhufeng Shao","doi":"10.1109/DSAA53316.2021.9564224","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564224","url":null,"abstract":"Session-based recommendations (SBR) play an important role in many real-world applications, such as e-commerce and media streaming. To perform accurate session-based recommendations, it is crucial to capture both sequential dependencies over a sequence of adjacent items and complex item transitions over a set of items within sessions. Note that item transitions are not necessarily dependent on sequential dependencies, e.g., the transition from one item to the other distant item in a session is often not sequential. However, almost all the existing session-based recommender systems (SBRS) fail to consider both kinds of information, which leads to their limited performance improvement. Aiming at this deficiency, we propose a novel sequential dependency enhanced graph neural network (SDE-GNN) to capture both sequential dependencies and item transition relations over items within sessions for more accurate next-item recommendations. Specifically, we first devise a sequential dependency learning module to capture the sequential dependencies over a sequence of adjacent items in each session. Then, we propose an item transition learning module to capture complex transitions between items. In the module, a novel residual gate and a specialized attention mechanism are integrated into gate-GNN to build an attention augmented GNN, called AU-GNN. Finally, we devise a gated fusion component to combine the learned sequential dependencies and item transitions together in preparation for the subsequent next-item recommendations. 
Exhaustive experiments on two public real-world data sets demonstrate the superiority of SDE-GNN over the state-of-the-art methods.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130653290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft-Churn: Optimal Switching between Prepaid Data Subscriptions on E-SIM support Smartphones","authors":"Patrick Hosein, Gabriela Sewdhan, Aviel Jailal","doi":"10.1109/DSAA53316.2021.9564163","DOIUrl":"https://doi.org/10.1109/DSAA53316.2021.9564163","url":null,"abstract":"One new trending feature of Smartphones is the support for E-SIM (Embedded Subscriber Identification Module) cards. These allow the user to simultaneously subscribe to multiple cellular providers while also supporting at most one physical SIM (Subscriber Identification Module) card. This feature allows customers to easily switch between providers and is especially useful for those who use prepaid plans which are popular in developing countries. A customer may have multiple providers and, at any point in time, can choose the provider with the most cost effective data plan. This means that cellular providers must now take into account “soft-churn”, where the consumer dynamically switches between multiple plans from multiple providers, in addition to the more traditional churn where a consumer switches providers. This means that data pricing for such consumers must now be more personalized in order to be competitive and maximize profits. We determine the optimal personalized prepaid plan for such users while providing a competitive advantage to the provider. 
Examples are provided to demonstrate the benefit and numerical results corroborate our premise that these personalized pricing plans can, in fact, increase provider revenue.","PeriodicalId":129612,"journal":{"name":"2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126841622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}