2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA): Latest Publications, Page 7

Analyzing Cultural Assimilation through the Lens of Yelp Restaurant Reviews

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564170

Zaiqian Chen, Joonsuk Park

Abstract: Given the steady stream of immigrants from around the world, cultural assimilation in North America has long been a topic of interest. However, existing research focuses only on assimilation to North American culture, overlooking mutual influence, and makes very limited use of data-driven approaches. In this paper, we investigate assimilation among various cultures in North America through the lens of discussions surrounding food. We first present Cross-Cuisine Cross-Region LDA (c3rLDA), a novel probabilistic graphical model that jointly uncovers latent topics shared across cuisines as well as their regional variants for each cuisine. We then apply the model to 3.7 million Yelp restaurant reviews and find that cuisines assimilate to one another to varying degrees depending on the cuisines involved, the topic, and the region: a cuisine tends to be more influenced by other cuisines if it is regularly fused with others (e.g., Japanese), for certain topics (e.g., breakfast and dessert), and in specific regions (e.g., stronger Mexican influence in the Southwestern US and French influence in Eastern Canada). Lastly, we demonstrate that the topics generated by our model, on which the qualitative analysis is based, are more coherent than or comparable to those generated by existing neural and non-neural topic models. This work represents the first step toward large-scale, data-driven analysis of cultural assimilation in North America, made possible by the abundant data available on social media.

Citations: 0

Discovery of Patient Phenotypes through Multi-layer Network Analysis on the Example of Tinnitus

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564158

Clara Puga, Uli Niemann, Vishnu Unnikrishnan, Miro Schleicher, W. Schlee, M. Spiliopoulou

Abstract: Electronic health records (EHR) often include multiple perspectives on a patient's current state of well-being (e.g., vital signs and subjective indicators measured by questionnaires). In this study, we use these perspectives to build phenotypes of chronic tinnitus patients and investigate how these phenotypes are associated with response to treatment. To this end, we model patients as nodes in a network and interpret the perspectives as layers of a multi-layer network. To identify phenotypes of patients in the network, we implement a community detection algorithm. Some of the resulting communities can be considered phenotypes if they represent subgroups of patients that are similar according to the investigated perspectives. Furthermore, we analyze the influence of the layers on the final community structure of patients. We then propose a method to add layers based on the similarity of their community structures. Finally, we fit a model per community to predict the treatment outcome. In some communities, this prediction outperformed the baseline scenario in which the predictor was fitted to all patients.

Citations: 4
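The tinnitus study above adds network layers according to how similar their community structures are, but the abstract does not say which similarity measure is used. As a minimal sketch, assuming the Rand index (one standard measure of agreement between two partitions) and a hypothetical threshold, the following compares each candidate layer's community assignment with a reference layer and keeps the layers that agree closely enough. The patient IDs, layer names, and the 0.7 threshold are all made up for illustration and are not taken from the paper.

```python
from itertools import combinations


def rand_index(partition_a, partition_b):
    """Share of node pairs on which two community assignments agree.

    A pair agrees when both partitions place the two nodes in the same
    community, or both place them in different communities.
    """
    if set(partition_a) != set(partition_b):
        raise ValueError("partitions must cover the same nodes")
    pairs = list(combinations(sorted(partition_a), 2))
    if not pairs:
        return 1.0
    agree = 0
    for u, v in pairs:
        same_in_a = partition_a[u] == partition_a[v]
        same_in_b = partition_b[u] == partition_b[v]
        agree += int(same_in_a == same_in_b)
    return agree / len(pairs)


def layers_to_add(reference, candidates, threshold=0.7):
    """Hypothetical rule: keep candidate layers whose communities match the reference."""
    return [name for name, partition in candidates.items()
            if rand_index(reference, partition) >= threshold]


# Hypothetical community assignments (patient -> community) for three layers.
vitals = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
questionnaire = {"p1": "a", "p2": "a", "p3": "b", "p4": "b"}
sleep = {"p1": 0, "p2": 1, "p3": 0, "p4": 1}

print(rand_index(vitals, questionnaire))    # 1.0
print(round(rand_index(vitals, sleep), 3))  # 0.333
print(layers_to_add(vitals, {"questionnaire": questionnaire, "sleep": sleep}))  # ['questionnaire']
```

Identical partitions score 1.0 regardless of how the community labels are named, which is why the questionnaire layer is kept even though it labels communities with letters instead of integers.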

The use of machine learning to identify the correctness of HS Code for the customs import declarations

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564203

Hao Chen, Ben van Rijnsoever, Marcel Molenhuis, Dennis van Dijk, Yao-Hua Tan, B. Rukanova

Abstract: As the volume of international trade activity grows worldwide, the number of cross-border import declarations is rising rapidly, resulting in an unprecedented scale of potentially fraudulent transactions, in particular false commodity codes (e.g., HS Codes). An incorrect HS Code creates duty risk and adversely affects revenue collection. Physical investigation by customs administrations is impractical given the sheer number of declarations. This paper provides an automatic approach that harnesses machine learning techniques to relieve the burden on customs targeting officers. We introduce a novel model, based on an off-the-shelf embedding encoder, that identifies whether an HS Code is correct without any human effort. Determining whether an HS Code correctly matches a commodity description is a classification task, so labelled data is typically required. However, the lack of gold-standard labelled datasets in the customs domain limits the development of supervised approaches. Our model is unsupervised and trained on unlabelled historical declaration records, which makes it robust and easy for different customs administrations to adapt. Rather than classifying whether an HS Code is correct or not, our model predicts a score indicating the degree to which the HS Code is correct. We evaluated the proposed model on a ground-truth dataset provided by Dutch customs officers. Results show promising performance, with 71% overall accuracy.

Citations: 3
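The customs paper above scores how well a declared HS Code fits the commodity description instead of returning a yes/no label. The abstract does not specify the scoring function, so the sketch below is only an illustration under stated assumptions: the description and a reference text for each HS Code are assumed to be already embedded as vectors by some encoder, and cosine similarity rescaled to [0, 1] stands in for the paper's score. The code labels and 3-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hs_code_score(description_vec, hs_code_vec):
    """Rescale cosine similarity from [-1, 1] to an assumed [0, 1] 'degree of correctness'."""
    return (cosine_similarity(description_vec, hs_code_vec) + 1.0) / 2.0


def rank_hs_codes(description_vec, hs_code_vecs):
    """Return (code, score) pairs, most plausible code first."""
    scored = [(code, hs_code_score(description_vec, vec))
              for code, vec in hs_code_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings; a real encoder would output vectors with hundreds of dimensions.
declared_description = [0.9, 0.1, 0.0]
reference_embeddings = {
    "code_A": [1.0, 0.0, 0.0],  # hypothetical code close to the description
    "code_B": [0.0, 0.0, 1.0],  # hypothetical code orthogonal to the description
}
for code, score in rank_hs_codes(declared_description, reference_embeddings):
    print(code, round(score, 3))
```

Under this assumption, a low score for the declared code, rather than a hard label, would be what flags a declaration for closer review.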

Dynamic Graph Convolutional LSTM application for traffic flow estimation from error-prone measurements: results and transferability analysis

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564206

Safa Boudabous, S. Clémençon, H. Labiod, Julian Garbiso

Abstract: Technological advances in the transportation and automotive industries have led to new types of sensing systems that are more cost-effective and better suited to large-scale, dense deployment. These sensing techniques allow traffic measurement time series to be gathered continuously at different geospatial locations. However, the accuracy of the raw measurements is often hindered by factors related to the sensing environment and the sensing process itself, so the measurements fail to capture the short-term traffic variations that are crucial for real-time traffic monitoring. In this paper, we propose the DGC-LSTM model for area-wide traffic estimation from error-prone measurement time series. The backbone of DGC-LSTM is a graph convolutional Long Short-Term Memory model with a dynamic adjacency matrix, which is learned and optimized during model training. The adjacency matrix values are estimated from a set of contextual features that affect the dynamics of the dependencies in both the spatial and temporal dimensions. The model is evaluated through experiments on a realistic, synthetic, labelled Bluetooth counts dataset. Lastly, we highlight the importance of transfer learning methods for improving the model's applicability: they allow the model to be adapted to a new deployment site while avoiding extensive data-labelling effort.

Citations: 0

XPROAX-Local explanations for text classification with progressive neighborhood approximation

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-09-30 DOI: 10.1109/DSAA53316.2021.9564153

Yi Cai, A. Zimek, Eirini Ntoutsi

Abstract: The importance of the neighborhood for training a local surrogate model to approximate the local decision boundary of a black-box classifier has already been highlighted in the literature. Several attempts have been made to construct a better neighborhood for high-dimensional data, such as text, by using generative autoencoders. However, existing approaches mainly generate neighbors by sampling purely at random from the latent space and, under the curse of dimensionality, struggle to learn a good local decision boundary. To overcome this problem, we propose a progressive approximation of the neighborhood that uses counterfactual instances as initial landmarks and a careful two-stage sampling approach to refine counterfactuals and generate factuals in the neighborhood of the input instance to be explained. Our work focuses on textual data, and our explanations consist of (i) word-level explanations from both the original instance (intrinsic) and its neighborhood (extrinsic), and (ii) factual and counterfactual instances discovered during neighborhood generation, which further reveal the effect of altering certain parts of the input text. Our experiments on real-world datasets demonstrate that our method outperforms competitors in terms of usefulness and stability (qualitative evaluation) and completeness, compactness, and correctness (quantitative evaluation).

Citations: 4

Towards optimized actions in critical situations of soccer games with deep reinforcement learning

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-09-14 DOI: 10.1109/DSAA53316.2021.9564207

Pegah Rahimian, Afshin Oroojlooy, László Toka

Abstract: Soccer is a sparse-reward game: any smart or careless action in a critical situation can change the result of the match. Therefore, players, coaches, and scouts are all curious about the best action to perform in critical situations, such as moments with a high probability of losing ball possession or scoring a goal. This work proposes a new state representation for soccer and a batch reinforcement learning approach to train a smart policy network. This network takes the contextual information of the situation and proposes the optimal action to maximize the expected goal for the team. We performed extensive numerical experiments on soccer logs produced by InStat for 104 European soccer matches. The results show that in all 104 games, the optimized policy obtains higher rewards than its counterpart, the behavior policy. Moreover, our framework learns policies that are close to the expected behavior in the real world. For instance, in the optimized policy, we observe that actions such as fouls or putting the ball out of play can, in specific situations, be more rewarding than a shot.

Citations: 1
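The soccer paper above trains a policy network with batch reinforcement learning, i.e. from a fixed log of past play rather than by interacting with a simulator. The sketch below swaps in the simplest version of that idea, tabular fitted Q iteration over logged transitions, to show how a learned policy can prefer passing into the box over an immediate long shot. The states, actions, and reward values (stand-ins for expected-goal rewards) are hypothetical and far simpler than the paper's state representation and policy network.

```python
def fitted_q_iteration(transitions, actions, gamma=0.9, sweeps=50):
    """Tabular fitted Q iteration on a fixed log of (state, action, reward, next_state, done).

    Each sweep regresses Q(s, a) onto the mean Bellman target of all logged
    transitions that start with (s, a); no new data is collected (batch RL).
    """
    q = {}
    for _ in range(sweeps):
        sums, counts = {}, {}
        for state, action, reward, next_state, done in transitions:
            target = reward
            if not done:
                # Bootstrap from the current estimate; unseen pairs default to 0.
                target += gamma * max(q.get((next_state, b), 0.0) for b in actions)
            key = (state, action)
            sums[key] = sums.get(key, 0.0) + target
            counts[key] = counts.get(key, 0) + 1
        q = {key: sums[key] / counts[key] for key in sums}
    return q


def greedy_action(q, state, actions):
    """Pick the action with the highest estimated value in this state."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Hypothetical log; rewards stand in for expected-goal values.
ACTIONS = ["pass", "shot"]
log = [
    ("midfield", "shot", 0.02, None, True),
    ("midfield", "pass", 0.0, "box", False),
    ("box", "shot", 0.30, None, True),
]
q = fitted_q_iteration(log, ACTIONS)
print(greedy_action(q, "midfield", ACTIONS))  # pass
print(round(q[("midfield", "pass")], 2))      # 0.27
```

A real batch RL setup would replace the lookup table with a function approximator (such as the paper's policy network) so that it generalizes to states never seen in the log.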

Parallel Multi-Graph Convolution Network For Metro Passenger Volume Prediction

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-08-29 DOI: 10.1109/DSAA53316.2021.9564196

Fuchen Gao, Zhanquan Wang, Zhenguang Liu

Abstract: Accurate prediction of metro passenger volume (number of passengers) is valuable for real-time metro system management, and is a pivotal yet challenging task in intelligent transportation. Owing to the complex spatial correlations and temporal variations of urban subway ridership, deep learning has been widely used to capture nonlinear spatial-temporal dependencies. Unfortunately, current deep learning methods adopt a graph convolutional network only as a component for modeling spatial relationships, without making full use of the different spatial correlation patterns between stations. To further improve the accuracy of metro passenger volume prediction, this paper proposes a deep learning model composed of parallel multi-graph convolution and stacked bidirectional-unidirectional Gated Recurrent Units (PB-GRU). The parallel multi-graph convolution captures the origin-destination (OD) distribution and similar flow patterns between metro stations, while the bidirectional gated recurrent unit considers the passenger volume sequence in both forward and backward directions and learns complex temporal features. Extensive experiments on two real-world subway passenger flow datasets show the efficacy of the model. Surprisingly, compared with existing methods, PB-GRU achieves much lower prediction error.

Citations: 4

A Neural Approach for Detecting Morphological Analogies

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-08-09 DOI: 10.1109/DSAA53316.2021.9564186

Safa Alsaidi, Amandine Decker, Puthineath Lay, Esteban Marquer, Pierre-Alexandre Murena, Miguel Couceiro

Citations: 17

Interpretable Summaries of Black Box Incident Triaging with Subgroup Discovery

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-08-06 DOI: 10.1109/DSAA53316.2021.9564164

Youcef Remil, Anes Bendimerad, M. Plantevit, C. Robardet, Mehdi Kaytoue-Uberall

Abstract: The need for predictive maintenance grows with the increasing number of incidents reported by monitoring systems and equipment/software users. On the front line, on-call engineers (OCEs) have to quickly assess the severity of an incident and decide which service to contact for corrective action. To automate these decisions, several predictive models have been proposed, but the most efficient models are opaque (i.e., black boxes), which strongly limits their adoption. In this paper, we propose an efficient black-box model based on 170K incidents reported to our company over the last 7 years, and we emphasize the need to automate triage when incidents are massively reported on thousands of servers running our product, an ERP. Recent developments in eXplainable Artificial Intelligence (XAI) help by providing global explanations of the model and, most importantly, local explanations for each model prediction/outcome. However, providing a human with an explanation for each outcome is not feasible when dealing with a large number of daily predictions. To address this problem, we propose an original data-mining method rooted in Subgroup Discovery, a pattern mining technique with the natural ability to group objects that share similar explanations of their black-box predictions and to provide a description for each group. We evaluate this approach and present preliminary results that are encouraging for effective adoption by OCEs. We believe that this approach provides a new way to address the problem of model-agnostic outcome explanation.

Citations: 3
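Subgroup discovery, as used in the incident-triage paper above, searches for short descriptions (such as `module = finance`) whose matching records differ from the whole dataset with respect to a target. A common quality measure is weighted relative accuracy, WRAcc(S) = (|S| / |D|) * (p_S - p_D), where p is the share of positive targets in the subgroup S or the full dataset D. The sketch below assumes a boolean target derived from the black-box model and exhaustively scores single attribute = value conditions. The field names and incidents are hypothetical, and the paper's actual target (shared explanations), quality measure, and search strategy may differ.

```python
def wracc(records, target, condition):
    """Weighted relative accuracy: (|S| / |D|) * (p_S - p_D)."""
    if not records:
        return 0.0
    subgroup = [r for r in records if condition(r)]
    if not subgroup:
        return 0.0
    p_all = sum(1 for r in records if r[target]) / len(records)
    p_sub = sum(1 for r in subgroup if r[target]) / len(subgroup)
    return (len(subgroup) / len(records)) * (p_sub - p_all)


def top_subgroups(records, target, attributes, top_k=3):
    """Exhaustively score every single attribute = value description."""
    scored = []
    for attr in attributes:
        for value in sorted({r[attr] for r in records}):
            # Default arguments bind the current attr/value to the condition.
            quality = wracc(records, target, lambda r, a=attr, v=value: r[a] == v)
            scored.append((attr, value, quality))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


# Hypothetical incidents; the target marks incidents the black box flags as critical.
incidents = [
    {"module": "finance", "country": "FR", "predicted_critical": True},
    {"module": "finance", "country": "DE", "predicted_critical": True},
    {"module": "finance", "country": "FR", "predicted_critical": True},
    {"module": "hr", "country": "FR", "predicted_critical": False},
    {"module": "hr", "country": "DE", "predicted_critical": False},
    {"module": "sales", "country": "DE", "predicted_critical": False},
]
for attr, value, quality in top_subgroups(incidents, "predicted_critical",
                                          ["module", "country"], top_k=2):
    print(f"{attr} = {value}: WRAcc {quality:.3f}")
```

WRAcc trades off subgroup size against how unusual the subgroup is, so a large group with a moderately shifted target rate can outrank a tiny, extreme one; this is one reason it is a popular default for summarizing many predictions with a few readable descriptions.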

Reducing Unintended Bias of ML Models on Tabular and Textual Data

2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-08-05 DOI: 10.1109/DSAA53316.2021.9564112

Guilherme Alves, M. Amblard, Fabien Bernier, Miguel Couceiro, A. Napoli

Citations: 10