Latest papers from the 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)

Combining Static and Dynamic Analysis to Improve Machine Learning-based Malware Classification
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564144
Rajchada Chanajitt, B. Pfahringer, Heitor Murilo Gomes
Abstract: Windows Portable Executable files can be malformed for malicious purposes. There are many ways and tricks to circumvent standard security detection and protection measures. For example, one can bypass Windows Defender Firewall by creating a writable file in a user's temporary folder whose filename looks like that of a legitimate process (e.g. svchost.exe, chrome32.exe, and dllhost32.exe) and executing it without user intervention. In this work, we leverage static properties and dynamic behaviour analysis for malware classification. For dynamic analysis, information is retrieved from the Falcon Sandbox malware website. On top of that, we also run malware in a virtualised Windows 10 environment to analyse memory dumps and generate even more features that may capture potential malicious behaviour. Three different classifiers are analysed in our empirical experiments: random forests, gradient boosting, and neural networks. The combination of static and dynamic features consistently yields a higher F1-score for every model compared to the same model trained using only static or dynamic features. The best models achieve F1-scores of up to 98.9%.
Citations: 3
A Generalized Optimization Embedded Framework of Undersampling Ensembles for Imbalanced Classification
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564116
Hongjiao Guan, Yingtao Zhang, Bin Ma, Jian Li, Chun-peng Wang
Abstract: Imbalanced classification is common in practical applications and has always been a challenging issue. Traditional classification methods perform poorly on imbalanced data, especially on the minority class. However, the minority class is usually the class of interest, and its misclassification cost is higher. The critical factor is the intrinsically complicated distribution of the imbalanced data itself. Resampling ensemble learning achieves promising results and has recently become a research focus. However, some resampling ensembles do not consider complicated distribution characteristics, which limits the performance improvement. In this paper, a generalized optimization embedded framework (GOEF) is proposed based on undersampling bagging. The GOEF aims to pay more attention to the learning of local regions to handle the complicated distribution characteristics. Specifically, the GOEF utilizes out-of-bag data to explore heterogeneous local areas and chooses misclassified examples to optimize base classifiers. The optimization can focus on a single class or both classes. Extensive experiments over synthetic and real datasets demonstrate that GOEF with the minority class optimization performs the best in terms of AUC, G-mean, and sensitivity, compared with five resampling ensemble methods.
Citations: 1
NEDRL-CIM: Network Embedding Meets Deep Reinforcement Learning to Tackle Competitive Influence Maximization on Evolving Social Networks
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564111
Khurshed Ali, Chih-Yu Wang, Mi-Yen Yeh, Cheng-te Li, Yi-Shin Chen
Abstract: Competitive Influence Maximization (CIM) aims to maximize the influence of a party given the competition from other parties in the same social network, as when companies seek key users to promote their competing products on the social network to achieve maximum profit. Recently, learning-based solutions have been introduced to tackle the competitive influence maximization problem. However, such studies focus on the static nature of social networks. This paper proposes a deep reinforcement learning-based framework employing network embedding, termed DRL-EMB, to tackle the CIM problem on evolving social networks. The key objective of DRL-EMB is to find the best strategy to maximize the party's reward, considering budget and competition, while information propagation and network evolution run in parallel. We validate our proposed framework against a DRL-based model using hand-crafted state features (DRL-HCF) and heuristic-based methods. Experimental results show that our proposed framework, DRL-EMB, achieves better results than heuristic-based and DRL-HCF models while significantly outperforming the DRL-HCF model in terms of time efficiency.
Citations: 4
Early Classification of Time Series: Cost-based multiclass Algorithms
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564134
Paul-Emile Zafar, Youssef Achenchabe, A. Bondu, A. Cornuéjols, V. Lemaire
Abstract: Early classification of time series assigns each time series to one of a set of pre-defined classes using as few measurements as possible while preserving a high accuracy. This implies solving online the trade-off between the earliness and the prediction accuracy. This has been formalized in previous work where a cost-based framework taking into account both the cost of misclassification and the cost of delaying the decision has been proposed. The best resulting method, called Economy-$\gamma$, is unfortunately so far limited to binary classification problems. This paper presents a set of six new methods that extend the Economy-$\gamma$ method in order to solve multiclass classification problems. Extensive experiments on 33 datasets allowed us to compare the performance of the six proposed approaches to the state-of-the-art one. The results show that: (i) all proposed methods perform significantly better than the state-of-the-art one; (ii) the best way to extend Economy-$\gamma$ to multiclass problems is to use a confidence score, either the Gini index or the maximum probability.
Citations: 1
A Personalized Overdraft Protection Framework
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564162
Karesia Ramlal, Patrick Hosein
Abstract: Data and Artificial Intelligence are changing the business models of many financial institutions. The availability and granularity of customer data allow for the development of a personalized banking experience, which has been shown to improve customer relationships and increase retention. We present a Machine Learning approach to providing personalized overdraft protection. The approach simultaneously provides benefits to both customer and bank and hence increases customer retention while improving the bank's revenue. We illustrate the approach with examples.
Citations: 0
AssistML: A Concept to Recommend ML Solutions for Predictive Use Cases
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564168
Alejandro Gabriel Villanueva Zacarias, Christian Weber, P. Reimann, B. Mitschang
Abstract: The adoption of machine learning (ML) in organizations is characterized by the use of multiple ML software components. Citizen data scientists face practical requirements when building ML systems, which go beyond the known challenges of ML, e.g., data engineering or parameter optimization. They are expected to quickly identify ML system options that strike a suitable trade-off across multiple performance criteria. These options also need to be understandable for non-technical users. Addressing these practical requirements represents a problem for citizen data scientists with limited ML experience. This calls for a method to help them identify suitable ML software combinations. Related approaches, e.g., AutoML systems, are not responsive enough or cannot balance different performance criteria. In this paper, we introduce AssistML, a novel concept to recommend ML solutions, i.e., software systems with ML models, for predictive use cases. AssistML uses metadata of existing ML solutions to quickly identify and explain options for a new use case. We implement the approach and evaluate it with two exemplary use cases. Results show that AssistML proposes ML solutions that are in line with users' performance preferences in seconds.
Citations: 2
Explainable clustering with multidimensional bounding boxes
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564220
M. Kuk, Szymon Bobek, G. J. Nalepa
Abstract: Explainable Artificial Intelligence (XAI) aims at introducing transparency and intelligibility into the decision-making process of AI systems. Most of the work in this area is focused on supervised machine learning tasks such as classification and regression. Unsupervised algorithms such as clustering can also be explained with existing approaches. This is most often achieved by explaining a classifier trained on cluster data with cluster labels as a dependent variable. However, with such a transformation the information about cluster shape and distribution is lost, which may lead to a wrong interpretation of explanations. In this paper, we introduce a method that aids end experts in cluster analysis with human-readable rule-based explanations. We use a state-of-the-art explanation mechanism on the multidimensional bounding boxes that represent arbitrarily-shaped clusters. We demonstrate our approach on reproducible synthetic datasets.
Citations: 2
Hybrid Recommender System for Detection of Rare Cases Applied to Pulsar Candidate Selection
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564139
Di Pang, K. Goseva-Popstojanova, M. Mclaughlin
Abstract: Detection of extremely rare cases is a challenging problem for most machine learning algorithms, especially if class overlapping is present. In this paper we propose a hybrid recommender system that uses a target rare case to state users' requirements and ranks the candidates using a similarity function which is calculated as a weighted sum of individual feature similarities. Specifically, the weight of each feature is computed as a product of its association with the class label and the outlyingness of its value. We apply this hybrid recommender system to the radio pulsar candidate selection problem, for detection of two different types of rare cases: low signal-to-noise (S/N) pulsars and Fast Radio Bursts (FRBs). Our results show that the proposed approach successfully detects both low S/N pulsars and FRBs. When there is class overlapping, as in the case of low S/N pulsars, treating rare feature values as outliers and increasing their weights in the similarity function improve the detection performance. For FRBs, which compared to the low S/N pulsars are relatively more distinguishable from the non-astrophysical signals, uniform weighting outperformed the feature-weighting methods. The proposed hybrid recommender system can be used in other application domains that share similar requirements such as high recall and face similar challenges such as class imbalance and class overlapping.
Citations: 0
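The ranking rule described in the abstract above (a weighted sum of per-feature similarities, with each weight being the product of class association and outlyingness) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the per-feature similarity `1 - |a - b|` on features scaled to [0, 1], and the way association and outlyingness scores are supplied are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def feature_weights(association: Sequence[float],
                    outlyingness: Sequence[float]) -> List[float]:
    """Weight of each feature = association with the class label * outlyingness.

    Both inputs are assumed to be precomputed, non-negative scores
    (how they are estimated is not specified here).
    """
    return [a * o for a, o in zip(association, outlyingness)]


def weighted_similarity(target: Sequence[float],
                        candidate: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Weighted sum of per-feature similarities.

    Assumption: features are scaled to [0, 1] and the per-feature
    similarity is 1 - |target_i - candidate_i|.
    """
    sims = [1.0 - abs(t - c) for t, c in zip(target, candidate)]
    return sum(w * s for w, s in zip(weights, sims))


def rank_candidates(target: Sequence[float],
                    candidates: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[int]:
    """Return candidate indices, most similar to the target first."""
    scored = [(weighted_similarity(target, c, weights), i)
              for i, c in enumerate(candidates)]
    # Sort by descending score; ties are broken by original index.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    target = [0.9, 0.1]                      # the rare case used as the query
    candidates = [[0.9, 0.9], [0.1, 0.1]]
    w = feature_weights(association=[1.0, 0.5], outlyingness=[1.0, 0.2])
    print(rank_candidates(target, candidates, w))
```

Passing equal weights for every feature gives the uniform-weighting baseline that the abstract mentions for FRBs.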
Mining Mavericks - A data-driven approach to detect spend leakage
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564194
Priyadarshi, Anish Chaugule, M. Natu
Abstract: Spend leakage in the procure-to-pay process is one of the prominent challenges that organizations across the globe face. Given the complex dynamics of the procurement process, spend analysis heavily relies on the tacit knowledge of experts. In this paper, we address the problem of detecting maverick spends using a data-driven approach. We present approaches to model the behavior dynamics of the procurement process, detect mavericks, and recommend alternate procurement options. We demonstrate the effectiveness of this solution through a real-world case study.
Citations: 0
A Framework for Statistically-Sound Customer Segment Search
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2021-10-06 DOI: 10.1109/DSAA53316.2021.9564199
S. Amer-Yahia, Laure Berti-Équille, Abdelouahab Chibah
Abstract: We develop S4, a Statistically-Sound Segment Search framework that combines principled data partitioning and sound statistical testing to verify common hypotheses in retail data and return interpretable customer data segments. Our framework accommodates one-sample, two-sample, and multiple-sample testing, to provide various aggregations and comparisons of customer transactions. To control the proportion of false discoveries in multiple hypothesis testing, we enforce an FDR-controlling procedure and formulate a unified optimization problem that returns customer data segments that satisfy the test for a given significance level, maximize coverage of the input data, and are within a risk capital. We develop a greedy algorithm to explore different data partitions and test multiple hypotheses in a sound manner. Our extensive experiments on four retail data sets examine the interaction between significance, risk and coverage, and demonstrate the expressivity, usefulness, and scalability of S4 in practice.
Citations: 0
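The abstract above refers to "an FDR-controlling procedure" without naming it. As a point of reference, the Benjamini-Hochberg step-up procedure is one standard way to control the false discovery rate across several hypothesis tests; the sketch below shows that procedure only and is not claimed to be what S4 uses. The function name and the example p-values are made up for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per input p-value: True if the corresponding
    hypothesis is rejected at FDR level `alpha`.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one test per candidate customer segment.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
    # -> [True, False, False, False]
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.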