2018 IEEE International Conference on Big Knowledge (ICBK): Latest Papers, Page 4

[Copyright notice]

2018 IEEE International Conference on Big Knowledge (ICBK) Pub Date : 2018-11-01 DOI: 10.1109/icbk.2018.00003

Citations: 0

Matrix Profile XIII: Time Series Snippets: A New Primitive for Time Series Data Mining

2018 IEEE International Conference on Big Knowledge (ICBK) Pub Date : 2018-11-01 DOI: 10.1109/ICBK.2018.00058

Shima Imani, Frank Madrid, W. Ding, S. Crouter, Eamonn J. Keogh

Citations: 36

Stochastic Optimization for Market Return Prediction Using Financial Knowledge Graph

2018 IEEE International Conference on Big Knowledge (ICBK) Pub Date : 2018-11-01 DOI: 10.1109/ICBK.2018.00012

Xiaoyi Fu, Xinqi Ren, O. Mengshoel, Xindong Wu

Abstract: Interactive prediction of financial instrument returns is important. It is needed for asset managers to generate trading strategies as well as for stock exchange regulators to discover pricing anomalies. In this paper, we introduce an integrated stochastic optimization technique, namely genetic programming (GP) with generalized crowding (GC), GP+GC, as an integrated approach for a market return prediction system, using a financial knowledge graph (KG). On the one hand, using time-series data for twenty-nine component stocks of the Dow Jones Industrial Average, we show that our stochastic local search method can give a better prediction performance by comparing its return performance with two traditional benchmarks, namely a Buy & Hold strategy and the Moving Average Convergence Divergence (MACD) technical indicator. On the other hand, we use features extracted from a time-evolving knowledge graph constructed from fifty component stocks of the SSE50 Index. These features are fed to a GP variant, and the knowledge extracted from the expression learnt by GP is then incorporated into a KG. Overall, this work demonstrates how to integrate GP+GC with KGs in a powerful manner.

Citations: 18

Don't Do Imputation: Dealing with Informative Missing Values in EHR Data Analysis

2018 IEEE International Conference on Big Knowledge (ICBK) Pub Date : 2018-11-01 DOI: 10.1109/ICBK.2018.00062

Jia Li, Mengdie Wang, M. Steinbach, Vipin Kumar, György J. Simon

Abstract: Missing values pose a significant challenge in data analytics; in clinical studies especially, data is typically missing-not-at-random (MNAR). Applying techniques (e.g. imputation) that were designed for missing-at-random (MAR) data to MNAR data can lead to biases. In this work, we propose pattern-wise analysis, a collection of methods for building predictive models in the presence of MNAR missing values. On a per-pattern basis, this methodology constructs an individual model for each missingness pattern. We show that even the simplest pattern-wise method, Per-Pattern Modeling (PPM), outperforms models built on data sets completed by the most popular imputation methods. PPM faces difficulty when the number of missingness patterns is too high or when the missingness patterns have too few observations. We developed variants of PPM to overcome these challenges from three complementary perspectives: (i) a model selection perspective, where PPM can select patterns to build models; (ii) a distributional perspective, where the training data set is expanded in a distribution-preserving fashion; and (iii) a causal perspective, where a causal structure for the MNAR mechanism is assumed and exploited to convert the problem from MNAR to MAR. Evaluation of the proposed methods on both synthetic MNAR data and a real-world clinical data set of sepsis patients shows notable improvement over traditional approaches.
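The per-pattern idea in the abstract above (one model per missingness pattern, with no imputation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of `None` to mark a missing value, the mean-target "model", and the global-mean fallback for unseen patterns are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def missingness_pattern(row):
    """Return a tuple of booleans: True where the feature is missing (None)."""
    return tuple(value is None for value in row)


def fit_per_pattern(rows, targets):
    """Group training rows by missingness pattern and fit one model per group.

    The per-group "model" is just the mean target, a deliberately trivial
    stand-in (an assumption) for whatever learner would be used in practice.
    """
    groups = defaultdict(list)
    for row, y in zip(rows, targets):
        groups[missingness_pattern(row)].append(y)
    models = {pattern: mean(ys) for pattern, ys in groups.items()}
    fallback = mean(targets)  # hypothetical fallback for patterns never seen in training
    return models, fallback


def predict(models, fallback, row):
    """Predict with the model that matches the row's missingness pattern."""
    return models.get(missingness_pattern(row), fallback)


# Toy data: two rows fully observed, two rows with the second feature missing.
rows = [
    (1.0, 2.0), (1.5, 2.5),
    (1.0, None), (2.0, None),
]
targets = [10.0, 12.0, 30.0, 34.0]
models, fallback = fit_per_pattern(rows, targets)
```

In this toy data the rows with a missing second feature have systematically higher targets, so the pattern itself carries information that a per-pattern model keeps and that filling in the gap would tend to blur. The sketch also makes the difficulty noted in the abstract visible: a pattern with very few training rows gets a model estimated from very little data.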

Citations: 9

Opponent Resource Prediction in StarCraft Using Imperfect Information

2018 IEEE International Conference on Big Knowledge (ICBK) Pub Date : 2018-11-01 DOI: 10.1109/ICBK.2018.00056

W. Hamilton, M. Shafiq

Citations: 3

LINKSOCIAL: Linking User Profiles Across Multiple Social Media Platforms

2018 IEEE International Conference on Big Knowledge (ICBK) Pub Date : 2018-11-01 DOI: 10.1109/ICBK.2018.00042

V. Sharma, C. Dyreson

Abstract: Social media connects individuals to on-line communities through a variety of platforms, which are partially funded by commercial marketing and product advertisements. A recent study reported that 92% of businesses rated social media marketing as very important. Accurately linking the identity of users across various social media platforms has several applications, viz. marketing strategy, friend suggestions, multi-platform user behavior, information verification, etc. We propose LINKSOCIAL, a large-scale, scalable, and efficient system to link social media profiles. Unlike most previous research, which focuses mostly on pair-wise linking (e.g., Facebook profiles paired to Twitter profiles), we focus on linking across multiple social media platforms. LINKSOCIAL has three steps: (1) extract features from user profiles and build a cost function, (2) use Stochastic Gradient Descent to calculate feature weights, and (3) perform pair-wise and multi-platform linking of user profiles. To reduce the cost of computation, LINKSOCIAL uses clustering to perform candidate pair selection. Our experiments show that LINKSOCIAL predicts with 92% accuracy on pair-wise and 74% on multi-platform linking of three well-known social media platforms. Data used in our approach will be available at http://vishalshar.github.io/data/.

Citations: 13

Principal Sample Analysis for Data Reduction

2018 IEEE International Conference on Big Knowledge (ICBK) Pub Date : 2018-11-01 DOI: 10.1109/ICBK.2018.00054

Benyamin Ghojogh, Mark Crowley

Abstract: Data reduction is an essential technique used for purifying data, training discriminative models more efficiently, encouraging generalizability, and using less storage space on memory-limited systems. The literature on data reduction focuses mostly on dimensionality reduction; however, data sample reduction (i.e. removal of data points from a dataset) has its own benefits and is no less important given the growing sizes of datasets and the growing need for usable data analysis methods on the network edge. This paper proposes a new data sample reduction method, Principal Sample Analysis (PSA), which reduces the number (population) of data samples as a preprocessing step for classification. PSA ranks the samples of each class by how well they represent it and enables better discriminative learning by using the sparsity and similarity of samples at the same time. Data sample reduction then occurs by cutting off the lowest-ranked samples. The PSA method can work alongside any other data reduction/expansion and classification method. Experiments are carried out on three datasets (WDBC, AT&T, and MNIST) with contrasting characteristics and show the state-of-the-art effectiveness of the proposed method.

Citations: 8

Confidence-Aware Negative Sampling Method for Noisy Knowledge Graph Embedding

2018 IEEE International Conference on Big Knowledge (ICBK) Pub Date : 2018-11-01 DOI: 10.1109/ICBK.2018.00013

Yingchun Shan, Chenyang Bu, Xiaojian Liu, Shengwei Ji, Lei Li

Abstract: Knowledge graph embedding (KGE) can benefit a variety of downstream tasks, such as link prediction and relation extraction, and has therefore quickly gained much attention. However, most conventional embedding models assume that all triple facts share the same confidence without any noise, which is inappropriate. In fact, much noise and many conflicts can be brought into a knowledge graph (KG) by both the automatic construction process and data quality problems. Fortunately, the novel confidence-aware knowledge representation learning (CKRL) framework was proposed to incorporate triple confidence into translation-based models for KGE. Though effective at detecting noise, CKRL, with its uniform negative sampling and harsh triple quality function, could easily cause zero loss problems and false detection issues. To address these problems, we introduce the concept of negative triple confidence and propose a confidence-aware negative sampling method to support the training of CKRL in noisy KGs. We evaluate our model on the knowledge graph completion task. Experimental results demonstrate that the idea of introducing negative triple confidence can greatly facilitate performance improvement in this task, which confirms the capability of our model in noisy knowledge representation learning (NKRL).

Citations: 19

Semi-Supervised Representation Learning: Transfer Learning with Manifold Regularized Auto-Encoders

2018 IEEE International Conference on Big Knowledge (ICBK) Pub Date : 2018-11-01 DOI: 10.1109/ICBK.2018.00019

Yi Zhu, Xuegang Hu, Yuhong Zhang, Peipei Li

Citations: 2

Fast Approximate Hubness Reduction for Large High-Dimensional Data

2018 IEEE International Conference on Big Knowledge (ICBK) Pub Date : 2018-11-01 DOI: 10.1109/ICBK.2018.00055

Roman Feldbauer, Maximilian Leodolter, C. Plant, A. Flexer

Citations: 10