{"title":"Decomposing TripAdvisor: Detecting Potentially Fraudulent Hotel Reviews in the Era of Big Data","authors":"Christopher G. Harris","doi":"10.1109/ICBK.2018.00040","DOIUrl":"https://doi.org/10.1109/ICBK.2018.00040","url":null,"abstract":"The impact of customer reviews on user purchase decisions has been well documented. For example, a one-star increase in a restaurant's Yelp rating can lead to a 5 to 9 percent increase in revenue. Unfortunately, this has motivated some businesses in review-dependent industries to falsify reviews. In the era of big data, analytical methods have made detection of these false reviews easier. We perform a longitudinal study of 2.65 million hotel reviews made by nearly 320,000 reviewers on TripAdvisor (which does not verify customers stayed at the property they reviewed) and compare them to 2.93 million reviews on other two other booking platforms, Agoda and Booking.com (which verify its reviewers stayed at least one night at the property they reviewed). We analyze the language used, the patterns of reviewer activity, and the change in hotel's reputation score over time across more than 5.5 million reviews. We find the word frequency between the two types of websites and the patterns of reviewer activity differ considerably, even though the relative ranking of hotel reputation scores across review platforms are similar.","PeriodicalId":144958,"journal":{"name":"2018 IEEE International Conference on Big Knowledge (ICBK)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127069113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Risk Factor Analysis of Bone Mineral Density Based on Feature Selection in Type 2 Diabetes","authors":"Wei Wang, Bingbing Jiang, S. Ye, Liting Qian","doi":"10.1109/ICBK.2018.00037","DOIUrl":"https://doi.org/10.1109/ICBK.2018.00037","url":null,"abstract":"Type 2 diabetes (T2DM), one of the most common chronic diseases, predisposes bone to fragility fracture, which brings the heavy burden of medical care costs and affection on quality of life. Altered bone mineral density (BMD) is closely linked to T2DM-related bone fragility fracture. In this study, we adopt the feature selection technique to learning the most relevant or informative risk factors of BMD based on the clinical data set including general clinical data and glucose metabolic indexes of patients with T2DM. To illustrate the effectiveness and superiority of feature selection technique, eight state-of-the-art feature selection algorithms are exploited to select the subset of risk factors. This study successfully uses machine learning methods to implement risk factor analysis and prediction of BMD in patients with T2DM based on the easily obtained data in community medical institutions, which will be beneficial for the management of T2DM-related bone fracture in the primary healthcare systems.","PeriodicalId":144958,"journal":{"name":"2018 IEEE International Conference on Big Knowledge (ICBK)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131275760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discriminative Graph Autoencoder","authors":"Haifeng Jin, Qingquan Song, X. Hu","doi":"10.1109/ICBK.2018.00033","DOIUrl":"https://doi.org/10.1109/ICBK.2018.00033","url":null,"abstract":"With the abundance of graph-structured data in various applications, graph representation learning has become an effective computational tool for seeking informative vector representations for graphs. Traditional graph kernel approaches are usually frequency-based. Each dimension of a learned vector representation for a graph is the frequency of a certain type of substructure. They encounter high computational cost for counting the occurrence of predefined substructures. The learned vector representations are very sparse, which prohibit the use of inner products. Moreover, the learned vector representations are not in a smooth space since the values can only be integers. The state-of-the-art approaches tackle the challenges by changing kernel functions instead of producing better vector representations. They can only produce kernel matrices for kernel-based methods and not compatible with methods requiring vector representations. Effectively learning smooth vector representations for graphs of various structures and sizes remains a challenging task. Motivated by the recent advances in deep autoencoders, in this paper, we explore the capability of autoencoder on learning representations for graphs. Unlike videos or images, the graphs are usually of various sizes and are not readily prepared for autoencoder. Therefore, a novel framework, namely discriminative graph autoencoder (DGA), is proposed to learn low-dimensional vector representations for graphs. The algorithm decomposes the large graphs into small subgraphs, from which the structural information is sampled. The DGA produces smooth and informative vector representations of graphs efficiently while preserving the discriminative information according to their labels. Extensive experiments have been conducted to evaluate DGA. The experimental results demonstrate the efficiency and effectiveness of DGA comparing with traditional and state-of-the-art approaches on various real-world datasets and applications, e.g., classification and visualization.","PeriodicalId":144958,"journal":{"name":"2018 IEEE International Conference on Big Knowledge (ICBK)","volume":"518 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134011960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Networks for Predicting the Output of wind flow Simulations Over Complex Topographies","authors":"Michael Mayo, S. Wakes, C. Anderson","doi":"10.1109/ICBK.2018.00032","DOIUrl":"https://doi.org/10.1109/ICBK.2018.00032","url":null,"abstract":"We use deep learning techniques to model computational fluid dynamics (CFD) simulations of wind flow over a complex topography. Our motivation is to \"speed up\" the optimisation of CFD-based simulations (such as the 3D wind farm layout optimisation problem) by developing surrogate models capable of predicting the output of a simulation at any given point in 3D space, given output from a set of training simulations that have already been run. Our promising results using TensorFlow show that deep neural networks can be learned to model CFD outputs with an error of as low as 2.5 meters per second.","PeriodicalId":144958,"journal":{"name":"2018 IEEE International Conference on Big Knowledge (ICBK)","volume":"66 2-3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132942513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biometric Recognition Through Eye Movements Using a Recurrent Neural Network","authors":"Shaohua Jia, D. Koh, Amanda Seccia, Pasha Antonenko, Richard L. Lamb, Andreas Keil, M. Schneps, M. Pomplun","doi":"10.1109/ICBK.2018.00016","DOIUrl":"https://doi.org/10.1109/ICBK.2018.00016","url":null,"abstract":"Eye movement biometrics have traditionally been tackled by using handcrafted features which lead to complex computation and heavy reliance on experimental design. The authors of this study present a general recurrent neural network framework for biometric recognition through eye movements whereby the dynamic features and temporal dependencies are automatically learned from a short data window extracted from a sequence of raw eye movement signals. The model works in a task-independent manner by using short-term feature vectors combined with using different stimuli in training and testing. The model is trained end-to-end using backpropagation and mini-batch gradient descent. We evaluate our model on a dataset with 32 subjects presented with static images, and the results show that our deep learning model significantly outperforms previous methods. The achieved Rank-1 Identification Rate (Rank-1 IR) for the identification scenario is 96.3% and the Equal Error Rate (EER) for the verification scenario is 0.85%.","PeriodicalId":144958,"journal":{"name":"2018 IEEE International Conference on Big Knowledge (ICBK)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129749771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Learning Capabilities of Recurrent Neural Networks: A Cryptographic Perspective","authors":"Shivin Srivastava, Ashutosh Bhatia","doi":"10.1109/ICBK.2018.00029","DOIUrl":"https://doi.org/10.1109/ICBK.2018.00029","url":null,"abstract":"It has been proven that Recurrent Neural Networks (RNNs) are Turing Complete, i.e. for any given computable function there exists a finite RNN to compute it. Consequently, researchers have trained Recurrent Neural Networks to learn simple functions like sorting, addition, compression and more recently, even classical cryptographic ciphers such as the Enigma. In this paper, we try to identify the characteristics of functions that make them easy or difficult for the RNN to learn. We look at functions from a cryptographic point of view by studying the ways in which the output depends on the input. We use cryptographic parameters (confusion and diffusion) for determining the strength of a cipher and quantify this dependence to show that a strong correlation exists between the learning capability of an RNN and the function's cryptographic parameters.","PeriodicalId":144958,"journal":{"name":"2018 IEEE International Conference on Big Knowledge (ICBK)","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132197701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Discrimination Specific, Self-Collaborative and Nonlinear Model","authors":"Dimche Kostadinov, Behrooz Razeghi, S. Voloshynovskiy, Sohrab Ferdowsi","doi":"10.1109/ICBK.2018.00048","DOIUrl":"https://doi.org/10.1109/ICBK.2018.00048","url":null,"abstract":"This paper presents a novel nonlinear transform model for learning of collaboration structured, discriminative and sparse representations. The idea is to model a collaboration corrective functionality between multiple nonlinear transforms in order to reduce the uncertainty in the estimate. The focus is on the joint estimation of data-adaptive nonlinear transforms (NTs) that take into account a collaboration component w.r.t. a discrimination target. The joint model includes minimum information loss, collaboration corrective and discriminative priors. The model parameters are learned by minimizing an approximation to the empirical negative log likelihood of the model, where we propose an efficient solution by an iterative, coordinate descent algorithm. Numerical experiments validate the potential of the learning principle. The preliminary results show advantages in comparison to the stateof-the-art methods, w.r.t. the learning time, the discriminative quality and the recognition accuracy.","PeriodicalId":144958,"journal":{"name":"2018 IEEE International Conference on Big Knowledge (ICBK)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134431362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-label Learning Based on Label-Specific Feature Extraction","authors":"Ting Nie","doi":"10.1109/ICBK.2018.00047","DOIUrl":"https://doi.org/10.1109/ICBK.2018.00047","url":null,"abstract":"In the framework of multi-label learning, each instance is represented by a feature vector and is simultaneously assigned with more than one class label. Multi-label data usually present the characteristics of high dimension, much redundant information, and so on, which make dimensionality reduction technology more and more important in multi label learning. Since different class labels may have their own unique characteristics, they are called label-specific features. Based on the above assumption, we propose a multi-label learning approach with label specific features called MLLSFE to extract low dimensional features for all labels. The proposed algorithm implements the label-specific feature extraction by the thought of pairwise constraint dimensionality reduction. Extensive experimental results conducted on different datasets show that the proposed algorithm can effectively promote the classification performance in multi-label learning.","PeriodicalId":144958,"journal":{"name":"2018 IEEE International Conference on Big Knowledge (ICBK)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128088172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Detection of Warped Patterns in Time Series: The Caterpillar Algorithm","authors":"Maximilian Leodolter, Norbert Brändle, C. Plant","doi":"10.1109/ICBK.2018.00063","DOIUrl":"https://doi.org/10.1109/ICBK.2018.00063","url":null,"abstract":"Detection of similar representations of a given query time series within longer time series is an important task in many applications such as finance, activity research, text mining and many more. Identifying time warped instances of different lengths but similar shape within longer time series is still a difficult problem. We propose the novel Caterpillar algorithm which fuses the advantages of Dynamic Time Warping (DTW) and the Minimum Description Length (MDL) principle to move a sliding window in a crawling-like way into the future and past of a time series. To demonstrate the wide field of application and validity, we compare our method against stateof-the-art methods on accelerometer time series and synthetic random walks. Our experiments demonstrate that Caterpillar outperforms the comparison methods in detecting accelerometer signals of metro stops.","PeriodicalId":144958,"journal":{"name":"2018 IEEE International Conference on Big Knowledge (ICBK)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132994632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DMTMV: A Unified Learning Framework for Deep Multi-task Multi-view Learning","authors":"Yi-Feng Wu, De-chuan Zhan, Yuan Jiang","doi":"10.1109/ICBK.2018.00015","DOIUrl":"https://doi.org/10.1109/ICBK.2018.00015","url":null,"abstract":"As the development of data collection techniques, complicated objects are described with more than one aspects as well as possess multiple concepts, so as many data mining approaches face the issues of dual-heterogeneity, i.e., feature heterogeneity and task heterogeneity. Traditional multi-task learning methods and multi-view learning methods may be not optimal for such a complicated learning problem since they only capture one type of heterogeneity. Then some works concentrate on a new direction where there are multiple related tasks with multi-view data. However, most existing MTMV methods focus on proposing linear models for fitting the specific application requirements while they are not suitable for common large-scale real-world problems in real environments. In this paper, we propose a unified learning framework for a deep multi-task multi-view neural network. In our approach, there are three kinds of networks called shared feature network, specific feature network and task network, each of which focus on the feature heterogeneity, unified feature representations, and task heterogeneity, respectively. Meanwhile, we employ a layer-by-layer regularization strategy for learning the relationships between tasks in multi-task multi-view learning. Moreover, the DMTMV method is naturally convenient for multi-class heterogeneous tasks as well. Finally, experiments on four real-world datasets successfully show that the proposed framework can significantly improve the prediction performance in multi-task multi-view learning while it can also discover inherent relationships among different tasks.","PeriodicalId":144958,"journal":{"name":"2018 IEEE International Conference on Big Knowledge (ICBK)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129975370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}