{"title":"An Efficient Parallel Stochastic Gradient Descent for Matrix Factorization On GPUS","authors":"Tianyu Xing, Bin Wu, Bai Wang","doi":"10.1109/DSC50466.2020.00047","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00047","url":null,"abstract":"Matrix factorization (MF) is an essential method used in recommender systems, database systems, word embedding, graph mining, and other areas. Stochastic gradient descent (SGD) is widely used to solve the MF problem because it is accurate on large datasets and computationally fast. SGD is hard to parallelize because it is an inherently sequential algorithm, although researchers have proposed some effective parallel methods. In this work, we propose EMF-SGD, an effective GPU-based method for large-scale recommender systems. EMF-SGD accelerates the SGD algorithm by using GPU shared memory and warp operations. In addition, we preserve the relationship between users and items during data preprocessing to gain higher accuracy. Finally, we parallelize EMF-SGD across multiple GPUs and show that it achieves a 1.8-4.3x speedup and higher accuracy over the state-of-the-art algorithm GPU-MF-SGD, depending on the number of GPUs used.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131645921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DSC 2020 Index","authors":"","doi":"10.1109/dsc50466.2020.00069","DOIUrl":"https://doi.org/10.1109/dsc50466.2020.00069","url":null,"abstract":"","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127481557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low-Dimensional Representation Learning Method for Text Classification and Clustering","authors":"Xiang Wang, Yunfan Liao, Junxing Zhu, Bin Zhou, Yan Jia","doi":"10.1109/DSC50466.2020.00039","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00039","url":null,"abstract":"Natural language processing applications often suffer from the curse of dimensionality. In this paper, we propose a low-dimensional text representation learning algorithm that preserves the pairwise similarity relations of texts. Our method maximizes the log-probability of observing similar texts conditioned on a text's feature representation. To generate enough similar text pairs for training the objective function, we first build an adjacency graph based on the pairwise similarity relations of the texts, and then propose a simulated sampling strategy to generate co-occurrence text sequences from the adjacency graph. Experiments on four long and short text datasets demonstrate that our method outperforms several state-of-the-art dimensionality reduction methods. Our method also outperforms Doc2vec, except on the 20 Newsgroups dataset for text clustering. Our method is not specific to text and can also be applied to representation learning for images.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121379659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GeST: A Grid Embedding based Spatio-Temporal Correlation Model for Crime Prediction","authors":"Yiting Qian, Li Pan, Peng Wu, Zhengmin Xia","doi":"10.1109/DSC50466.2020.00009","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00009","url":null,"abstract":"Crime prediction greatly contributes to improving public safety in urban cities. Recent studies have been effective by considering spatio-temporal crime distribution correlations among regions. However, with the development of advanced telecommunications and intelligent transportation in urban cities, urban regions are becoming more interconnected and integrated, which makes it difficult for existing methods to capture in-depth geographical and contextual inter-area spatial correlations. To solve this problem, we propose the Grid-embedding based Spatio-Temporal correlation (GeST) model, which consists of a grid-embedding module and a crime graph prediction module. In the grid-embedding module, a convolutional AutoEncoder explores distance-based inter-area spatial correlations and decomposes crime distributions into hidden crime spatial bases. The bases are regarded as the representation of the decomposed crime distribution. The Graph Convolutional Network (GCN) in the grid-embedding module captures contextual spatial correlations among feature-similar regions. After combining the two types of grid-embedding vectors, the crime graph prediction module uses a Long Short-Term Memory (LSTM) neural network to learn temporal correlations of the crime distribution. Experiments conducted on two real-world datasets show that the proposed model achieves better prediction performance than other methods.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125864469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Euclidean Distance Attack on Network Embedding","authors":"Qi Xuan, Jun Zheng, Lihong Chen, Shanqing Yu, Jinyin Chen, Dan Zhang, Qingpeng Zhang","doi":"10.1109/DSC50466.2020.00019","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00019","url":null,"abstract":"Considering the wide application of network embedding methods in graph data mining, and inspired by adversarial attacks in deep learning, a genetic algorithm-based Euclidean distance attack (EDA) strategy is proposed to attack network embedding methods, thereby preventing structural information from being discovered. EDA focuses on disturbing the Euclidean distance between a pair of nodes in the embedding space as much as possible through minimal modifications of the network structure. Since many downstream network algorithms, such as community detection and node classification, rely on the Euclidean distance between nodes to evaluate their similarity in the embedding space, EDA can be regarded as a general attack on various network algorithms. Unlike traditional supervised attack strategies, EDA does not need label information; it is an unsupervised network embedding attack method. Experiments on a set of real networks demonstrate that the proposed EDA method can significantly reduce the performance of DeepWalk-based network algorithms, i.e., community detection and node classification, and that its performance is superior to several heuristic attack strategies.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131122066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}