{"title":"GPU-BTM: A Topic Model for Short Text using Auxiliary Information","authors":"Yibing Guo, Yutao Huang, Ye Ding, Shuhan Qi, Xuan Wang, Qing Liao","doi":"10.1109/DSC50466.2020.00037","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00037","url":null,"abstract":"Recently, short texts have become very popular in social life. To understand short texts, researchers have developed topic models to extract topic information. However, conventional topic models mainly target long documents and cannot deal with the sparsity problem of short text. In this paper, we propose a novel topic model for short text called GPU-BTM, which incorporates the Generalized Pólya Urn (GPU) technique into the Biterm Topic Model (BTM). GPU-BTM utilizes the similarity information and the co-occurrence pattern of words simultaneously to handle the sparsity problem. Specifically, the GPU module considers the similarity information among words, so that GPU-BTM generates more coherent topics. The BTM module, in turn, captures the co-occurrence pattern of words so that the enriched contexts relieve the data sparsity problem. Experimental results demonstrate that GPU-BTM outperforms four recent baseline models on two real-world short-text datasets.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114951144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
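The co-occurrence pattern mentioned in the GPU-BTM abstract can be illustrated with a minimal sketch of biterm extraction. It assumes the usual Biterm Topic Model definition, in which a biterm is any unordered pair of words appearing in the same short text. The function name `extract_biterms` and the sample tokens are hypothetical and are not taken from the paper, and the Generalized Pólya Urn step is not shown.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    # Sort each pair so that ("a", "b") and ("b", "a") count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical tokenized short text.
doc = ["apple", "iphone", "release"]
for biterm in extract_biterms(doc):
    print(biterm)
```

A text with n tokens yields n(n-1)/2 biterms, which is how even a very short text contributes several co-occurrence observations.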
{"title":"Adversarial Examples for Chinese Text Classification","authors":"Yushun Xie, Zhaoquan Gu, Bin Zhu, Le Wang, Weihong Han, Lihua Yin","doi":"10.1109/DSC50466.2020.00043","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00043","url":null,"abstract":"Deep neural networks (DNNs) have been widely adopted in various areas such as image recognition and natural language processing. However, many works show that DNNs for image classification are vulnerable to adversarial examples, which are generated by adding small-magnitude perturbations to the original inputs. In this paper, we show that DNNs for Chinese text classification are also vulnerable to adversarial examples. We propose a marginal attack method to generate adversarial examples that can fool the DNNs. This method adopts the Naïve Bayes principle to filter sensitive words, and it only adds a small number of sensitive words at the end of the original text. The generated adversarial examples can fool a variety of Chinese text classification DNNs, such that the text is classified into an incorrect category with high probability. We conduct extensive experiments to evaluate the attack performance, and the results show that the success rate of the attacks can reach almost 100% by adding only five sensitive words.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128600238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image-based Encrypted Traffic Classification with Convolution Neural Networks","authors":"Yanjie He, Wei Li","doi":"10.1109/DSC50466.2020.00048","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00048","url":null,"abstract":"Network traffic classification plays an important part in network management and network monitoring. It helps administrators understand the composition of network traffic, manage the network, and provide differentiated service quality and security monitoring. However, the widespread use of encryption techniques and dynamic port policies makes encrypted traffic classification a great challenge for traditional traffic classification methods. In this paper, we propose an image-based method that can classify encrypted network traffic with high accuracy. The basic idea of the method is to convert the first few nonzero payload sizes of a session into gray images, and to classify the converted gray images with a convolutional neural network in order to categorize the encrypted network traffic. This method is very lightweight, and it can automatically extract features, select features and classify encrypted network traffic into categories. We use the public dataset ISCX VPN-nonVPN to validate our proposed method. The experimental results show that our proposed approach achieves an F1 score of 97.73% on conventional encrypted traffic classification and an F1 score of 99.55% on virtual private network (VPN) traffic classification.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133710568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
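The conversion step described in the traffic classification abstract (first few nonzero payload sizes of a session turned into a gray image) might look roughly like the sketch below. The abstract does not give the image size, the scaling rule, or the padding policy, so the 4x4 default, the 1500-byte ceiling, the zero padding and the function name `payload_sizes_to_gray_image` are all assumptions made for illustration.

```python
def payload_sizes_to_gray_image(sizes, side=4, max_size=1500):
    """Map a session's payload sizes to a side x side grid of gray levels (0..255)."""
    # Keep only nonzero payload sizes, in arrival order, up to side*side values.
    nonzero = [s for s in sizes if s > 0][: side * side]
    # Pad with zeros when the session has fewer nonzero payloads than pixels (assumed policy).
    nonzero += [0] * (side * side - len(nonzero))
    # Scale each size into the 0..255 gray range, clipping oversized payloads.
    pixels = [min(255, s * 255 // max_size) for s in nonzero]
    # Reshape the flat list into `side` rows of `side` pixels.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical session: zero-length payloads (for example pure ACKs) are skipped.
session = [0, 1500, 0, 750, 100]
for row in payload_sizes_to_gray_image(session, side=2):
    print(row)
```

The resulting grid could then be fed to any small image classifier; the network architecture used in the paper is not described in the abstract and is not reproduced here.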
{"title":"Deepfake Detection with Clustering-based Embedding Regularization","authors":"Kui Zhu, Bin Wu, Bai Wang","doi":"10.1109/DSC50466.2020.00046","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00046","url":null,"abstract":"In recent months, AI-synthesized face-swapping videos, referred to as deepfakes, have become an emerging problem. Fake videos are becoming more and more difficult to distinguish, which brings a series of challenges to social security. Some scholars are devoted to studying how to improve the detection accuracy of deepfake videos. At the same time, in order to support better research, several datasets for deepfake detection have been created. Companies such as Google and Facebook have also spent huge sums of money to produce datasets for deepfake video detection, as well as holding deepfake detection competitions. The continuous advancement of video tampering technology and the improvement of video quality have also brought great challenges to deepfake detection. Some scholars have achieved certain results on existing datasets, while the results on some high-quality datasets are not as good as expected. In this paper, we propose a new method with clustering-based embedding regularization for deepfake detection. We use open-source algorithms to generate videos that simulate the distinctive artifacts in deepfake videos. To improve the local smoothness of the representation space, we integrate a clustering-based embedding regularization term into the classification objective, so that the obtained model learns to resist adversarial examples. We evaluate our method on three recent deepfake datasets. Experimental results demonstrate the effectiveness of our method.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132352926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextual Gated Graph Convolutional Networks for Social Unrest Events Prediction","authors":"Haiyang Wang, Bin Zhou, Zhipin Gu, Yan Jia","doi":"10.1109/DSC50466.2020.00056","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00056","url":null,"abstract":"In a wide range of social unrest event prediction tasks, the dynamic graph convolutional network (DGCN) has been successfully leveraged to achieve reliable performance. The innovation of dynamic graph convolutional networks mainly focuses on capturing the temporal features of unrest events. Inspired by the dynamic graph convolutional network, we propose a new graph convolutional network model called the Contextual Gated Graph Convolutional Network (CGGCN). We apply CGGCN to predict and analyze social unrest events. The CGGCN uses a contextual gated layer, which improves the layer-wise propagation rules of graph convolutional networks. The contextual gated layer can re-learn the keyword representation to capture the contextual semantic features of unrest events by using a squeeze & excitation module. The principle of the squeeze & excitation module is to increase the weight of meaningful words for event prediction and suppress weaker ones. In this paper, we obtain historical texts, including published news and short tweets, related to social unrest events. Based on these historical text data, the CGGCN can predict the occurrence of social unrest events. In addition, we propose a method for establishing the evolution graph of unrest events. In this way, we can use several core words to summarize the evolution of an event. Finally, we design experiments on specific unrest event datasets. The experimental results show that the CGGCN leads by about 5%-7% in prediction performance compared with other popular methods.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123095421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
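The squeeze & excitation principle named in the CGGCN abstract (raise the weight of meaningful words, suppress weaker ones) can be sketched generically as below. This follows the common squeeze, excitation and scale pattern applied over keywords rather than image channels. It is not the paper's contextual gated layer, whose exact form the abstract does not give. The function name `squeeze_excite`, the weight matrices `w1` and `w2`, and the toy values are hypothetical.

```python
import math


def squeeze_excite(word_vectors, w1, w2):
    """Gate each keyword vector with a learned weight in (0, 1).

    word_vectors: n keyword vectors; w1: r x n bottleneck weights; w2: n x r expansion weights.
    """
    # Squeeze: summarize each keyword vector by its mean activation.
    z = [sum(v) / len(v) for v in word_vectors]
    # Excitation, step 1: bottleneck layer followed by ReLU.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    # Excitation, step 2: expand back to one gate per keyword and apply a sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: reweight every keyword vector by its gate.
    scaled = [[g * x for x in v] for g, v in zip(gates, word_vectors)]
    return scaled, gates


# Toy example with two keywords and hand-picked (not learned) weights.
vectors = [[1.0, 1.0], [0.0, 0.0]]
w1 = [[1.0, 0.0]]      # bottleneck of size r = 1
w2 = [[2.0], [-2.0]]   # one output gate per keyword
scaled, gates = squeeze_excite(vectors, w1, w2)
print(gates)
```

In a trained model `w1` and `w2` would be learned together with the rest of the network, so the gates reflect which keywords help prediction; here they are fixed only to make the mechanics visible.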
{"title":"SecureMLDebugger: A Privacy-Preserving Machine Learning Debugging Tool","authors":"Peiyi Han, Chaozheng Wang, Chuanyi Liu, Shaoming Duan, Hezhong Pan, Pengshuai Luo","doi":"10.1109/DSC50466.2020.00027","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00027","url":null,"abstract":"The issue of data privacy is uniquely challenging in machine learning, which requires large datasets. Privacy-preserving machine learning methods based on the concept of model training isolated from data scientists have become a hot topic in recent years. In order to protect data privacy, training data is completely isolated from data scientists. Although this approach can protect data privacy, data scientists cannot perceive any training information from the data nodes during training, and it is difficult to debug machine learning models. Existing works provide data collection APIs to collect and display metadata during training to help data scientists debug machine learning models. Malicious data scientists can obtain private data through these APIs. In this paper, a novel secure machine learning debugging tool based on a non-intrusive metadata collection scheme, called SecureMLDebugger (SMLD), is proposed, which automatically collects, stores and manages non-private metadata during training without any data collection API. Our tool speeds up users' machine learning experiments while protecting data privacy. We achieve this by transparently tracing each function call in machine learning code and automatically extracting metadata such as hyperparameters of models, training runs, evaluations and layouts of neural networks. SMLD is integrated with popular frameworks such as scikit-learn and PyTorch, and meets the demands of various privacy-preserving training cases in practice.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133916125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chinese Open Relation Extraction with Pointer-Generator Networks","authors":"Ziheng Cheng, Xu Wu, Xiaqing Xie, Jingchen Wu","doi":"10.1109/DSC50466.2020.00054","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00054","url":null,"abstract":"Most traditional Chinese open relation extraction (Open RE) systems exploit the syntactic, lexical and other language structure information obtained from sentences by natural language processing (NLP) tools to build hand-crafted patterns for extraction, which easily causes error propagation and reduces extraction accuracy. In this paper, we propose an end-to-end abstract Chinese Open RE model based on the Pointer-Generator network, PGCORE. We employ the results extracted by the state-of-the-art pattern-based Chinese Open RE system as the training set of the model. Experimental results show that our method outperforms several pattern-based baseline systems, which demonstrates the feasibility and effectiveness of using deep learning models for Chinese Open RE.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133724427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-aware Event Type Identification Based on Context Fusion and Joint Learning","authors":"Zuowei Zhang, Yan Tang","doi":"10.1109/DSC50466.2020.00024","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00024","url":null,"abstract":"Automatic Event Type Identification from a text document is a particularly challenging task in event extraction. Consideration of the event triggers’ context has been shown to be effective in this task. However, existing methods suffer from insufficient consideration of the semantic information in the event trigger context and the dependencies between event triggers. To fill this gap, we propose a novel joint learning model called CAED for event type identification. Given a text document, first, CAED carries out event context fusion by sending the sentence embeddings to BiLSTMs for capturing the weighted average of its hidden states and aggregates them using the self-attention mechanism to obtain the document-level context embedding. Then, CAED introduces a dynamic memory vector to record the occurrences of event triggers and their dependencies in each sentence. Finally, CAED concatenates the document-level context embedding, the dynamic memory vector, and the pre-trained word embedding into a joint word-level embedding. CAED carries out joint learning by sending the joint word-level embeddings to an LSTM-based event type classifier that iteratively uses the dynamic memory vector to achieve Context-aware Event Type Identification. Experimental results on the CEC benchmark dataset and a case study demonstrate the superiority of CAED over six state-of-the-art event type identification models.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117092802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Task Learning Network for Document-level and Multi-aspect Sentiment Classification","authors":"Zhou Wang, Jing Cao","doi":"10.1109/DSC50466.2020.00033","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00033","url":null,"abstract":"Document-level sentiment classification aims to predict the overall sentiment polarity of a document about a product, while multi-aspect sentiment classification aims at detecting sentiment polarities for different aspects of a product in a document. Most existing methods perform the two tasks separately and ignore the correlation between them. In this paper, we propose a multi-task framework called multi-sentiment hierarchical attention network (MSHAN) that jointly performs document-level and multi-aspect sentiment classification. Specifically, MSHAN adopts a hierarchical architecture and an attention mechanism to predict aspect sentiments and aggregate those aspect sentiments into an overall sentiment. Moreover, considering that aspect sentiment cannot fully express overall sentiment, MSHAN adopts another hierarchical architecture to capture additional document sentiment information and adds this sentiment to the overall sentiment. Experimental results on two real-world datasets show that the proposed method outperforms previous methods. To the best of our knowledge, this is the first study that performs document-level and multi-aspect sentiment classification in a unified model.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122683597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Knowledge Graph Embedding Method Based on Neural Network","authors":"Chenchen Li, Aiping Li, Hongkui Tu, Ye Wang, Changhai Wang","doi":"10.1109/DSC50466.2020.00057","DOIUrl":"https://doi.org/10.1109/DSC50466.2020.00057","url":null,"abstract":"As the basis of many knowledge graph completion tasks, the embedding representation of entities and relations in a knowledge graph (KG) is an important task in the fields of Natural Language Processing (NLP) and Artificial Intelligence (AI). While most existing knowledge graph embedding (KGE) models based on convolutional neural networks (CNNs) can obtain abundant feature embeddings, they may ignore the important fact that the triples in the KG come from text, as they simply learn the feature embeddings of entities and relations without considering contextual information. Therefore, in this paper, we propose an effective KGE model based on neural networks. First, we convert the triple (h,r,t) of the KG into a sentence [h r t]. Then, an LSTM neural network is used to learn the long-term dependencies of sentences from the input feature vectors. On this basis, a two-layer convolutional neural network with several different filters is used to extract different local features. Finally, the obtained feature vectors are concatenated, and their inner product with the weight vectors gives the score of the triple, which is used to judge the validity of the given triple. We evaluate our model on two benchmark datasets, FB15k-237 and WN18RR; the experimental results show that the model can effectively improve the accuracy of link prediction, achieving better results than other baseline models.","PeriodicalId":423182,"journal":{"name":"2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125091419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}