Shikai Guo, Bowen Ping, Zixuan Song, Hui Li, Rong Chen
{"title":"Generating High Quality Titles in StackOverflow via Data Denoising Method","authors":"Shikai Guo, Bowen Ping, Zixuan Song, Hui Li, Rong Chen","doi":"10.1109/PAAP56126.2022.10010656","DOIUrl":"https://doi.org/10.1109/PAAP56126.2022.10010656","url":null,"abstract":"StackOverflow is one of the most popular question-and-answer platforms on the internet, and whether posts on StackOverflow will be answered largely depends on their titles’ quality. Based on recurrent neural networks (RNN) or transformers, previous studies have attempted to use real posts from StackOverflow to generate better titles. However, the challenge of noise in existing data has been ignored, which prevents models from generating higher quality titles. To address this issue, we propose the K-clusters confidence learning for code titles (KCL-CT) model, which contains code clustering and confident learning (CL) denoising components. Specifically, the code clustering component is used to capture the word order and semantic information in code and classify code into different functional categories. The CL denoising component receives the output from the code clustering component and employs a heuristic method based on a confidence threshold to prune raw datasets. We conducted experiments based on Java, Python, JavaScript, SQL and C# datasets, the results of which indicated that in terms of the BLEU and ROUGE scores, the proposed KCL-CT model can outperform previous state-of-the-art models by 2.0%–11.1% and 2.5%–14.0%, respectively.","PeriodicalId":336339,"journal":{"name":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127082144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tianqi Ren, Ke Ji, Zhenxiang Chen, R. Sun, Kun Ma, Jin Zhou
{"title":"Fake news detection method based on multi-feature fusion of entity and structured text","authors":"Tianqi Ren, Ke Ji, Zhenxiang Chen, R. Sun, Kun Ma, Jin Zhou","doi":"10.1109/PAAP56126.2022.10010377","DOIUrl":"https://doi.org/10.1109/PAAP56126.2022.10010377","url":null,"abstract":"Internet technology has facilitated the development of social media, which has become the main way for people to get news. The low cost of information generation has allowed unconfirmed fake news to be noticed and reposted, which can easily cause social and economic harm. Much of the existing research on fake news detection has focused solely on the accuracy of detection, but has neglected the time efficiency of detection. We propose a fake news detection method called MFEAST, which simplifies article content and speeds up model training by extracting key information from articles. Compared with existing real fake news detection methods, our method substantially reduces the training time and achieves high detection accuracy.","PeriodicalId":336339,"journal":{"name":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133551204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongwei Zhao, Yidong Li, Siquan Wu, Zhen Tian, Junbo Liu
{"title":"A MPI programming model for fast bird nest detection on the railway catenary","authors":"Hongwei Zhao, Yidong Li, Siquan Wu, Zhen Tian, Junbo Liu","doi":"10.1109/PAAP56126.2022.10010395","DOIUrl":"https://doi.org/10.1109/PAAP56126.2022.10010395","url":null,"abstract":"We propose an MPI programming model for bird’s nest detection on the railway catenary, which performs a coarse-to-fine strategy based on a cascaded YOLO network, and calculates the coarse-level and fine-level detection in parallel for different detected images. Due to the optimization of the parallel pipeline acceleration model, the deep learning network has a running speed equivalent to that of the single-stage network, which can perform real-time detection of bird’s nests.","PeriodicalId":336339,"journal":{"name":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115408607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Luxin Cai, Naiyue Chen, Yuanmeng Wei, Huaping Chen, Yidong Li
{"title":"Cluster-based Federated Learning Framework for Intrusion Detection","authors":"Luxin Cai, Naiyue Chen, Yuanmeng Wei, Huaping Chen, Yidong Li","doi":"10.1109/PAAP56126.2022.10010553","DOIUrl":"https://doi.org/10.1109/PAAP56126.2022.10010553","url":null,"abstract":"With the rapid development of the Industrial Internet, network intrusion detection has become particularly important. In the Industrial Internet, large-scale data is distributed across edge nodes, which makes joint analysis of network intrusion detection across those nodes necessary. Federated learning keeps data on local nodes to protect user privacy. However, the data distribution differs across edge nodes, which limits the effectiveness of federated learning models. We focus on the non-IID data features and propose a new cluster-based federated learning framework for network intrusion detection. In this method, we cluster clients into different communities by data labels, so that clients in the same community contain similar proportions of data labels. Based on the clustering results, we decompose federated learning model aggregation into cluster aggregation and global aggregation by leveraging similarities both within and between clusters. We conduct extensive experiments on the UNSW_NB15 dataset. The results show that our method performs better than FedAvg and FedProx. It can work well in scenarios with different distributions of data samples while ensuring data security and privacy protection.","PeriodicalId":336339,"journal":{"name":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","volume":"284 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127556019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Study on Data Sovereignty Guarantee Technology","authors":"Yaodong Tao, Shuai Yang, Hongmei Ge","doi":"10.1109/PAAP56126.2022.10010593","DOIUrl":"https://doi.org/10.1109/PAAP56126.2022.10010593","url":null,"abstract":"With data treated as an economic commodity, data sharing, circulation and trading can not only reduce the maintenance and management costs of enterprises, but also tap the potential value of data and improve both the internal workflow of enterprises and the cooperation among enterprises. The marketization of data elements and the clarification of data sovereignty are the current difficulties hindering data flow. This paper addresses one of the current data circulation issues, namely how to maintain data sovereignty, and explores it against the current background. For current research projects and products, it compares and analyzes the techniques used to maintain data sovereignty. Finally, based on the current technology, it gives recommendations for the future development of data sovereignty protection technology.","PeriodicalId":336339,"journal":{"name":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122432563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Ambient Noise Cross-Correlation Algorithm on Heterogeneous CPU-GPU Cluster","authors":"Chao Wu, Xing Tan, Huikun Li, Guangzhong Sun","doi":"10.1109/PAAP56126.2022.10010612","DOIUrl":"https://doi.org/10.1109/PAAP56126.2022.10010612","url":null,"abstract":"Calculation of noise cross-correlation functions (NCF) plays a vital role in ambient noise seismology. However, the high computation and storage requirements of NCF have become major obstacles to the development of ambient noise seismology. Nowadays heterogeneous clusters have a great impact on scientific computing. In this paper, we propose a parallel NCF algorithm based on a heterogeneous cluster. Firstly, input SAC files are partitioned by date and distributed to different nodes of the heterogeneous cluster. Secondly, during NCF computation in each node, workloads are divided by station pairs and calculated on the different graphics processing units (GPUs) embedded in the computing node. Finally, all NCF results of different dates are gathered to one computing node and stacked using all central processing units (CPUs) of that node. Experimental results demonstrate that the parallel NCF computing can be accomplished in much less time than that of the CPU counterpart, and the speedup remains almost linear when increasing the number of computing nodes.","PeriodicalId":336339,"journal":{"name":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122861792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Original Space and Latent Space for Multi-view Spectral Clustering","authors":"Ruiting Hu, Zhibin Gu, Songhe Feng","doi":"10.1109/PAAP56126.2022.10010419","DOIUrl":"https://doi.org/10.1109/PAAP56126.2022.10010419","url":null,"abstract":"We propose a novel multi-view spectral clustering model, called Joint Original space and Latent space for Multi-view clustering (JOLM). Different from most existing multi-view clustering methods, which usually improve clustering performance by developing original or latent features of multi-view data, the proposed JOLM method integrates both original features and latent features into a framework to improve clustering performance. Specifically, we learn the similarity graph matrix from original multiple features and latent features respectively, and obtain the global graph by minimizing the errors between them, so as to better utilize the rich information from multiple views. An effective iterative algorithm is proposed to optimize the objective function. Finally, abundant experiments show the effectiveness of our proposed method.","PeriodicalId":336339,"journal":{"name":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114818830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liang Jin, Xiaochuan Li, Baoyu Fan, Zhenhua Guo, Ruidong Li, Li Wang, Yanwei Wang, Yaqian Zhao, Rengang Li
{"title":"A Multi-object Detection Sampling Algorithm For Large Scenes","authors":"Liang Jin, Xiaochuan Li, Baoyu Fan, Zhenhua Guo, Ruidong Li, Li Wang, Yanwei Wang, Yaqian Zhao, Rengang Li","doi":"10.1109/PAAP56126.2022.10010614","DOIUrl":"https://doi.org/10.1109/PAAP56126.2022.10010614","url":null,"abstract":"Multi-object detection in large scenes aims to find objects in images, which usually contain more than one billion pixels. Based on the concept of dividing and conquering, the state-of-the-art (SOTA) methods slice the super-resolution image into patches first and then lower the image resolution to detect objects later. The advantage of this method is that it can adapt quickly to regular detection algorithms. However, a set of parameters needs to be set manually, such as the size of sliding windows and overlap, which makes it quite hard to fit all scenarios. It may result in a loss of samples located at the boundary of the sliding window and the oversampling of inefficient samples that appear within the overlap. In this paper, we propose an object-oriented image sampling algorithm based on anchor boxes during training and multi-scale pyramids during inference. Inspired by the mature object detection baseline Scale-YOLOv4, we present more tricks to fit large scenes. The accuracy can reach 66%, which is 24 points higher than the CascadeRCNN model of the official backbone network ResNet50. Finally, we have won first place in the PANDA object detection tracking using this method.","PeriodicalId":336339,"journal":{"name":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127974016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tai Nguyen, Minh Bui, Huong Ninh, T. Nguyen, H. Tran
{"title":"Efficient Heuristic Algorithm to Speed up GraphCut in GPU for Image Stitching","authors":"Tai Nguyen, Minh Bui, Huong Ninh, T. Nguyen, H. Tran","doi":"10.1109/PAAP56126.2022.10010453","DOIUrl":"https://doi.org/10.1109/PAAP56126.2022.10010453","url":null,"abstract":"The GraphCut algorithm has been widely utilized to solve various types of computer vision problems. Its expensive computational cost has encouraged many researchers to improve the speed of the algorithm. Recent works proposed schemes that work on parallel computing platforms such as CUDA. However, the problem of low convergence speed prevents the usage of GraphCut for real-time applications. In this paper, we propose a global suppression heuristic to boost the convergence process of the algorithm. A parallel implementation of the GraphCut algorithm on CUDA designed for the image stitching problem is introduced. Our method achieves up to a 3× speedup on a graph of size 80×480 compared to the best sequential GraphCut algorithm while achieving satisfactory stitched images, suitable for panorama applications. Our source code will soon be available for further research.","PeriodicalId":336339,"journal":{"name":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133412557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}