Big Data Mining and Analytics: Latest Articles (Page 10)

Intelligent and adaptive web data extraction system using convolutional and long short-term memory deep learning networks

IF 13.6 | Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2021-08-26 DOI: 10.26599/BDMA.2021.9020012

Sudhir Kumar Patnaik;C. Narendra Babu;Mukul Bhave

Citations: 14

Coronavirus pandemic analysis through tripartite graph clustering in online social networks

IF 13.6 | Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2021-08-26 DOI: 10.26599/BDMA.2021.9020010

Xueting Liao;Danyang Zheng;Xiaojun Cao

Abstract: The COVID-19 pandemic has hit the world hard. The reaction to the pandemic related issues has been pouring into social platforms, such as Twitter. Many public officials and governments use Twitter to make policy announcements. People keep close track of the related information and express their concerns about the policies on Twitter. It is beneficial yet challenging to derive important information or knowledge out of such Twitter data. In this paper, we propose a Tripartite Graph Clustering for Pandemic Data Analysis (TGC-PDA) framework that builds on the proposed models and analysis: (1) tripartite graph representation, (2) non-negative matrix factorization with regularization, and (3) sentiment analysis. We collect the tweets containing a set of keywords related to coronavirus pandemic as the ground truth data. Our framework can detect the communities of Twitter users and analyze the topics that are discussed in the communities. The extensive experiments show that our TGC-PDA framework can effectively and efficiently identify the topics and correlations within the Twitter data for monitoring and understanding public opinions, which would provide policy makers useful information and statistics for decision making.

Vol. 4, No. 4, pp. 242-251. PDF: https://ieeexplore.ieee.org/iel7/8254253/9523493/09523498.pdf

Citations: 11

LotusSQL: SQL engine for high-performance big data systems

IF 13.6 | Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2021-08-26 DOI: 10.26599/BDMA.2021.9020009

Xiaohan Li;Bowen Yu;Guanyu Feng;Haojie Wang;Wenguang Chen

Abstract: In recent years, Apache Spark has become the de facto standard for big data processing. SparkSQL is a module offering support for relational analysis on Spark with Structured Query Language (SQL). SparkSQL provides convenient data processing interfaces. Despite its efficient optimizer, SparkSQL still suffers from the inefficiency of Spark resulting from Java virtual machine and the unnecessary data serialization and deserialization. Adopting native languages such as C++ could help to avoid such bottlenecks. Benefiting from a bare-metal runtime environment and template usage, systems with C++ interfaces usually achieve superior performance. However, the complexity of native languages also increases the required programming and debugging efforts. In this work, we present LotusSQL, an engine to provide SQL support for dataset abstraction on a native backend Lotus. We employ a convenient SQL processing framework to deal with frontend jobs. Advanced query optimization technologies are added to improve the quality of execution plans. Above the storage design and user interface of the compute engine, LotusSQL implements a set of structured dataset operations with high efficiency and integrates them with the frontend. Evaluation results show that LotusSQL achieves a speedup of up to 9× in certain queries and outperforms Spark SQL in a standard query benchmark by more than 2× on average.

Vol. 4, No. 4, pp. 252-265. PDF: https://ieeexplore.ieee.org/iel7/8254253/9523493/09523499.pdf

Citations: 0

A deep-learning prediction model for imbalanced time series data forecasting

IF 13.6 | Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2021-08-26 DOI: 10.26599/BDMA.2021.9020011

Chenyu Hou;Jiawei Wu;Bin Cao;Jing Fan

Abstract: Time series forecasting has attracted wide attention in recent decades. However, some time series are imbalanced and show different patterns between special and normal periods, leading to the prediction accuracy degradation of special periods. In this paper, we aim to develop a unified model to alleviate the imbalance and thus improving the prediction accuracy for special periods. This task is challenging because of two reasons: (1) the temporal dependency of series, and (2) the tradeoff between mining similar patterns and distinguishing different distributions between different periods. To tackle these issues, we propose a self-attention-based time-varying prediction model with a two-stage training strategy. First, we use an encoder-decoder module with the multi-head self-attention mechanism to extract common patterns of time series. Then, we propose a time-varying optimization module to optimize the results of special periods and eliminate the imbalance. Moreover, we propose reverse distance attention in place of traditional dot attention to highlight the importance of similar historical values to forecast results. Finally, extensive experiments show that our model performs better than other baselines in terms of mean absolute error and mean absolute percentage error.

Vol. 4, No. 4, pp. 266-278. PDF: https://ieeexplore.ieee.org/iel7/8254253/9523493/09523500.pdf

Citations: 29

Call for papers: Special issue on intelligent systems and Internet of Things

IF 13.6 | Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2021-03-12 DOI: 10.26599/BDMA.2021.9020007

Citations: 0

A multitask multiview neural network for end-to-end aspect-based sentiment analysis

IF 13.6 | Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2021-03-12 DOI: 10.26599/BDMA.2021.9020003

Yong Bie;Yan Yang

Abstract: The aspect-based sentiment analysis (ABSA) consists of two subtasks: aspect term extraction and aspect sentiment prediction. Existing methods deal with both subtasks one by one in a pipeline manner, in which there lies some problems in performance and real application. This study investigates the end-to-end ABSA and proposes a novel multitask multiview network (MTMVN) architecture. Specifically, the architecture takes the unified ABSA as the main task with the two subtasks as auxiliary tasks. Meanwhile, the representation obtained from the branch network of the main task is regarded as the global view, whereas the representations of the two subtasks are considered two local views with different emphases. Through multitask learning, the main task can be facilitated by additional accurate aspect boundary information and sentiment polarity information. By enhancing the correlations between the views under the idea of multiview learning, the representation of the global view can be optimized to improve the overall performance of the model. The experimental results on three benchmark datasets show that the proposed method exceeds the existing pipeline methods and end-to-end methods, proving the superiority of our MTMVN architecture.

Vol. 4, No. 3, pp. 195-207. PDF: https://ieeexplore.ieee.org/iel7/8254253/9430128/09430135.pdf

Citations: 23

AIPerf: Automated machine learning as an AI-HPC benchmark

IF 13.6 | Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2021-03-12 DOI: 10.26599/BDMA.2021.9020004

Zhixiang Ren;Yongheng Liu;Tianhui Shi;Lei Xie;Yue Zhou;Jidong Zhai;Youhui Zhang;Yunquan Zhang;Wenguang Chen

Abstract: The plethora of complex Artificial Intelligence (AI) algorithms and available High-Performance Computing (HPC) power stimulates the expeditious development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerged. In particular, the de facto HPC benchmark, LINPACK, cannot reflect the AI computing power and input/output performance without a representative workload. Current popular AI benchmarks, such as MLPerf, have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite utilizing automated machine learning, which not only represents real AI scenarios, but also is auto-adaptively scalable to various scales of machines. We implement the algorithms in a highly parallel and flexible way to ensure the efficiency and optimization potential on diverse systems with customizable configurations. We utilize Operations Per Second (OPS), which is measured in an analytical and systematic approach, as a major metric to quantify the AI performance. We perform evaluations on various systems to ensure the benchmark's stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and single metric, AIPerf can easily scale on and rank AI-HPC, providing a powerful benchmark suite for the coming supercomputing era.

Vol. 4, No. 3, pp. 208-220. PDF: https://ieeexplore.ieee.org/iel7/8254253/9430128/09430136.pdf

Citations: 13

A survey on algorithms for intelligent computing and smart city applications

IF 13.6 | Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2021-03-12 DOI: 10.26599/BDMA.2020.9020029

Zhao Tong;Feng Ye;Ming Yan;Hong Liu;Sunitha Basodi

Abstract: With the rapid development of human society, the urbanization of the world's population is also progressing rapidly. Urbanization has brought many challenges and problems to the development of cities. For example, the urban population is under excessive pressure, various natural resources and energy are increasingly scarce, and environmental pollution is increasing, etc. However, the original urban model has to be changed to enable people to live in greener and more sustainable cities, thus providing them with a more convenient and comfortable living environment. The new urban framework, the smart city, provides excellent opportunities to meet these challenges, while solving urban problems at the same time. At this stage, many countries are actively responding to calls for smart city development plans. This paper investigates the current stage of the smart city. First, it introduces the background of smart city development and gives a brief definition of the concept of the smart city. Second, it describes the framework of a smart city in accordance with the given definition. Finally, various intelligent algorithms to make cities smarter, along with specific examples, are discussed and analyzed.

Vol. 4, No. 3, pp. 155-172. PDF: https://ieeexplore.ieee.org/iel7/8254253/9430128/09430132.pdf

Citations: 49

Improvising personalized travel recommendation system with recency effects

IF 13.6 | Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2021-03-12 DOI: 10.26599/BDMA.2020.9020026

Paromita Nitu;Joseph Coelho;Praveen Madiraju

Citations: 66

Call for papers: Special issue on unlocking genetic diseases by integrating machine learning techniques and medical data

IF 13.6 | Zone 1, Computer Science

Big Data Mining and Analytics Pub Date : 2021-03-12 DOI: 10.26599/BDMA.2021.9020005

Citations: 0