Big Data Research | Pub Date: 2023-05-01 | DOI: 10.1016/j.bdr.2023.100395
Ling Ding, Peng Du, Hai-wei Hou, Jian Zhang, Di Jin, Shifei Ding
{"title":"Botnet DGA Domain Name Classification Using Transformer Network with Hybrid Embedding","authors":"Ling Ding, Peng Du, Hai-wei Hou, Jian Zhang, Di Jin, Shifei Ding","doi":"10.1016/j.bdr.2023.100395","DOIUrl":"https://doi.org/10.1016/j.bdr.2023.100395","url":null,"abstract":"","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"33 1","pages":"100395"},"PeriodicalIF":3.3,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"54134987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2023-04-01 | DOI: 10.1016/j.bdr.2023.100382
Mintae Kim, Wooju Kim
{"title":"Task-Oriented Collaborative Graph Embedding Using Explicit High-Order Proximity for Recommendation","authors":"Mintae Kim, Wooju Kim","doi":"10.1016/j.bdr.2023.100382","DOIUrl":"https://doi.org/10.1016/j.bdr.2023.100382","url":null,"abstract":"","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"33 1","pages":"100382"},"PeriodicalIF":3.3,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"54134975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What Is a Multi-Modal Knowledge Graph: A Survey","authors":"Jing-hui Peng, Xinyu Hu, Wenbo Huang, Jian Yang","doi":"10.2139/ssrn.4229435","DOIUrl":"https://doi.org/10.2139/ssrn.4229435","url":null,"abstract":"","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"105 1","pages":"100380"},"PeriodicalIF":3.3,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84850252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2023-02-28 | DOI: 10.1016/j.bdr.2022.100360
Lucia Cascone, Saima Sadiq, Saleem Ullah, Seyedali Mirjalili, Hafeez Ur Rehman Siddiqui, Muhammad Umer
{"title":"Predicting Household Electric Power Consumption Using Multi-step Time Series with Convolutional LSTM","authors":"Lucia Cascone , Saima Sadiq , Saleem Ullah , Seyedali Mirjalili , Hafeez Ur Rehman Siddiqui , Muhammad Umer","doi":"10.1016/j.bdr.2022.100360","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100360","url":null,"abstract":"<div><p>Energy consumption prediction has become an integral part of a smart and sustainable environment. With future demand forecasts, energy production and distribution can be optimized to meet the needs of the growing population. However, forecasting the demand of individual households is a challenging task due to the diversity of energy consumption patterns. Recently, such forecasting has gained popularity alongside artificial intelligence-based smart energy-saving designs, smart grid planning, and social Internet of Things (IoT) based smart homes. Existing approaches to energy demand forecasting, however, rely predominantly on one-step forecasting and cover only a short forecasting period. To resolve this issue and obtain high prediction accuracy, this study predicts household appliances' power in two phases. In the first phase, a long short-term memory (LSTM) based model is used to predict total generative active power for the coming 500 hours. The second phase employs a hybrid deep learning model that combines the convolutional characteristics of a neural network with LSTM to forecast household electrical energy consumption for the week ahead, using Social IoT-based smart meter readings. 
Experimental results reveal that the proposed convolutional LSTM (ConvLSTM) architecture outperforms other models with the lowest root mean square error value of 367 kilowatts for weekly household power consumption.</p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"31 ","pages":"Article 100360"},"PeriodicalIF":3.3,"publicationDate":"2023-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49711519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2023-02-28 | DOI: 10.1016/j.bdr.2023.100368
Suneuy Kim, Yvonne Hoang, Tsz Ting Yu, Yuvraj Singh Kanwar
{"title":"GeoYCSB: A Benchmark Framework for the Performance and Scalability Evaluation of Geospatial NoSQL Databases","authors":"Suneuy Kim, Yvonne Hoang, Tsz Ting Yu, Yuvraj Singh Kanwar","doi":"10.1016/j.bdr.2023.100368","DOIUrl":"https://doi.org/10.1016/j.bdr.2023.100368","url":null,"abstract":"<div><p>The proliferation of geospatial applications has tremendously increased the variety, velocity, and volume of spatial data that data stores have to manage. Traditional relational databases reveal limitations in handling such big geospatial data, mainly due to their rigid schema requirements and limited scalability. Numerous NoSQL databases have emerged and actively serve as alternative data stores for big spatial data.</p><p>This study presents a framework, called GeoYCSB, developed for benchmarking NoSQL databases with geospatial workloads. To develop GeoYCSB, we extend YCSB, a de facto benchmark framework for NoSQL systems, by integrating into its design architecture the new components necessary to support geospatial workloads. GeoYCSB supports both microbenchmarks and macrobenchmarks and facilitates the use of real datasets in both. It can be extended to evaluate any NoSQL database that supports spatial queries, using geospatial workloads performed on datasets of any geometric complexity. We use GeoYCSB to benchmark two leading document stores, MongoDB and Couchbase, and present the experimental results and analysis. 
Finally, we demonstrate the extensibility of GeoYCSB by including a new dataset consisting of complex geometries and using it to benchmark a system with a wide variety of geospatial queries: Apache Accumulo, a wide-column store, with the GeoMesa framework applied on top.</p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"31 ","pages":"Article 100368"},"PeriodicalIF":3.3,"publicationDate":"2023-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49733847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2023-02-28 | DOI: 10.1016/j.bdr.2023.100369
Srikanth Baride, Anuj S. Saxena, Vikram Goyal
{"title":"Efficiently Mining Colocation Patterns for Range Query","authors":"Srikanth Baride , Anuj S. Saxena , Vikram Goyal","doi":"10.1016/j.bdr.2023.100369","DOIUrl":"https://doi.org/10.1016/j.bdr.2023.100369","url":null,"abstract":"<div><p>Colocation pattern mining finds a set of features whose instances frequently appear nearby in the same geographical space. Most of the existing algorithms for colocation patterns find nearby objects by a user-provided single-distance threshold. The value of the distance threshold is data specific, and choosing a suitable distance is not easy for a user. In most real-world scenarios, it is more meaningful to define spatial proximity by a distance range. A range also provides the flexibility to observe how the colocation patterns change with distance and to interpret the results better. Algorithms for mining colocations with a single distance threshold cannot be applied directly to a range of distances due to the computational overhead. We identify several structural properties of the colocation patterns and use them to propose an efficient single-pass colocation mining algorithm for distance range queries, namely <span>Range-CoMine</span>. 
We compare the performance of <span>Range-CoMine</span> with adapted versions of the well-known Join-less colocation mining approach using both real-world and synthetic data sets and show that <span>Range-CoMine</span> outperforms the other algorithms.</p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"31 ","pages":"Article 100369"},"PeriodicalIF":3.3,"publicationDate":"2023-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49733848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2023-02-28 | DOI: 10.1016/j.bdr.2022.100352
Wenhai Li, Zheng Yang, Lingfeng Deng, Zhiling Cheng, Weidong Wen, Yanxiang He
{"title":"Accelerating Columnar Storage Based on Asynchronous Skipping Strategy","authors":"Wenhai Li , Zheng Yang , Lingfeng Deng , Zhiling Cheng , Weidong Wen , Yanxiang He","doi":"10.1016/j.bdr.2022.100352","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100352","url":null,"abstract":"<div><p>Many database applications, such as OnLine Analytical Processing (OLAP), web-based information extraction or scientific computation, need to select a subset of fields based on several user-defined filters. Developers of these applications require effective assembly methods for on-demand filtering and aggregation, which raises new challenges in deploying parallel computing components on top of columnar storage.</p><p>To efficiently generate qualified records, an asynchronous skipping strategy is presented to speed up filtering and decoding in the column-based storage. Concentrating on filtering-pushdown in parallel analytical workloads, we offer in-depth analysis on record assembly. We highlight the bottleneck of traditional record-wise assembling methods in the cases of evaluating analytical tasks on a nested schema. With a concurrent queue structure, an asynchronous skipping strategy is presented to evaluate column scan separately by a software pipeline involving an optionally different set of threads. We show how to intensively read the sequential blocks of each column, and how to effectively eliminate invalid payloads by integrating filtering-pushdown in an asynchronous I/O stack.</p><p>We implement a columnar storage supporting filtering-pushdown in nested schema. Our experiments are conducted on a de-facto standard benchmark using both variant-selectivity scans and ad-hoc queries. The results show that, in parallel I/O-intensive workloads, our implementation improves I/O performance over the state of the art by 1.3x to 2.7x. 
Coupling the asynchronous strategy with filtering-pushdown, our implementation remarkably outperforms its competitors with heavyweight coding workloads on both HDD and SSD.</p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"31 ","pages":"Article 100352"},"PeriodicalIF":3.3,"publicationDate":"2023-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49733846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2022-11-28 | DOI: 10.1016/j.bdr.2022.100350
Haiwei Zhang, Qijie Bai, Yining Lian, Yanlong Wen
{"title":"A Twig-Based Algorithm for Top-k Subgraph Matching in Large-Scale Graph Data","authors":"Haiwei Zhang , Qijie Bai , Yining Lian , Yanlong Wen","doi":"10.1016/j.bdr.2022.100350","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100350","url":null,"abstract":"<div><p><span><span><span>Subgraph matching aims to find similar substructures in a single graph according to a given query graph and is a basic query in graph data management. There exist many categories of subgraph matching solutions. Subgraph isomorphism, which is an NP-complete problem, is an initial solution for the subgraph matching task. To speed up the procedure, graph simulation has been presented to match subgraphs with a </span>polynomial time complexity. Unfortunately, graph simulation usually loses topologies of matched subgraphs because of its loose restrictions. In this paper, we propose an </span>approximation approach named kSGM (top-</span><strong>k S</strong>ub<strong>G</strong>raph <strong>M</strong>atching) for subgraph matching based on twig patterns. First, we transform query graphs into twig patterns and match candidate substructures in graph data. Second, we present an optimized join strategy along with a top-k mechanism, including join order selection based on cost evaluation and optimized pruning based on maximum/minimum possible score. Finally, we design experiments on real-life and synthetic graph data to evaluate the performance of our work. 
The results show that the proposed kSGM markedly reduces the time complexity and guarantees correctness when answering subgraph matching queries, compared to existing algorithms.</p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"30 ","pages":"Article 100350"},"PeriodicalIF":3.3,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136939468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research | Pub Date: 2022-11-28 | DOI: 10.1016/j.bdr.2022.100348
Bogumił Kamiński, Tomasz Olczak, Bartosz Pankratz, Paweł Prałat, François Théberge
{"title":"Properties and Performance of the ABCDe Random Graph Model with Community Structure","authors":"Bogumił Kamiński , Tomasz Olczak , Bartosz Pankratz , Paweł Prałat , François Théberge","doi":"10.1016/j.bdr.2022.100348","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100348","url":null,"abstract":"<div><p>In this paper, we investigate properties and performance of synthetic random graph models with a built-in community structure. Such models are important for evaluating and tuning community detection algorithms that are unsupervised by nature. We propose <strong>ABCDe</strong>—a multi-threaded implementation of the <strong>ABCD</strong> (Artificial Benchmark for Community Detection) graph generator. We discuss the implementation details of the algorithm and compare it with both the previously available sequential version of the <strong>ABCD</strong> model and with the parallel implementation of the standard and extensively used <strong>LFR</strong> (Lancichinetti–Fortunato–Radicchi) generator. We show that <strong>ABCDe</strong> is more than ten times faster and scales better than the parallel implementation of <strong>LFR</strong> provided in <span>NetworKit</span>. 
Moreover, the algorithm is not only faster: random graphs generated by <strong>ABCD</strong> also have properties similar to those generated by the original <strong>LFR</strong> algorithm, while the parallelized <span>NetworKit</span> implementation of <strong>LFR</strong> produces graphs that have noticeably different characteristics.</p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"30 ","pages":"Article 100348"},"PeriodicalIF":3.3,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2214579622000429/pdfft?md5=5b249e2f347f9c9eeb348b655a88cf99&pid=1-s2.0-S2214579622000429-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91599233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}