Sarah Nait Bahloul, Oussama Abderrahim, Aya Ichrak Benhadj Amar, Mohammed Yacine Bouhedadja
{"title":"Improvement of Data Stream Decision Trees","authors":"Sarah Nait Bahloul, Oussama Abderrahim, Aya Ichrak Benhadj Amar, Mohammed Yacine Bouhedadja","doi":"10.4018/ijdwm.290889","DOIUrl":"https://doi.org/10.4018/ijdwm.290889","url":null,"abstract":"The classification of data streams has become a significant and active research area. The principal characteristics of data streams are a large amount of arriving data, the high speed and rate of its arrival, and the change of its nature and distribution over time. Hoeffding Tree is a method for building decision trees incrementally. Since it was proposed in the literature, it has become one of the most popular tools for data stream classification. Several improvements have since emerged. Hoeffding Anytime Tree was recently introduced and is considered one of the most promising algorithms. It offers a higher accuracy compared to the Hoeffding Tree in most scenarios, at a small additional computational cost. In this work, the authors contribute by proposing three improvements to the Hoeffding Anytime Tree. The improvements are tested on known benchmark datasets. The experimental results show that two of the proposed variants make better use of Hoeffding Anytime Tree’s properties. They learn faster while providing the same desired accuracy.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78748795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Rumor Detection by Image Captioning and Multi-Cell Bi-RNN With Self-Attention in Social Networks","authors":"Jenq-Haur Wang, Chin-Wei Huang, M. Norouzi","doi":"10.4018/ijdwm.313189","DOIUrl":"https://doi.org/10.4018/ijdwm.313189","url":null,"abstract":"User-generated content in social media is not verified before being posted. It could bring many problems if it were misused. Among various types of rumors, the authors focus on the type in which there is a mismatch between images and their surrounding texts. Such rumors can be detected by multimodal feature fusion in RNNs with an attention mechanism, but the relations between images and texts are not well addressed. In this paper, the authors propose to improve rumor detection by image captioning and RNNs with self-attention. Firstly, they utilize the idea of image captioning to translate images into the corresponding text descriptions. Secondly, these caption words are represented by word embedding models and aggregated with surrounding texts using early fusion. Finally, multi-cell bi-directional RNNs with self-attention are used to learn important features to identify rumors. From the experimental results, the best F-measure of 0.882 can be obtained, which shows the potential of the proposed approach to rumor detection. Further investigation is needed for data at a larger scale.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42579490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Initial Optimization Techniques for the Cube Algebra Query Language: The Relational Model as a Target","authors":"Thomas Mercieca, J. Vella, K. Vella","doi":"10.4018/ijdwm.299016","DOIUrl":"https://doi.org/10.4018/ijdwm.299016","url":null,"abstract":"A common model used in addressing today's overwhelming amounts of data is the OLAP Cube. The OLAP community has proposed several cube algebras, although a standard has still not been nominated. This study focuses on a recent addition to the cube algebras: the user-centric Cube Algebra Query Language (CAQL). The study aims to explore the optimization potential of this algebra by applying logical rewriting inspired by classic relational algebra and parallelism. The lack of standard algebra is often cited as a problem in such discussions. Thus, the significance of this work is that of strengthening the position of this algebra within the OLAP algebras by addressing implementation details. The modern open-source PostgreSQL relational engine is used to encode the CAQL abstraction. A query workload based on a well-known dataset is adopted, and CAQL and SQL implementations are compared. Finally, the quality of the query created is evaluated through the observed performance characteristics of the query. Results show strong improvements over the baseline case of the unoptimized query.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76108926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaoyin Ge, Mingshu Zhang, Xu An Wang, Jia Liu, Bin Wei
{"title":"Emotion-Drive Interpretable Fake News Detection","authors":"Xiaoyin Ge, Mingshu Zhang, Xu An Wang, Jia Liu, Bin Wei","doi":"10.4018/ijdwm.314585","DOIUrl":"https://doi.org/10.4018/ijdwm.314585","url":null,"abstract":"Fake news has brought significant challenges to the healthy development of social media. Although current fake news detection methods are advanced, many models directly utilize unselected user comments and do not consider the emotional connection between news content and user comments. The authors propose an emotion-driven explainable fake news detection model (EDI) to solve this problem. The model can select valuable user comments by using sentiment value, obtain the emotional correlation representation between news content and user comments by using collaborative annotation, and obtain the weighted representation of user comments by using the attention mechanism. Experimental results on Twitter and Weibo show that the detection model significantly outperforms the state-of-the-art models and provides reasonable interpretation.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42693737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nguyen Thi Kim Son, Nguyen Van Bien, Nguyen Huu Quynh, C. Thơ
{"title":"Machine Learning Based Admission Data Processing for Early Forecasting Students' Learning Outcomes","authors":"Nguyen Thi Kim Son, Nguyen Van Bien, Nguyen Huu Quynh, C. Thơ","doi":"10.4018/ijdwm.313585","DOIUrl":"https://doi.org/10.4018/ijdwm.313585","url":null,"abstract":"In this paper, the authors explore the factors to improve the accuracy of predicting student learning outcomes. The method can remove redundant and irrelevant factors to get a “clean” data set without having to solve the NP-Hard problem. The method can improve the graduation outcome prediction accuracy through logistic regression machine learning method for “clean” data set. They empirically evaluate the training and university admission data of Hanoi Metropolitan University from 2016 to 2020. From data processing results and the support from the machine learning techniques application program, they analyze, evaluate, and forecast students' learning outcomes based on admission data, first-year, and second-year academic performance data. They then submit proposals of training and admission policies and methods of radically and quantitatively solving problems in university admissions.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49537984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bhargavinath Dornadula, S. Geetha, L. Anbarasi, Seifedine Kadry
{"title":"A Survey of COVID-19 Detection From Chest X-Rays Using Deep Learning Methods","authors":"Bhargavinath Dornadula, S. Geetha, L. Anbarasi, Seifedine Kadry","doi":"10.4018/ijdwm.314155","DOIUrl":"https://doi.org/10.4018/ijdwm.314155","url":null,"abstract":"The coronavirus (COVID-19) outbreak has opened an alarming situation for the whole world and has been marked as one of the most severe and acute medical conditions in the last hundred years. Various medical imaging modalities including computer tomography (CT) and chest x-rays are employed for diagnosis. This paper presents an overview of the recently developed COVID-19 detection systems from chest x-ray images using deep learning approaches. This review explores and analyses the data sets, feature engineering techniques, image pre-processing methods, and experimental results of various works carried out in the literature. It also highlights the transfer learning techniques and different performance metrics used by researchers in this field. This information is helpful to point out the future research direction in the domain of automatic diagnosis of COVID-19 using deep learning techniques.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41458501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nasser Allheeib, Marine Alraqdi, Mohammed Almukaynizi
{"title":"Data Warehouse and Interactive Map for Promoting Cultural Heritage in Saudi Arabia Using GIS","authors":"Nasser Allheeib, Marine Alraqdi, Mohammed Almukaynizi","doi":"10.4018/ijdwm.314236","DOIUrl":"https://doi.org/10.4018/ijdwm.314236","url":null,"abstract":"With the urbanization of various regions, many historical sites may be misrepresented or totally neglected. As more people move to urban areas with time, heritage areas are being abandoned or ignored. The roads leading to such areas are less maintained, and they are not being adequately promoted. Over the years, the emergence and evolution of digital maps have played a significant role in tourist and cultural exploration and are important sources of information for tourists who are considering specific destinations. In this paper, the authors discuss the development and implementation of a geographic information system (GIS) in the tourism industry. They create an interactive map for tourist sites and suggest a means of retrieving tourist data. They select the Aseer region as a case study since it is rich with deep cultural heritage, comprising almost 4,000 heritage villages, and is considered to be one of the most important tourist destinations in the country. In this paper, the authors propose an initiative for the development and implementation of GIS in the tourism industry.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48390256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Z. Ye, Wenhui Cai, Mingwei Wang, Aixin Zhang, Wen-hua Zhou, Na Deng, Zimei Wei, Daxin Zhu
{"title":"Association Rule Mining Based on Hybrid Whale Optimization Algorithm","authors":"Z. Ye, Wenhui Cai, Mingwei Wang, Aixin Zhang, Wen-hua Zhou, Na Deng, Zimei Wei, Daxin Zhu","doi":"10.4018/ijdwm.308817","DOIUrl":"https://doi.org/10.4018/ijdwm.308817","url":null,"abstract":"Association Rule Mining (ARM) is one of the most significant and active research areas in data mining. Recently, the Whale Optimization Algorithm (WOA) has been successfully applied in the field of data mining; however, it easily falls into a local optimum. Therefore, an improved WOA based on an adaptive parameter strategy and Levy Flight mechanism (LWOA) is applied to mine association rules. Meanwhile, a hybrid strategy that blends two algorithms to balance the exploration and exploitation phases is put forward, that is, grey wolf optimization algorithm (GWO), artificial bee colony algorithm (ABC) and cuckoo search algorithm (CS) are devoted to improving the convergence of LWOA. The approach performs a global search and finds the association rule sets by modeling the rule mining task as a multi-objective problem that simultaneously meets support, confidence, lift, and certainty factor, which is examined on multiple data sets. Experimental results verify that the proposed method has better mining performance compared to other algorithms involved in the paper.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91026072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Hybrid Neural Networks With Multi-Head Attention for Document Classification","authors":"Weihao Huang, Jiaojiao Chen, Qianhua Cai, Xuejie Liu, Yu-dong Zhang, Xiaohui Hu","doi":"10.4018/ijdwm.303673","DOIUrl":"https://doi.org/10.4018/ijdwm.303673","url":null,"abstract":"Document classification is a research topic aiming to predict the overall text sentiment polarity with the advent of deep neural networks. Various deep learning algorithms have been employed in the current studies to improve classification performance. To this end, this paper proposes a hierarchical hybrid neural network with multi-head attention (HHNN-MHA) model on the task of document classification. The proposed model contains two layers to deal with the word-sentence level and sentence-document level classification respectively. In the first layer, CNN is integrated into Bi-GRU and a multi-head attention mechanism is employed, in order to exploit local and global features. Then, both Bi-GRU and attention mechanism are applied to document processing and classification in the second layer. Experiments on four datasets demonstrate the effectiveness of the proposed method. Compared to the state-of-the-art methods, our model achieves competitive results in document classification in terms of experimental performance.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89573483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Schema Evolution in Multiversion Data Warehouses","authors":"Waqas Ahmed, E. Zimányi, A. Vaisman, R. Wrembel","doi":"10.4018/ijdwm.2021100101","DOIUrl":"https://doi.org/10.4018/ijdwm.2021100101","url":null,"abstract":"Data warehouses (DWs) evolve in both their content and schema due to changes of user requirements, business processes, or external sources to name a few. Although multiple approaches using temporal and/or multiversion DWs have been proposed to handle these changes, an efficient solution for this problem is still lacking. The authors' approach is to separate concerns and use temporal DWs to deal with content changes, and multiversion DWs to deal with schema changes. To address the former, previously, they have proposed a temporal multidimensional (MD) model. In this paper, they propose a multiversion MD model for schema evolution to tackle the latter problem. The two models complement each other and allow managing both content and schema evolution. In this paper, the semantics of schema modification operators (SMOs) to derive various schema versions are given. It is also shown how online analytical processing (OLAP) operations like roll-up work on the model. Finally, the mapping from the multiversion MD model to a relational schema is given along with OLAP operations in standard SQL.","PeriodicalId":54963,"journal":{"name":"International Journal of Data Warehousing and Mining","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73133877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}