Big Data Research: Latest Articles

Correlation Expert Tuning System for Performance Acceleration
IF 3.3, Zone 3, Computer Science
Big Data Research, Pub Date: 2022-11-28, DOI: 10.1016/j.bdr.2022.100345, Volume 30, Article 100345
Yanfeng Chai, Jiake Ge, Qiang Zhang, Yunpeng Chai, Xin Wang, Qingpeng Zhang
Abstract: One configuration cannot fit all workloads and diverse resource limitations in modern databases. Auto-tuning methods based on reinforcement learning (RL) normally depend on an exhaustive offline training process with a huge number of performance measurements, which includes many inefficient knob combinations under a trial-and-error method. The most time-consuming part of the process is not the RL network training but the performance measurements for acquiring the reward values of target goals like higher throughput or lower latency. In other words, the whole process could nearly be considered a zero-knowledge method without any experience or rules to constrain it. So we propose a correlation expert tuning system (CXTuning) for acceleration, which contains a correlation knowledge model to remove unnecessary training costs and a multi-instance mechanism (MIM) to support fine-grained tuning for diverse workloads. The models define the importance and correlations among these configuration knobs for the user's specified target. But knob-based optimization should not be the final destination for auto-tuning. Furthermore, we import an abstracted architectural optimization method into CXTuning as a part of the progressive expert knowledge tuning (PEKT) algorithm. Experiments show that CXTuning can effectively reduce the training time and achieve extra performance improvement compared with the state-of-the-art auto-tuning method.
PDF: https://www.sciencedirect.com/science/article/pii/S2214579622000399/pdfft?md5=959f53ff5a4e8dcd1c236afdbde633e4&pid=1-s2.0-S2214579622000399-main.pdf
Citations: 0
An Intelligent Government Complaint Prediction Approach
IF 3.3, Zone 3, Computer Science
Big Data Research, Pub Date: 2022-11-28, DOI: 10.1016/j.bdr.2022.100336, Volume 30, Article 100336
Siqi Chen, Yanling Zhang, Bin Song, Xiaojiang Du, Mohsen Guizani
Abstract: Recent advances in machine learning (ML) bring more opportunities for greater implementation of smart government construction. However, there are many challenges in terms of government data application due to previous nonstandard records and man-made errors. In this paper, we propose a practical intelligent government complaint prediction (IGCP) framework that helps governments quickly respond to citizens' consultations and complaints via ML technologies. In addition, we put forward an automatic label correction method and demonstrate its effectiveness in improving the performance of the intelligent government complaint prediction task. Specifically, the central server collects the interaction records from users and departments and automatically integrates them by the label correction approach, which is designed to evaluate the similarity between different labels in the data and merge highly similar labels and corresponding samples into their most similar category. Based on those refined data, the central server quickly generates accurate solutions to complaints through text classification algorithms. The main innovation of our approach is that we turn the task of government complaint distribution into a text classification problem which is uniformly coordinated by the central server, and employ the label correction approach to correct redundant labels for training better models based on limited complaint records. To explore the influences of our approach, we evaluate its performance on real-world government service records provided by our collaborator. The experimental results demonstrate that the prediction task using the label correction algorithm achieves significant improvements on almost all metrics of the classifier.
Citations: 2
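The label correction idea in the abstract above (score the similarity between labels, then fold highly similar labels into their closest category) can be illustrated with a small sketch. The abstract does not state which similarity measure the authors use, so the token-level Jaccard similarity, the `threshold` parameter, and the function names `jaccard` and `merge_similar_labels` below are all assumptions made purely for illustration, not the paper's method.

```python
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two label names (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def merge_similar_labels(labels, threshold=0.5):
    """Return a dict mapping each label to a canonical label.

    Labels are visited from most to least frequent, so frequent labels
    become canonical and rarer look-alikes are merged into them.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda name: (-counts[name], name))
    canonical = []
    mapping = {}
    for label in ordered:
        best, best_sim = None, 0.0
        for candidate in canonical:
            sim = jaccard(label, candidate)
            if sim > best_sim:
                best, best_sim = candidate, sim
        if best is not None and best_sim >= threshold:
            mapping[label] = best      # merge into the most similar category
        else:
            canonical.append(label)    # keep as its own category
            mapping[label] = label
    return mapping


if __name__ == "__main__":
    records = ["road repair", "road repair", "road repair",
               "repair road damage", "noise complaint", "noise complaint", "tax"]
    print(merge_similar_labels(records))
```

After merging, each sample would be relabeled through the returned mapping before a text classifier is trained; a real system would likely compare label semantics or sample content rather than label names alone.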
Data Stream Classification Based on Extreme Learning Machine: A Review
IF 3.3, Zone 3, Computer Science
Big Data Research, Pub Date: 2022-11-28, DOI: 10.1016/j.bdr.2022.100356, Volume 30, Article 100356
Xiulin Zheng, Peipei Li, Xindong Wu
Abstract: Many daily applications are generating massive amounts of data in the form of streams at ever higher speeds, such as medical data, click streams, internet records, and banking transactions. In contrast to traditional static data, data streams have some inherent properties, such as infinite length, concept drift, multiple labels, and concept evolution. Among all the data mining tasks, classification is one of the basic topics in data stream mining and has gained more and more attention among different research communities. Extreme Learning Machine (ELM) has drawn much interest in data classification due to its high efficiency, universal approximation capability, generalization ability, and simplicity, which have greatly inspired the development of many ELM-based algorithms and their applications during the past decades. In this paper, we mainly provide a comprehensive review of ELM theoretical research and its variants in data stream classification, and categorize these algorithms from different perspectives. Firstly, we briefly introduce the basic principles of ELM and its characteristics. Secondly, we give an overview of different ELM variants that address the particular issues of data stream classification. Thirdly, we present an overview of different strategies to optimize the ELM, which have further improved the stability, accuracy, and generalization ability of ELM, and briefly introduce some practical applications of ELM in data stream classification. Finally, we conduct several groups of experiments to compare the performance of ELM-based models addressing the focused issues. Also, the open issues and prospects of ELM models used for stream classification are discussed, which are worth further study in the future.
Citations: 3
Augmented Functional Analysis of Variance (A-fANOVA): Theory and Application to Google Trends for Detecting Differences in Abortion Drugs Queries
IF 3.3, Zone 3, Computer Science
Big Data Research, Pub Date: 2022-11-28, DOI: 10.1016/j.bdr.2022.100354, Volume 30, Article 100354
Fabrizio Maturo, Annamaria Porreca
Abstract: The World Wide Web (WWW) has become a popular and readily accessible big data source in recent decades. The information in the WWW is offered in many different types, e.g. Google Trends, which provides deep insights into people's search queries in the Google Search engine. Analysing this kind of data is not straightforward because they usually take the form of high-dimensional data, given that the latter can be collected over extensive periods. Comparing Google Trends' means of different groups of people or Countries can help understand many phenomena and provide very appealing insights into populations' interests in specific periods and areas. However, appropriate statistical techniques should be adopted when inspecting and testing differences in such data due to the well-known curse of dimensionality. This paper suggests an original approach to dealing with Google Trends by concentrating on the search for the "Cytotec" abortion drug. The final purpose of the application is to determine if different Countries' abortion legislation can influence the research trends. This research focuses on Functional Data Analysis (FDA) to deal with high-dimensional data and proposes a generalisation of the classical functional analysis of variance model, namely the Augmented Functional Analysis of Variance (A-fANOVA). To test the existence of statistically significant differences among groups of Countries, A-fANOVA considers additional curves' characteristics provided by the velocity and acceleration of the original Google queries over time. The proposed methodology appears to be intriguing for capturing additional information about curves' behaviours with the final aim of offering a monitoring tool for policy-makers.
Citations: 4
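The abstract above says A-fANOVA augments each curve with its velocity and acceleration, i.e. its first and second derivatives over time. In Functional Data Analysis these derivatives are normally taken from a smoothed basis expansion of the curve; the sketch below swaps in plain finite differences on equally spaced samples just to show what the augmented representation contains. The function names `derivative` and `augment` and the `step` parameter are hypothetical, and none of this is taken from the paper.

```python
def derivative(values, step=1.0):
    """Central finite differences, with one-sided differences at both ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    out = []
    for i in range(n):
        if i == 0:
            out.append((values[1] - values[0]) / step)
        elif i == n - 1:
            out.append((values[-1] - values[-2]) / step)
        else:
            out.append((values[i + 1] - values[i - 1]) / (2 * step))
    return out


def augment(curve, step=1.0):
    """Return the curve together with its approximate velocity and acceleration."""
    velocity = derivative(curve, step)
    acceleration = derivative(velocity, step)
    return curve, velocity, acceleration


if __name__ == "__main__":
    # Hypothetical weekly search-interest values for one country.
    weekly_interest = [0, 1, 4, 9, 16]
    _, vel, acc = augment(weekly_interest)
    print(vel, acc)
```

A group comparison would then test the original curves, the velocities, and the accelerations for differences between groups, which is the role the abstract assigns to the augmented model.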
An Embedding Model for Knowledge Graph Completion Based on Graph Sub-Hop Convolutional Network
IF 3.3, Zone 3, Computer Science
Big Data Research, Pub Date: 2022-11-28, DOI: 10.1016/j.bdr.2022.100351, Volume 30, Article 100351
Haitao He, Haoran Niu, Jianzhou Feng, Junlan Nie, Yangsen Zhang, Jiadong Ren
Abstract: The research on knowledge graph completion based on representation learning is increasingly dependent on the node structural features in the graph. However, a large number of nodes have few immediate neighbors, so the node features cannot be fully expressed. Hence, multi-hop structure features are crucial to the representation learning of nodes. GCN (Graph Convolutional Network) is a graph embedding model that can introduce the multi-hop structure. However, the multi-hop information transmitted between GCN layers suffers a lot of losses. This leads to insufficient mining of the node structure features and the semantic feature associations among entities, further reducing the efficiency of graph knowledge completion. A gate-controlled graph sub-hop convolutional network model for knowledge graph completion is proposed to fill these research gaps. Firstly, a graph sub-hop convolutional network based on matrix representation is designed, which can transmit multi-hop neighbor features directly to the encoded node vector to avoid a large loss of features during multi-hop transmission. On this basis, the implicit multi-hop relations are explicitly embedded into the model based on TransE. In the process of each hop convolution, to address the accumulation of noise redundancy caused by the growth of the receptive field, a sub-hop gate mechanism strategy is proposed to filter information. Finally, a linear model is used to decode the encoded nodes and then complete the knowledge graph. We carried out experimental comparison and analysis on the WN18RR, FB15k-237, UMLS, and KINSHIP datasets. The results show that the embedding method based on the sub-hop structural information fusion can greatly improve the results of link prediction.
Citations: 0
A Facial Expression Recognition Approach for Social IoT Frameworks
IF 3.3, Zone 3, Computer Science
Big Data Research, Pub Date: 2022-11-28, DOI: 10.1016/j.bdr.2022.100353, Volume 30, Article 100353
Silvio Barra, Sanoar Hossain, Chiara Pero, Saiyed Umer
Abstract: Social IoT has become a sensitive topic in recent years, mainly due to the attraction of social networks and the related digital activities amongst the population. These techniques are gaining even more importance in the current period, in which digital tools are the only ones allowed to maintain social distancing due to the COVID-19 restrictions. In order to aid patients and elderly people in a home healthcare context, this article explores the usage of facial patient images and emotional detection. In this regard, a Social IoT approach is proposed, which is based on a camera-connected home, allowing medical examinations at a distance while keeping the patient's preferred contacts informed. A facial expression analysis is done to infer the patient's emotional state, thus communicating to the doctor and the emergency contacts any change in the patient's state (pain, suffering, etc.). The proposed facial expression recognition system consists of three main steps: during the image preprocessing phase, face detection and normalization are performed; the feature extraction process involves the computation of discriminative patterns using the Spatial Pyramid Technique; finally, an expression recognition model is built using a multi-class linear Support Vector Machine classifier. The performance of the proposed system has been tested on two challenging benchmarks for facial expression recognition, namely KDEF and GENKI-4K, which show that the proposed system outperforms state-of-the-art methods.
Citations: 4
A Segmented PageRank-Based Value Compensation Method for Personal Data in Alliance Blockchains
IF 3.3, Zone 3, Computer Science
Big Data Research, Pub Date: 2022-11-28, DOI: 10.1016/j.bdr.2022.100326, Volume 30, Article 100326
Chaoxia Qin, Bing Guo, Yun Zhang, Omar Cheikhrouhou, Yan Shen, Zhen Zhang, Hong Su
Abstract: Alliance blockchains provide a multi-party trusted data trading environment, promoting the development of the data trading market in which the value compensation for personal data is still a key issue. However, limited by the data format and content, traditional attempts at data value compensation cannot form a widely applicable solution. Therefore, we propose a universal value compensation method for personal data in alliance blockchains. The basic idea of this method is to evaluate the value weight of data based on the collaborative relationship of data value. First, we construct a Data Collaboration Markov Model (DCMM) to formalize the collaboration network of data value. Then, aiming at data collaboration networks with different structures, the corresponding Segmented PageRank (SPR) algorithm is proposed. SPR can universally evaluate the value weight of each data account without being constrained by the data format or content. Finally, we theoretically deduce that the time complexity and space complexity of the SPR algorithm are respectively $1/K$ and $1/K^2$ of those taken by the PageRank algorithm. Experiments show the feasibility and superior performance of SPR.
Citations: 2
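The abstract above measures SPR against ordinary PageRank but does not describe the segmentation itself, so the sketch below shows only the baseline: standard PageRank by power iteration over a small link graph. It is not the SPR algorithm. The function name `pagerank`, the default damping factor of 0.85, and the example graph of "data accounts" are assumptions for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration.

    links maps each node to the list of nodes it points to.
    Returns a dict node -> rank, with ranks summing to 1.
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new[node] - rank[node]) for node in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical collaboration graph: accounts "a" and "b" both feed "c".
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for node, value in pagerank(graph).items():
        print(node, round(value, 4))
```

Each iteration touches every edge once and the full transition structure covers all node pairs, which is the cost that the stated $1/K$ time and $1/K^2$ space factors are measured against.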
Evaluating Standard Feature Sets Towards Increased Generalisability and Explainability of ML-Based Network Intrusion Detection
IF 3.3, Zone 3, Computer Science
Big Data Research, Pub Date: 2022-11-28, DOI: 10.1016/j.bdr.2022.100359, Volume 30, Article 100359
Mohanad Sarhan, Siamak Layeghy, Marius Portmann
Abstract: Machine Learning (ML)-based network intrusion detection systems (NIDSs) bring many benefits for enhancing the cybersecurity posture of an organisation. Many systems have been designed and developed in the research community, often achieving a close to perfect detection rate when evaluated using synthetic datasets. However, there are ongoing challenges with the development and evaluation of ML-based NIDSs: the limited ability to comprehensively evaluate ML models and a lack of understanding of internal ML operations. This paper overcomes the challenges by evaluating and explaining the generalisability of a common feature set to different network environments and attack scenarios. Two feature sets (NetFlow and CICFlowMeter) have been evaluated in terms of detection accuracy across three key datasets, i.e., CSE-CIC-IDS2018, BoT-IoT, and ToN-IoT. The results show the superiority of the NetFlow feature set in enhancing the ML model's detection accuracy of various network attacks. In addition, due to the complexity of the learning models, SHapley Additive exPlanations (SHAP), an explainable AI methodology, has been adopted to explain and interpret the achieved classification decisions of ML models. The Shapley values of two common feature sets have been analysed across multiple datasets to determine the influence contributed by each feature towards the final ML prediction.
Citations: 21
Satellite IoT Based Road Extraction from VHR Images Through Superpixel-CNN Architecture
IF 3.3, Zone 3, Computer Science
Big Data Research, Pub Date: 2022-11-28, DOI: 10.1016/j.bdr.2022.100334, Volume 30, Article 100334
Tanmay Kumar Behera, Pankaj Kumar Sa, Michele Nappi, Sambit Bakshi
Abstract: In the past few decades, technology has progressively become ineluctable in human lives, primarily due to the growth of certain fields like space technology, Big Data, the Internet of Things (IoT), and machine learning. Space technology has revolutionized communication mechanisms while creating opportunities for various research areas, including remote sensing (RS)-inspired applications. On the other hand, IoT presents a platform to use the power of the internet over a whole range of devices through a phenomenon known as social IoT. These devices generate a huge amount of data that must be handled and managed by big data technology combined with deep learning techniques to reduce the manual workload of an operator. Moreover, deep learning architectures like convolutional neural networks (CNNs) offer scope to extract the underlying features from large-scale input images, providing better solutions for tasks such as automatic road detection, but at the cost of time and memory overhead. In this context, we have proposed a three-layer edge-fog-cloud-based intelligent satellite IoT architecture that uses the superpixel-based CNN approach. At the fog layer, the superpixel-based simple linear iterative cluster (SLIC) algorithm uses the images captured by the satellites of the edge level to produce smaller-sized superpixel images that can be transferred even over a low-bandwidth link. The CNN module at the cloud level is then trained with these superpixel images to predict the road networks from these RS images. Two popular road datasets, the DeepGlobe Road dataset and the Massachusetts Road dataset, have been considered to prove the usefulness of the proposed SLIC-CNN architecture in satellite-based IoT platforms for problems like RS image-based road extraction. The proposed architecture achieves better accuracy than the classical CNN while reducing the incurred overhead by a noticeable margin.
Citations: 2
Linked Open Government Data to Predict and Explain House Prices: The Case of Scottish Statistics Portal
IF 3.3, Zone 3, Computer Science
Big Data Research, Pub Date: 2022-11-28, DOI: 10.1016/j.bdr.2022.100355, Volume 30, Article 100355
Areti Karamanou, Evangelos Kalampokis, Konstantinos Tarabanis
Abstract: Accurately estimating the prices of houses is important for various stakeholders including house owners, real estate agencies, government agencies, and policy-makers. Towards this end, traditional statistics and, only recently, advanced machine learning and artificial intelligence models are used. Open Government Data (OGD) have a huge potential especially when combined with AI technologies. OGD are often published as linked data to facilitate data integration and re-usability. EXplainable Artificial Intelligence (XAI) can be used by stakeholders to understand the decisions of a predictive model. This work creates a model that predicts house prices by applying machine learning on linked OGD. We present a case study that uses XGBoost, a powerful machine learning algorithm, and linked OGD from the official Scottish data portal to predict the probability that the mean house price in each data zone of Scotland is higher than the average price in Scotland. XAI is also used to globally and locally explain the decisions of the model. The created model has a Receiver Operating Characteristic (ROC) AUC score of 0.923 and a Precision Recall Curve (PRC) AUC score of 0.891. According to XAI, the variable that most affects the decisions of the model is Comparative Illness Factor, an indicator of health conditions. However, local explainability shows that the decisions made in some data zones may be mostly affected by other variables such as the percentage of detached dwellings and the employment-deprived population.
Citations: 0
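The ROC AUC score reported in the abstract above has a simple interpretation: it is the probability that a randomly chosen positive example (here, a data zone whose mean price exceeds the Scottish average) receives a higher predicted score than a randomly chosen negative one, with ties counted as one half. The sketch below computes that quantity directly by comparing all positive-negative pairs. The function name `roc_auc` and the toy scores are illustrative assumptions and have no connection to the paper's data or code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of positive-negative pairs ranked correctly.

    labels contains 0 or 1; scores are the model's predicted probabilities.
    Tied pairs contribute one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: four data zones, two above and two below the average price.
    y_true = [1, 1, 0, 0]
    y_score = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(y_true, y_score))
```

The pairwise loop is quadratic in the number of examples, which is fine for a demonstration; library implementations get the same value from a sort.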