{"title":"Data Source Selection in Big Data Context","authors":"Hicham Moad Safhi, B. Frikh, B. Ouhbi","doi":"10.1145/3366030.3366121","DOIUrl":null,"url":null,"abstract":"Big Data presents promising technological and economical opportunities. In fact, it has become the raw material of production for many organizations. Data is available in large quantities, and it continues generating abundantly. However, not all the data will have valuable knowledge. Unreliable sources provide misleading and biased information, and even reliable sources could suffer from low data quality. In this paper, we propose a novel methodology for the selectability of data sources, by both considering the presence and the absence of users' preferences. The proposed model integrates multiple factors that affect the reliability of data sources, including their quality, gain, cost and coverage. Experimental results on real world data-sets, show its capability to find the subset of relevant and reliable sources with the lowest cost.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3366030.3366121","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Big Data presents promising technological and economical opportunities. In fact, it has become the raw material of production for many organizations. Data is available in large quantities, and it continues generating abundantly. However, not all the data will have valuable knowledge. Unreliable sources provide misleading and biased information, and even reliable sources could suffer from low data quality. In this paper, we propose a novel methodology for the selectability of data sources, by both considering the presence and the absence of users' preferences. The proposed model integrates multiple factors that affect the reliability of data sources, including their quality, gain, cost and coverage. Experimental results on real world data-sets, show its capability to find the subset of relevant and reliable sources with the lowest cost.