IDQMS: AN INTELLIGENT DATA QUALITY MANAGEMENT SYSTEM TOOL
A. Salem, F. Boufarès
Proceedings of the 16th International Conference on Applied Computing 2019
Published: 2019-11-07
DOI: 10.33965/ac2019_201912l001 (https://doi.org/10.33965/ac2019_201912l001)
Citations: 0
Abstract
The volume of data keeps growing. Data are also distributed and heterogeneous: they come from multiple sources (structured, semi-structured and unstructured) and vary in quality. As a result, users are very likely to manipulate data without knowing their structure or semantics. This paper aims to assist users in their data quality approach. To that end, data must be linked to their semantic meaning, data types, constraints, comments and origin. We address the semantic schema recognition of a data source, which consists of two steps: first, categorizing the data by assigning them to a category and possibly a sub-category; second, establishing relations between columns and, where possible, discovering the semantics of the data source being manipulated. The links detected between columns give a better understanding of the source and suggest alternatives for correcting the data.
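To make the two steps named in the abstract more concrete, here is a minimal Python sketch. It is not the IDQMS implementation, and the abstract does not describe the authors' algorithms. The regular-expression categories in `PATTERNS`, the `threshold` parameter, the helper names `categorize_column` and `functional_dependency_holds`, and the sample rows are all assumptions made for illustration. Categorization is approximated by pattern matching over column values, and a relation between columns is approximated by a simple functional-dependency check.

```python
import re
from collections import Counter

# Hypothetical (category, sub-category) patterns; the paper does not list its categories.
PATTERNS = {
    ("contact", "email"): re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    ("date", "iso_date"): re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    ("number", "integer"): re.compile(r"^-?\d+$"),
}


def categorize_column(values, threshold=0.8):
    """Return the (category, sub_category) matched by at least `threshold`
    of the non-empty values, or None if no pattern reaches that share."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    counts = Counter()
    for value in cleaned:
        for label, pattern in PATTERNS.items():
            if pattern.match(value):
                counts[label] += 1
    if not counts:
        return None
    label, hits = counts.most_common(1)[0]
    return label if hits / len(cleaned) >= threshold else None


def functional_dependency_holds(rows, lhs, rhs):
    """True if every value of column `lhs` maps to exactly one value of column `rhs`."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# Invented sample data, for illustration only.
rows = [
    {"zip": "75001", "city": "Paris", "email": "a@example.org", "joined": "2019-11-07"},
    {"zip": "75001", "city": "Paris", "email": "b@example.org", "joined": "2019-10-01"},
    {"zip": "69001", "city": "Lyon", "email": "not-an-email", "joined": "2019-09-15"},
]

categories = {col: categorize_column([r[col] for r in rows]) for col in rows[0]}
zip_determines_city = functional_dependency_holds(rows, "zip", "city")
```

In this toy data, the `email` column falls below the default threshold because one of its three values is malformed; lowering `threshold` to 0.6 would label the column as an email column and leave the non-matching value as a candidate for correction. Likewise, a dependency such as `zip` determining `city` is one simple kind of link between columns that could point to a correction when a row violates it.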