Semantic Data Understanding with Character Level Learning
Michael J. Mior, K. Pu
2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI 2020), virtual conference, 11-13 August 2020, pp. 253-258
DOI: https://doi.org/10.1109/IRI49571.2020.00043
Citation count: 0
Abstract
Databases are growing in size and complexity. With the emergence of data lakes, databases have become open, fast-evolving, and highly heterogeneous. Understanding the complex relationships among different entity types in such settings is both challenging and necessary for data scientists. We propose an approach that uses a convolutional neural network to learn, at the character level, the patterns associated with each entity type in the database. We demonstrate that the learned character-level patterns capture sufficient semantic information for many useful applications, including data lake schema exploration and interactive data cleaning.
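The abstract does not describe the network itself, so the following is only a minimal sketch of the general idea of character-level convolution over a cell value, not the authors' architecture. The vocabulary, the padding length, the filter count and width, and every function name below are assumptions made for illustration, and the filters are random rather than trained.

```python
import random
import string

# Assumed character vocabulary; the paper's actual alphabet is not given in the abstract.
ALPHABET = string.ascii_lowercase + string.digits + "-_@./: "
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
UNKNOWN = len(ALPHABET)            # shared slot for out-of-vocabulary characters
VOCAB_SIZE = len(ALPHABET) + 1


def one_hot_encode(value, max_len=16):
    """Encode a cell value as a max_len x VOCAB_SIZE one-hot matrix.

    Longer values are truncated; shorter ones are padded with all-zero rows.
    """
    rows = []
    for ch in value.lower()[:max_len]:
        row = [0.0] * VOCAB_SIZE
        row[CHAR_INDEX.get(ch, UNKNOWN)] = 1.0
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0.0] * VOCAB_SIZE)
    return rows


def make_filters(num_filters=4, width=3, seed=0):
    """Create random (untrained) filters of shape width x VOCAB_SIZE."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(VOCAB_SIZE)] for _ in range(width)]
        for _ in range(num_filters)
    ]


def conv_max_pool(encoded, filters):
    """Slide each filter over character positions, apply ReLU, keep the maximum."""
    features = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # starting at zero is equivalent to applying ReLU before the max
        for start in range(len(encoded) - width + 1):
            activation = sum(
                encoded[start + k][j] * filt[k][j]
                for k in range(width)
                for j in range(VOCAB_SIZE)
            )
            best = max(best, activation)
        features.append(best)
    return features


filters = make_filters()
features = conv_max_pool(one_hot_encode("2020-08-01"), filters)
print(len(features))
```

Each filter responds to a short character pattern wherever it occurs in the value, so the pooled vector has a fixed length regardless of how long the value is. In a trained model, a classifier over such vectors would predict the entity type; that training step is omitted here.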