{"title":"RECA: Related Tables Enhanced Column Semantic Type Annotation Framework","authors":"Yushi Sun, Hao Xin, Lei Chen","doi":"10.14778/3583140.3583149","DOIUrl":null,"url":null,"abstract":"Understanding the semantics of tabular data is of great importance in various downstream applications, such as schema matching, data cleaning, and data integration. Column semantic type annotation is a critical task in the semantic understanding of tabular data. Despite the fact that various approaches have been proposed, they are challenged by the difficulties of handling wide tables and incorporating complex inter-table context information. Failure to handle wide tables limits the usage of column type annotation approaches, while failure to incorporate inter-table context harms the annotation quality. Existing methods either completely ignore these problems or propose ad-hoc solutions. In this paper, we propose Related tables Enhanced Column semantic type Annotation framework (RECA), which incorporates inter-table context information by finding and aligning schema-similar and topic-relevant tables based on a novel named entity schema. The design of RECA can naturally handle wide tables and incorporate useful inter-table context information to enhance the annotation quality. We conduct extensive experiments on two web table datasets to comprehensively evaluate the performance of RECA. Our results show that RECA achieves support-weighted F1 scores of 0.853 and 0.937 with macro average F1 scores of 0.674 and 0.783 on the two datasets respectively, which outperform the state-of-the-art methods.","PeriodicalId":20467,"journal":{"name":"Proc. VLDB Endow.","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. VLDB Endow.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3583140.3583149","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Understanding the semantics of tabular data is of great importance in various downstream applications, such as schema matching, data cleaning, and data integration. Column semantic type annotation is a critical task in the semantic understanding of tabular data. Despite the fact that various approaches have been proposed, they are challenged by the difficulties of handling wide tables and incorporating complex inter-table context information. Failure to handle wide tables limits the usage of column type annotation approaches, while failure to incorporate inter-table context harms the annotation quality. Existing methods either completely ignore these problems or propose ad-hoc solutions. In this paper, we propose Related tables Enhanced Column semantic type Annotation framework (RECA), which incorporates inter-table context information by finding and aligning schema-similar and topic-relevant tables based on a novel named entity schema. The design of RECA can naturally handle wide tables and incorporate useful inter-table context information to enhance the annotation quality. We conduct extensive experiments on two web table datasets to comprehensively evaluate the performance of RECA. Our results show that RECA achieves support-weighted F1 scores of 0.853 and 0.937 with macro average F1 scores of 0.674 and 0.783 on the two datasets respectively, which outperform the state-of-the-art methods.