FlashSchema: Achieving High Quality XML Schemas with Powerful Inference Algorithms and Large-scale Schema Data

2020 IEEE 36th International Conference on Data Engineering (ICDE). Pub. date: 2020-04-01. DOI: 10.1109/ICDE48307.2020.00214

Yeting Li, Jialun Cao, H. Chen, Tingjian Ge, Zhiwu Xu, Qiancheng Peng

Pages: 1962-1965

Citations: 2

Abstract

Obtaining high-quality XML schemas, in order to avoid or reduce application risks, is an important practical problem, and several of its key aspects have not yet been addressed satisfactorily by existing work. In this paper, we propose FlashSchema, a tool for high-quality XML schema design that supports one-pass schema design, interactive schema design, and schema recommendation. To the best of our knowledge, no other existing tool supports interactive schema design and schema recommendation. One salient feature of our work is a set of algorithms for inferring k-occurrence interleaving regular expressions; these are not only more powerful in model capacity but also more efficient. These algorithms also form the basis of our interactive schema design. The other salient feature is that, starting from large-scale schema data harvested from the Web, we devise a new solution for type inference and propose schema recommendation for schema design. Finally, we conduct a series of experiments on two XML datasets, comparing our work against nine state-of-the-art algorithms and open-source tools in terms of running time, preciseness, and conciseness. The results show that our work achieves the highest preciseness and conciseness while running in only a few seconds. Experimental results and examples also demonstrate the effectiveness of our type inference and schema recommendation methods.
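To make the two ingredients of "k-occurrence interleaving regular expressions" concrete, the sketch below shows them separately. A regular expression is k-occurrence when every element name appears at most k times in it, and interleaving (the shuffle operator `&`, comparable to RELAX NG's `interleave` pattern) accepts any merge of two sequences that preserves the internal order of each. This is only an illustration under stated assumptions, not FlashSchema's inference algorithm: the helper names `occurrence_bound` and `in_shuffle` are hypothetical, and the expression syntax (identifiers as element names, punctuation as operators) is assumed for the example.

```python
import re
from collections import Counter


def occurrence_bound(regex: str) -> int:
    """Return the smallest k for which the expression is k-occurrence,
    i.e. the largest number of times any single element name appears in it.

    Assumption (hypothetical syntax): element names are identifiers, and the
    operators ( ) , | & ? * + are punctuation that the pattern below skips.
    """
    names = re.findall(r"[A-Za-z_][\w.-]*", regex)
    counts = Counter(names)
    return max(counts.values(), default=0)


def in_shuffle(w, u, v) -> bool:
    """Check whether sequence w is an interleaving (shuffle) of u and v,
    keeping the relative order of u's items and of v's items."""
    if len(w) != len(u) + len(v):
        return False
    # ok[i][j] is True when w[:i + j] is a shuffle of u[:i] and v[:j].
    ok = [[False] * (len(v) + 1) for _ in range(len(u) + 1)]
    ok[0][0] = True
    for i in range(len(u) + 1):
        for j in range(len(v) + 1):
            # Extend a valid prefix by taking the next item from u ...
            if i > 0 and ok[i - 1][j] and u[i - 1] == w[i + j - 1]:
                ok[i][j] = True
            # ... or by taking the next item from v.
            if j > 0 and ok[i][j - 1] and v[j - 1] == w[i + j - 1]:
                ok[i][j] = True
    return ok[len(u)][len(v)]
```

For example, `occurrence_bound("(title, author+, (title | isbn)?)")` returns 2 because `title` occurs twice, so that expression is 2-occurrence but not 1-occurrence. `in_shuffle(list("acbd"), list("ab"), list("cd"))` is `True`, while `in_shuffle(list("bacd"), list("ab"), list("cd"))` is `False` because `b` precedes `a`. The quadratic dynamic program is the textbook shuffle-membership check; an inference algorithm works in the opposite direction, generalizing from sample sequences to an expression.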


