{"title":"FlashSchema:通过强大的推理算法和大规模的模式数据实现高质量的XML模式","authors":"Yeting Li, Jialun Cao, H. Chen, Tingjian Ge, Zhiwu Xu, Qiancheng Peng","doi":"10.1109/ICDE48307.2020.00214","DOIUrl":null,"url":null,"abstract":"Getting high quality XML schemas to avoid or reduce application risks is an important problem in practice, for which some important aspects have yet to be addressed satisfactorily in existing work. In this paper, we propose a tool FlashSchema for high quality XML schema design, which supports both one-pass and interactive schema design and schema recommendation. To the best of our knowledge, no other existing tools support interactive schema design and schema recommendation. One salient feature of our work is the design of algorithms to infer k-occurrence interleaving regular expressions, which are not only more powerful in model capacity, but also more efficient. Additionally, such algorithms form the basis of our interactive schema design. The other feature is that, starting from large-scale schema data that we have harvested from the Web, we devise a new solution for type inference, as well as propose schema recommendation for schema design. Finally, we conduct a series of experiments on two XML datasets, comparing with 9 state-of-the-art algorithms and open-source tools in terms of running time, preciseness, and conciseness. Experimental results show that our work achieves the highest level of preciseness and conciseness within only a few seconds. Experimental results and examples also demonstrate the effectiveness of our type inference and schema recommendation methods.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"820 1","pages":"1962-1965"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"FlashSchema: Achieving High Quality XML Schemas with Powerful Inference Algorithms and Large-scale Schema Data\",\"authors\":\"Yeting Li, Jialun Cao, H. Chen, Tingjian Ge, Zhiwu Xu, Qiancheng Peng\",\"doi\":\"10.1109/ICDE48307.2020.00214\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Getting high quality XML schemas to avoid or reduce application risks is an important problem in practice, for which some important aspects have yet to be addressed satisfactorily in existing work. In this paper, we propose a tool FlashSchema for high quality XML schema design, which supports both one-pass and interactive schema design and schema recommendation. To the best of our knowledge, no other existing tools support interactive schema design and schema recommendation. One salient feature of our work is the design of algorithms to infer k-occurrence interleaving regular expressions, which are not only more powerful in model capacity, but also more efficient. Additionally, such algorithms form the basis of our interactive schema design. The other feature is that, starting from large-scale schema data that we have harvested from the Web, we devise a new solution for type inference, as well as propose schema recommendation for schema design. Finally, we conduct a series of experiments on two XML datasets, comparing with 9 state-of-the-art algorithms and open-source tools in terms of running time, preciseness, and conciseness. Experimental results show that our work achieves the highest level of preciseness and conciseness within only a few seconds. Experimental results and examples also demonstrate the effectiveness of our type inference and schema recommendation methods.\",\"PeriodicalId\":6709,\"journal\":{\"name\":\"2020 IEEE 36th International Conference on Data Engineering (ICDE)\",\"volume\":\"820 1\",\"pages\":\"1962-1965\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 36th International Conference on Data Engineering (ICDE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDE48307.2020.00214\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE48307.2020.00214","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
FlashSchema: Achieving High Quality XML Schemas with Powerful Inference Algorithms and Large-scale Schema Data
Getting high quality XML schemas to avoid or reduce application risks is an important problem in practice, for which some important aspects have yet to be addressed satisfactorily in existing work. In this paper, we propose a tool FlashSchema for high quality XML schema design, which supports both one-pass and interactive schema design and schema recommendation. To the best of our knowledge, no other existing tools support interactive schema design and schema recommendation. One salient feature of our work is the design of algorithms to infer k-occurrence interleaving regular expressions, which are not only more powerful in model capacity, but also more efficient. Additionally, such algorithms form the basis of our interactive schema design. The other feature is that, starting from large-scale schema data that we have harvested from the Web, we devise a new solution for type inference, as well as propose schema recommendation for schema design. Finally, we conduct a series of experiments on two XML datasets, comparing with 9 state-of-the-art algorithms and open-source tools in terms of running time, preciseness, and conciseness. Experimental results show that our work achieves the highest level of preciseness and conciseness within only a few seconds. Experimental results and examples also demonstrate the effectiveness of our type inference and schema recommendation methods.