Practical Web Data Extraction: Are We There Yet? - A Short Survey
Authors: Andreas Schulz, Jörg Lässig, M. Gaedke
Venue: 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 562-567
Published: 2016-10-01
DOI: 10.1109/WI.2016.0096 (https://doi.org/10.1109/WI.2016.0096)
Citations: 13
Abstract
The number of web documents, and the volume of data and information they contain, is growing rapidly, and so is the interest in extracting and using this data. The prospects that Web Data Extraction opens up for its users are as broad as the range of topics and fields on the Web. The major obstacle is making use of the available data, content, and processes. Several survey papers, most of them older, have already described developments and approaches for solving Web Data Extraction tasks, but a more up-to-date review of the latest developments is needed. Additionally, from the user's perspective, there is still a gap between research results and practical applicability. Available solutions, including research results, commercial products, and open-source solutions, lack certain capabilities or suffer from severe usability issues. This paper therefore gives a short review of the state of the art in Web Data Extraction and relates it to the practical application of these technologies.