Web data extraction techniques: A review
N. V. Kamanwar, S. Kale
2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave), 2016-02-01
DOI: 10.1109/STARTUP.2016.7583910 (https://doi.org/10.1109/STARTUP.2016.7583910)
Citations: 8
Abstract
Web data extraction is the process of extracting user-required information from websites. Web documents contain data that is not in a structured format. By web data extraction, we mean extracting the data present in web documents in HTML format, removing unwanted content such as tags, advertisements, and videos, and then learning the information, patterns, or features present in that data. Today, most researchers use web data extractors because the internet contains a huge amount of data, which makes manual information extraction from web documents complicated. In this paper, we study the techniques that different authors have used to extract user-required data from a set of web pages. A comparative analysis of web data extraction techniques is given.
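The cleaning step described in the abstract (dropping tags and embedded media so that only the textual data remains) can be illustrated with a minimal sketch. It is not one of the techniques surveyed in the paper; it only uses Python's standard `html.parser` module. The names `SKIPPED_TAGS`, `TextExtractor`, and `extract_text` are hypothetical, and the choice of which elements count as unwanted is an assumption. The pattern-learning step is not covered here.

```python
from html.parser import HTMLParser

# Assumption: text inside these elements is never user-required data.
# The list is illustrative, not taken from the paper.
SKIPPED_TAGS = {"script", "style", "video", "iframe"}


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring markup and the content of skipped elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []     # cleaned text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text fragments of an HTML string with tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    page = (
        "<h1>Price list</h1><script>var x=1;</script>"
        "<p>Laptop: <b>500</b> USD</p>"
        "<video src='a.mp4'>fallback</video>"
    )
    print(extract_text(page))
```

Real pages usually need more than tag stripping, for example identifying advertisement blocks by structure or repeated templates, which is where the surveyed extraction techniques come in.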