基于Web使用挖掘、Web抓取和语义注释的信息提取

2011 International Conference on Computational Intelligence and Communication Networks Pub Date : 2011-10-07 DOI:10.1109/CICN.2011.97

S. K. Malik, S. Rizvi

{"title":"基于Web使用挖掘、Web抓取和语义注释的信息提取","authors":"S. K. Malik, S. Rizvi","doi":"10.1109/CICN.2011.97","DOIUrl":null,"url":null,"abstract":"Extracting useful information from the web is the most significant issue of concern for the realization of semantic web. This may be achieved by several ways among which Web Usage Mining, Web Scrapping and Semantic Annotation plays an important role. Web mining enables to find out the relevant results from the web and is used to extract meaningful information from the discovery patterns kept back in the servers. Web usage mining is a type of web mining which mines the information of access routes/manners of users visiting the web sites. Web scraping, another technique, is a process of extracting useful information from HTML pages which may be implemented using a scripting language known as Prolog Server Pages(PSP) based on Prolog. Third, Semantic annotation is a technique which makes it possible to add semantics and a formal structure to unstructured textual documents, an important aspect in semantic information extraction which may be performed by a tool known as KIM(Knowledge Information Management). In this paper, we revisit, explore and discuss some information extraction techniques on web like web usage mining, web scrapping and semantic annotation for a better or efficient information extraction on the web illustrated with examples.","PeriodicalId":292190,"journal":{"name":"2011 International Conference on Computational Intelligence and Communication Networks","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"68","resultStr":"{\"title\":\"Information Extraction Using Web Usage Mining, Web Scrapping and Semantic Annotation\",\"authors\":\"S. K. Malik, S. Rizvi\",\"doi\":\"10.1109/CICN.2011.97\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Extracting useful information from the web is the most significant issue of concern for the realization of semantic web. This may be achieved by several ways among which Web Usage Mining, Web Scrapping and Semantic Annotation plays an important role. Web mining enables to find out the relevant results from the web and is used to extract meaningful information from the discovery patterns kept back in the servers. Web usage mining is a type of web mining which mines the information of access routes/manners of users visiting the web sites. Web scraping, another technique, is a process of extracting useful information from HTML pages which may be implemented using a scripting language known as Prolog Server Pages(PSP) based on Prolog. Third, Semantic annotation is a technique which makes it possible to add semantics and a formal structure to unstructured textual documents, an important aspect in semantic information extraction which may be performed by a tool known as KIM(Knowledge Information Management). In this paper, we revisit, explore and discuss some information extraction techniques on web like web usage mining, web scrapping and semantic annotation for a better or efficient information extraction on the web illustrated with examples.\",\"PeriodicalId\":292190,\"journal\":{\"name\":\"2011 International Conference on Computational Intelligence and Communication Networks\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"68\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 International Conference on Computational Intelligence and Communication Networks\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CICN.2011.97\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Computational Intelligence and Communication Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CICN.2011.97","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 68

摘要

从web中提取有用的信息是实现语义web最重要的问题。这可以通过几种方法来实现，其中Web Usage Mining、Web Scrapping和Semantic Annotation起着重要的作用。Web挖掘能够从Web中找到相关的结果，并用于从保存在服务器中的发现模式中提取有意义的信息。Web使用挖掘是一种挖掘用户访问网站的路径/方式信息的Web挖掘方法。网络抓取是另一种技术，它是从HTML页面中提取有用信息的过程，可以使用基于Prolog的脚本语言Prolog Server pages (PSP)来实现。第三，语义注释是一种为非结构化文本文档添加语义和正式结构的技术，这是语义信息提取的一个重要方面，可以通过称为KIM(知识信息管理)的工具来执行。本文对web上的一些信息提取技术进行了回顾、探索和讨论，如web使用挖掘、web废弃和语义注释等，以期更好或更有效地在web上进行信息提取。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Information Extraction Using Web Usage Mining, Web Scrapping and Semantic Annotation

Extracting useful information from the web is the most significant issue of concern for the realization of semantic web. This may be achieved by several ways among which Web Usage Mining, Web Scrapping and Semantic Annotation plays an important role. Web mining enables to find out the relevant results from the web and is used to extract meaningful information from the discovery patterns kept back in the servers. Web usage mining is a type of web mining which mines the information of access routes/manners of users visiting the web sites. Web scraping, another technique, is a process of extracting useful information from HTML pages which may be implemented using a scripting language known as Prolog Server Pages(PSP) based on Prolog. Third, Semantic annotation is a technique which makes it possible to add semantics and a formal structure to unstructured textual documents, an important aspect in semantic information extraction which may be performed by a tool known as KIM(Knowledge Information Management). In this paper, we revisit, explore and discuss some information extraction techniques on web like web usage mining, web scrapping and semantic annotation for a better or efficient information extraction on the web illustrated with examples.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 International Conference on Computational Intelligence and Communication Networks

自引率

0.00%

发文量