A new study on using HTML structures to improve retrieval

Proceedings 11th International Conference on Tools with Artificial Intelligence. Pub Date: 1999-11-08. DOI: 10.1109/TAI.1999.809831

M. Cutler, H. Deng, S. Maniccam, W. Meng

Citations: 44

Abstract

Locating useful information effectively from the World Wide Web (WWW) is of wide interest. This paper presents new results on a methodology that uses the structures and hyperlinks of HTML documents to improve the effectiveness of retrieving them. The methodology partitions the occurrences of terms in a document collection into classes according to the tags in which each term appears (such as Title, H1-H6, and Anchor). The rationale is that terms appearing in different structures of a document may differ in how well they identify the document. The weighting schemes of traditional information retrieval are extended to include class importance values. We implemented a genetic algorithm to determine a "best so far" combination of class importance factors. Our experiments indicate that this technique can improve retrieval effectiveness by 39.6% or more.
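The class-based weighting described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the class names, the importance values in `civ`, and the simple scoring rule (sum of class-weighted term frequencies) are hypothetical stand-ins, not the authors' scheme. The genetic algorithm that searches for good importance values is not shown.

```python
from collections import Counter

def weighted_tf(term_counts_by_class, civ):
    """Combine per-class term counts into one weighted frequency per term.

    term_counts_by_class: {class_name: {term: count}} for one document.
    civ: {class_name: importance} (hypothetical class importance values).
    """
    weighted = Counter()
    for cls, counts in term_counts_by_class.items():
        for term, count in counts.items():
            weighted[term] += civ[cls] * count
    return weighted

def score(query_terms, term_counts_by_class, civ):
    """Assumed scoring rule: sum the weighted frequencies of the query terms."""
    wtf = weighted_tf(term_counts_by_class, civ)
    return sum(wtf[t] for t in query_terms)

# Hypothetical document: "retrieval" appears once in the title, twice in plain text.
doc = {
    "title": {"retrieval": 1},
    "plain": {"retrieval": 2, "html": 3},
}

# Illustrative importance values only; not taken from the paper.
civ = {"title": 4.0, "header": 2.0, "anchor": 1.0, "plain": 1.0}
uniform = {"title": 1.0, "header": 1.0, "anchor": 1.0, "plain": 1.0}

boosted = score(["retrieval"], doc, civ)       # title occurrence counts 4x
baseline = score(["retrieval"], doc, uniform)  # ordinary term frequency
```

With uniform importance values the score reduces to plain term frequency, so the class importance values act as the only extra parameters, which is what makes them a natural target for a search procedure such as a genetic algorithm.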
