Building a web thesaurus from web link structure

Zheng Chen, Shengping Liu, Wenyin Liu, G. Pu, Wei-Ying Ma
Published in: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2003-07-28. DOI: https://doi.org/10.1145/860435.860447
Citations: 80

Abstract

Thesauri have been widely used in many applications, including information retrieval, natural language processing, and question answering. In this paper, we propose a novel approach to automatically constructing a domain-specific thesaurus from the Web using link structure information. The proposed approach is able to identify new terms and reflect the latest relationships between terms as the Web evolves. First, a set of high-quality, representative websites for a specific domain is selected. After navigational links are filtered out, link analysis is applied to each website to obtain its content structure. Finally, the thesaurus is constructed by merging the content structures of the selected websites. Experimental results on automatic query expansion based on the constructed thesaurus show a 20% improvement in search precision over the baseline.
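
The merge and query-expansion steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how content structures are represented or merged, so the sketch assumes each site's content structure has already been reduced to (parent term, child term) pairs, and it assumes a simple rule that keeps a pair only when enough sites agree on it. The names `site_structures`, `merge_structures`, `expand_query`, and the `min_sites` threshold are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical input: one content structure per website, given as
# (parent_term, child_term) pairs that survived navigational-link filtering.
site_structures = [
    [("electronics", "camera"), ("camera", "lens")],
    [("electronics", "camera"), ("electronics", "phone")],
    [("camera", "lens"), ("camera", "tripod")],
]


def merge_structures(structures, min_sites=2):
    """Keep a term pair only if at least `min_sites` websites contain it.

    The voting rule is an assumption made for illustration; the source
    only says that the content structures are merged.
    """
    support = Counter()
    for pairs in structures:
        # set() so that a pair repeated within one site is counted once
        support.update(set(pairs))

    thesaurus = defaultdict(set)
    for (parent, child), count in support.items():
        if count >= min_sites:
            thesaurus[parent].add(child)
    return {term: sorted(children) for term, children in thesaurus.items()}


def expand_query(query_terms, thesaurus):
    """Append child terms from the thesaurus to the query, skipping duplicates."""
    expanded = list(query_terms)
    for term in query_terms:
        for related in thesaurus.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded


thesaurus = merge_structures(site_structures)
print(expand_query(["camera"], thesaurus))
```

With the toy data above, only the pairs found on at least two sites are kept, so the query "camera" gains the term "lens" but not "tripod". Lowering `min_sites` trades precision for coverage.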