Mining the Web for generating thematic metadata from textual data

Chien-Chung Huang, Shui-Lung Chuang, Lee-Feng Chien
Proceedings. 20th International Conference on Data Engineering, 2004-03-30. DOI: 10.1109/ICDE.2004.1320065 (https://doi.org/10.1109/ICDE.2004.1320065)
Citations: 2

Abstract

Conventional tools for automatic metadata creation mostly extract named entities or patterns from texts and annotate them with information about persons, locations, dates, and so on. However, this kind of entity type information is often too primitive for more advanced intelligent applications such as concept-based search. Here, we try to generate semantically deep metadata with limited human intervention. The main idea behind our approach is to use Web mining and categorization techniques to create thematic metadata. The proposed approach comprises three computational modules: feature extraction, HCQF (hier-concept query formulation), and text instance categorization. The feature extraction module sends the names of text instances to Web search engines and uses the highly ranked search-result pages returned for each name to describe that instance.
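The abstract only outlines the feature extraction and categorization modules, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a bag-of-words representation built from the text of the top-ranked result pages and assigns each instance to the category with the highest cosine similarity, a common stand-in whose relation to the paper's actual classifier is not stated in the abstract. The `SearchFn` interface, `toy_search`, its result data, and the `CATEGORIES` term lists are all hypothetical; HCQF is not modeled.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical search interface: given a query and k, return the text
# (e.g. titles and snippets) of the k highest-ranked result pages.
SearchFn = Callable[[str, int], List[str]]

_TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return _TOKEN_RE.findall(text.lower())


def extract_features(name: str, search: SearchFn, top_k: int = 10) -> Counter:
    """Describe a text instance by the term counts of its top-k result pages."""
    features: Counter = Counter()
    for page_text in search(name, top_k):
        features.update(tokenize(page_text))
    return features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(features: Counter, categories: Dict[str, Counter]) -> str:
    """Return the category whose term vector is most similar to the instance."""
    return max(categories, key=lambda c: cosine(features, categories[c]))


# Toy stand-in for a Web search engine (invented data, for illustration only).
_TOY_RESULTS: Dict[str, List[str]] = {
    "Python": ["python programming language tutorial", "python code library"],
    "Jaguar": ["jaguar big cat rainforest", "jaguar animal habitat"],
}


def toy_search(query: str, k: int) -> List[str]:
    """Return up to k canned 'result pages' for the query."""
    return _TOY_RESULTS.get(query, [])[:k]


# Hypothetical category vectors; the abstract does not say how categories
# in the concept hierarchy are represented.
CATEGORIES: Dict[str, Counter] = {
    "Computers": Counter(tokenize("programming language software code")),
    "Animals": Counter(tokenize("animal cat wildlife habitat")),
}

if __name__ == "__main__":
    for name in ("Python", "Jaguar"):
        label = categorize(extract_features(name, toy_search), CATEGORIES)
        print(name, "->", label)
```

With the toy data, "Python" shares terms only with the "Computers" vector and "Jaguar" only with "Animals", so each is assigned accordingly. Passing the search engine in as a function keeps the sketch independent of any particular (real) search API.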