Automatic Extraction of Meaning from the Web

Rudi Cilibrasi, P. Vitányi
2006 IEEE International Symposium on Information Theory (ISIT), published 2006-07-09. DOI: 10.1109/ISIT.2006.261979
Citations: 30

Abstract

We consider similarity distances for two types of objects: literal objects that as such contain all of their meaning, like genomes or books, and names for objects. The latter may have literal embodiments like the first type, but may also be abstract like "red" or "Christianity". For the first type we consider a family of computable distance measures corresponding to parameters expressing similarity according to particular features between pairs of literal objects. For the second type we consider similarity distances generated by Web users corresponding to particular semantic relations between (the names for) the designated objects. For both families we give universal similarity distance measures, incorporating all particular distance measures in the family. In the first case the universal distance is based on compression and in the second case it is based on Google page counts related to search terms. In both cases experiments on a massive scale give evidence of the viability of the approaches.
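
The abstract names a compression-based distance for literal objects and a page-count-based distance for names, but does not state formulas. As an illustration only, the sketch below implements the normalized compression distance (NCD) and the normalized Google distance (NGD) in their commonly cited forms; it is not the paper's code. Assumptions: zlib stands in for the compressor, the page counts f(x), f(y), f(x,y) and the normalizer N are supplied by the caller rather than fetched from any search API, and returning infinity when a count is zero is a convention chosen here.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes, using zlib at level 9.

    Assumption: zlib is only a stand-in; any real compressor
    approximates the ideal one the theory refers to.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two literal objects.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google distance computed from page counts.

    NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                / (log N - min(log f(x), log f(y)))

    fx, fy: pages containing term x (resp. y); fxy: pages containing both;
    n: normalizing total, assumed larger than every count.
    """
    if n <= max(fx, fy):
        raise ValueError("n must exceed the individual page counts")
    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf  # convention chosen here: no (co-)occurrence
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    other = bytes(range(256)) * 4
    print("NCD(text, text) =", ncd(text, text))
    print("NCD(text, other) =", ncd(text, other))
    # Hypothetical page counts, not real search-engine data.
    print("NGD =", ngd(1000, 1000, 500, 10**9))
```

Smaller values indicate more similar objects. Because real compressors are imperfect, NCD of an object with itself is small but usually not exactly zero.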