Modeling Non-Compositional Expressions using a Search Engine

Cheikh M. Bamba Dione, Christer Johansson
DOI: 10.29007/4JL9 (https://doi.org/10.29007/4JL9)
Venue: 2018 9th International Conference on Awareness Science and Technology (iCAST)
Published: 2018-08-09
Citations: 1

Abstract

Non-compositional multi-word expressions pose major challenges for natural language processing applications. In this paper, we present a method for modeling non-compositional expressions based on the assumption that the meaning of an expression depends on its context. Context words can therefore be used to select documents and to separate documents in which the expression has different meanings. Deviation from a baseline is measured using serendipity (i.e., the pointwise effect size). We use this statistical measure to mark which patterns are over- and under-represented and to decide whether the pattern under scrutiny belongs to the meaning selected by the context words. We use the Google search engine to obtain document frequency estimates. With these estimates, the serendipity measure closely mirrors some human intuitions about the preferred alternative.
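
The abstract does not give the formula for serendipity, so the following is only a rough sketch of the general idea: compare an observed document frequency against a baseline and flag the pattern as over- or under-represented. It assumes, as a stand-in, a log2 ratio of observed to expected counts with an independence baseline; the function names, the threshold, and all counts are hypothetical and are not taken from the paper.

```python
import math


def expected_count(df_pattern: int, df_context: int, total_docs: int) -> float:
    """Baseline (assumed): co-occurrence count if pattern and context were independent."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    return df_pattern * df_context / total_docs


def pointwise_deviation(observed: int, expected: float) -> float:
    """Assumed stand-in for a pointwise effect size: log2(observed / expected)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed == 0:
        # No co-occurrence at all: treat as maximally under-represented.
        return float("-inf")
    return math.log2(observed / expected)


def label(deviation: float, threshold: float = 1.0) -> str:
    """Mark a pattern relative to a hypothetical threshold on the deviation."""
    if deviation >= threshold:
        return "over-represented"
    if deviation <= -threshold:
        return "under-represented"
    return "near baseline"


# Hypothetical document frequency estimates (not real search engine counts).
TOTAL_DOCS = 1_000_000
DF_PATTERN = 10_000   # documents containing the expression
DF_CONTEXT = 50_000   # documents containing the context words

baseline = expected_count(DF_PATTERN, DF_CONTEXT, TOTAL_DOCS)
for observed in (2_000, 125):
    dev = pointwise_deviation(observed, baseline)
    print(observed, round(dev, 2), label(dev))
```

In this sketch, a positive deviation would suggest that the pattern occurs with the context words more often than the baseline predicts, which is the kind of signal the abstract describes using to decide whether a pattern belongs to the meaning selected by the context.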