An automatic corpus based method for a building Multiple Fuzzy Word Dataset

David Chandran, Keeley A. Crockett, D. Mclean, Alan Crispin
Published in: 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)
Publication date: 2015-11-30
DOI: 10.1109/FUZZ-IEEE.2015.7337877 (https://doi.org/10.1109/FUZZ-IEEE.2015.7337877)
Citations: 8

Abstract

Fuzzy sentence semantic similarity measures are designed for real-world problems in which a computer system must assess the similarity between human natural language and words or prototype sentences stored in a knowledge base. Such measures are often developed for a specific corpus or domain, where only a limited set of words and sentences is evaluated. As new “fuzzy” measures are developed, the research challenge is how to evaluate them. Traditional approaches have required rigorous and complex human involvement in compiling benchmark datasets and obtaining human similarity ratings. Existing datasets often contain few fuzzy words and do not allow fuzzy measures to be exhaustively tested. This paper presents an automatic method for generating a Multiple Fuzzy Word Dataset (MFWD) from a corpus. A Fuzzy Sentence Pairing Algorithm is used to extract high, medium and low similarity sentence pairs and augment them with multiple fuzzy words. Human ratings are collected through crowdsourcing, and the MFWD is evaluated using both fuzzy and traditional sentence similarity measures. The results indicate that fuzzy measures returned a higher correlation with human ratings than traditional measures.
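The evaluation step described in the abstract, comparing a measure's similarity scores against human ratings by correlation, can be illustrated with a minimal sketch. The abstract does not state which correlation coefficient was used, so Pearson's r is an assumption here. The function name `pearson`, the variable names and all numeric values are hypothetical placeholders, not data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only; NOT taken from the paper.
# One entry per sentence pair, all on a 0..1 similarity scale.
human_ratings = [0.9, 0.5, 0.1, 0.7, 0.3]
measure_a = [0.8, 0.6, 0.2, 0.7, 0.4]  # placeholder scores from one measure
measure_b = [0.5, 0.6, 0.4, 0.3, 0.5]  # placeholder scores from another measure

r_a = pearson(human_ratings, measure_a)
r_b = pearson(human_ratings, measure_b)
print(f"measure A vs human ratings: r = {r_a:.3f}")
print(f"measure B vs human ratings: r = {r_b:.3f}")
```

Under this sketch, the measure whose scores yield the larger r tracks the human ratings more closely. A rank-based coefficient such as Spearman's could be substituted if only the ordering of sentence pairs matters.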