A Framework for Collocation Error Correction in Web Pages and Text Documents

Alan Varghese, A. Varde, Jing Peng, Eileen Fitzpatrick
Journal: SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, vol. 67, no. 1, pp. 14-23
Publication type: Journal Article
Published: 2015-09-29
DOI: 10.1145/2830544.2830548 (https://doi.org/10.1145/2830544.2830548)
Citations: 11

Abstract

Much of the English in text documents today comes from non-native speakers. Web searches are also conducted very often by non-native speakers. Though highly qualified in their respective fields, these speakers can make collocation errors, e.g., "dark money" and "stock agora" instead of the more appropriate English expressions "black money" and "stock market", respectively. Such errors may arise from literal translation from the speaker's native language or from other factors, and they can cause problems in contexts such as querying over Web pages and correctly understanding text documents. This paper proposes a framework called CollOrder that detects such collocation errors and suggests correctly ordered collocated responses to improve the semantics. The framework integrates machine learning approaches with natural language processing techniques and proposes suitable heuristics to provide responses to collocation errors, ranked in order of correctness. We discuss the proposed framework along with its algorithms and experimental evaluation. We claim that it would be useful for semantically enhancing Web querying, e.g., for financial news and online shopping. It would also help provide automated error correction in machine-translated documents and assist people using ESL tools.
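To make the detect-then-rank idea concrete, here is a minimal Python sketch. It is not the CollOrder algorithm, whose details the abstract does not give. It assumes a toy table of reference collocation counts and a toy substitution table (both hypothetical, with made-up numbers), flags a two-word phrase that is absent from the reference table, and ranks substitute phrases by reference count. The names `REFERENCE_COUNTS`, `SUBSTITUTES`, and `suggest` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical counts of known-good collocations (toy numbers, not from the paper).
REFERENCE_COUNTS: Dict[Tuple[str, str], int] = {
    ("black", "money"): 120,
    ("stock", "market"): 950,
    ("stock", "exchange"): 400,
}

# Hypothetical substitution table standing in for a thesaurus or bilingual lexicon.
SUBSTITUTES: Dict[str, List[str]] = {
    "dark": ["black", "dim"],
    "agora": ["market", "exchange", "square"],
}


def suggest(phrase: str) -> List[str]:
    """Return ranked replacements for a two-word phrase, or [] if it looks fine."""
    first, second = phrase.lower().split()  # assumes exactly two words
    if (first, second) in REFERENCE_COUNTS:
        return []  # already a known collocation, nothing to correct

    # Generate candidates by swapping one word at a time.
    candidates = set()
    for alt in SUBSTITUTES.get(first, []):
        candidates.add((alt, second))
    for alt in SUBSTITUTES.get(second, []):
        candidates.add((first, alt))

    # Keep only candidates attested in the reference table, most frequent first.
    scored = [(REFERENCE_COUNTS[c], c) for c in candidates if c in REFERENCE_COUNTS]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [" ".join(pair) for _, pair in scored]


if __name__ == "__main__":
    for query in ("dark money", "stock agora", "stock market"):
        print(query, "->", suggest(query))
```

Ranking by raw count is the simplest possible stand-in for "ranked in order of correctness"; a real system would presumably draw counts from a large corpus and combine several signals.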