IKEA: Unsupervised domain-specific keyword-expansion

Joobin Gharibshah, Jakapun Tachaiya, Arman Irani, E. Papalexakis, M. Faloutsos
Published in: 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
Publication date: 2022-11-10
DOI: 10.1109/ASONAM55673.2022.10068656 (https://doi.org/10.1109/ASONAM55673.2022.10068656)

Abstract

How can we expand an initial set of keywords with a target domain in mind? One application is to use the expanded set to search for specific information within the domain of interest. Here, we focus on online forums, and specifically on security forums. We propose IKEA, an iterative embedding-based approach that expands a set of keywords for a given domain. The novelty of our approach is three-fold: (a) we use two similarity expansions, one in the word-word space and one in the post-post space, (b) we apply each expansion iteratively, and (c) we provide a flexible ranking of the identified words to meet the user's needs. We evaluate our method on data from three security forums spanning five years of activity and on the widely used Fire benchmark. IKEA outperforms previous solutions by identifying more relevant keywords: it achieves more than 0.82 MAP and 0.85 NDCG across a wide range of initial keyword sets. We see our approach as an essential building block in developing methods for harnessing the wealth of information available in online forums.
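
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAP and NDCG are commonly computed for a ranked keyword list. It is not the authors' evaluation code: the function names, the toy keywords, and the binary relevance labels are all assumptions made for illustration, and the abstract does not say which variant of either metric was used.

```python
import math


def average_precision(ranked, relevant):
    """Average precision of one ranked list.

    Assumed convention: sum precision@k over every rank k that holds a
    relevant item, then divide by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP: the mean of average precision over several (ranked, relevant) runs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def ndcg(ranked, gains):
    """Normalized discounted cumulative gain of one ranked list.

    `gains` maps each item to a non-negative relevance grade; items that
    are missing from the map count as 0. The ideal ranking is cut off at
    the length of `ranked` (an assumed convention).
    """
    dcg = sum(
        gains.get(item, 0.0) / math.log2(k + 1)
        for k, item in enumerate(ranked, start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[: len(ranked)]
    idcg = sum(g / math.log2(k + 1) for k, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical expanded keyword list for a security-forum seed set.
ranked_keywords = ["rat", "trojan", "pizza", "botnet"]
relevant_keywords = {"rat", "trojan", "botnet"}
binary_gains = {word: 1.0 for word in relevant_keywords}

ap_score = average_precision(ranked_keywords, relevant_keywords)
map_score = mean_average_precision([(ranked_keywords, relevant_keywords)])
ndcg_score = ndcg(ranked_keywords, binary_gains)
```

Both scores lie in [0, 1] and reach 1 only when every relevant keyword is ranked above every irrelevant one. In this toy run the single off-topic word at rank 3 lowers both scores below 1.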