Interactive optimization of embedding-based text similarity calculations

IF 1.8 4区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING
D. Witschard, Ilir Jusufi, R. M. Martins, K. Kucher, A. Kerren
{"title":"Interactive optimization of embedding-based text similarity calculations","authors":"D. Witschard, Ilir Jusufi, R. M. Martins, K. Kucher, A. Kerren","doi":"10.1177/14738716221114372","DOIUrl":null,"url":null,"abstract":"Comparing text documents is an essential task for a variety of applications within diverse research fields, and several different methods have been developed for this. However, calculating text similarity is an ambiguous and context-dependent task, so many open challenges still exist. In this paper, we present a novel method for text similarity calculations based on the combination of embedding technology and ensemble methods. By using several embeddings, instead of only one, we show that it is possible to achieve higher quality, which in turn is a key factor for developing high-performing applications for text similarity exploitation. We also provide a prototype visual analytics tool which helps the analyst to find optimal performing ensembles and gain insights to the inner workings of the similarity calculations. Furthermore, we discuss the generalizability of our key ideas to fields beyond the scope of text analysis.","PeriodicalId":50360,"journal":{"name":"Information Visualization","volume":"21 1","pages":"335 - 353"},"PeriodicalIF":1.8000,"publicationDate":"2022-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Visualization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1177/14738716221114372","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 4

Abstract

Comparing text documents is an essential task for a variety of applications within diverse research fields, and several different methods have been developed for this. However, calculating text similarity is an ambiguous and context-dependent task, so many open challenges still exist. In this paper, we present a novel method for text similarity calculations based on the combination of embedding technology and ensemble methods. By using several embeddings, instead of only one, we show that it is possible to achieve higher quality, which in turn is a key factor for developing high-performing applications for text similarity exploitation. We also provide a prototype visual analytics tool which helps the analyst to find optimal performing ensembles and gain insights to the inner workings of the similarity calculations. Furthermore, we discuss the generalizability of our key ideas to fields beyond the scope of text analysis.
基于嵌入的文本相似度计算的交互式优化
比较文本文档是各种研究领域中各种应用程序的基本任务,为此已经开发了几种不同的方法。然而,文本相似度的计算是一个模糊且依赖于上下文的任务,因此仍然存在许多未解决的问题。本文提出了一种基于嵌入技术和集成方法相结合的文本相似度计算方法。通过使用多个嵌入,而不是只有一个嵌入,我们表明有可能实现更高的质量,这反过来又是开发高性能文本相似性利用应用程序的关键因素。我们还提供了一个原型可视化分析工具,帮助分析人员找到最佳的性能组合,并深入了解相似度计算的内部工作原理。此外,我们还讨论了我们的关键思想在文本分析范围之外的领域的概括性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Information Visualization
Information Visualization COMPUTER SCIENCE, SOFTWARE ENGINEERING-
CiteScore
5.40
自引率
0.00%
发文量
16
审稿时长
>12 weeks
期刊介绍: Information Visualization is essential reading for researchers and practitioners of information visualization and is of interest to computer scientists and data analysts working on related specialisms. This journal is an international, peer-reviewed journal publishing articles on fundamental research and applications of information visualization. The journal acts as a dedicated forum for the theories, methodologies, techniques and evaluations of information visualization and its applications. The journal is a core vehicle for developing a generic research agenda for the field by identifying and developing the unique and significant aspects of information visualization. Emphasis is placed on interdisciplinary material and on the close connection between theory and practice. This journal is a member of the Committee on Publication Ethics (COPE).
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信