Interactive optimization of embedding-based text similarity calculations

IF 2.0 | Zone 4, Computer Science | JCR Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING

Information Visualization, vol. 21, pp. 335-353 | Pub Date: 2022-08-03 | DOI: 10.1177/14738716221114372

D. Witschard, Ilir Jusufi, R. M. Martins, K. Kucher, A. Kerren


Citations: 4

Abstract

Interactive optimization of embedding-based text similarity calculations

Comparing text documents is an essential task for a variety of applications within diverse research fields, and several different methods have been developed for this. However, calculating text similarity is an ambiguous and context-dependent task, so many open challenges still exist. In this paper, we present a novel method for text similarity calculations based on the combination of embedding technology and ensemble methods. By using several embeddings, instead of only one, we show that it is possible to achieve higher quality, which in turn is a key factor for developing high-performing applications for text similarity exploitation. We also provide a prototype visual analytics tool which helps the analyst to find optimal performing ensembles and gain insights to the inner workings of the similarity calculations. Furthermore, we discuss the generalizability of our key ideas to fields beyond the scope of text analysis.
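As a rough illustration of the idea of combining several embeddings, the sketch below scores a document pair by a weighted average of per-embedding cosine similarities. It is not the authors' method or tool: the function names, the weighting scheme, and the toy vectors are all assumptions made for this example, and in practice the vectors would come from real embedding models.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ensemble_similarity(
    doc_a: Sequence[Sequence[float]],
    doc_b: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted average of cosine similarities, one term per embedding.

    doc_a[i] and doc_b[i] are the two documents' vectors under the i-th
    (hypothetical) embedding model; weights[i] is that model's weight.
    """
    if not (len(doc_a) == len(doc_b) == len(weights)):
        raise ValueError("need one vector pair and one weight per embedding")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scores = (cosine_similarity(a, b) for a, b in zip(doc_a, doc_b))
    return sum(w * s for w, s in zip(weights, scores)) / total


# Toy vectors: the first embedding sees the documents as unrelated,
# the second sees them as identical.
doc_a = [[1.0, 0.0], [1.0, 1.0, 0.0]]
doc_b = [[0.0, 1.0], [1.0, 1.0, 0.0]]
equal_weighting = ensemble_similarity(doc_a, doc_b, [1.0, 1.0])
favor_first = ensemble_similarity(doc_a, doc_b, [3.0, 1.0])
```

Changing the weights shifts the combined score toward whichever embedding is favored, which is the kind of trade-off an analyst might want to explore interactively.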

Source journal

Information Visualization (COMPUTER SCIENCE, SOFTWARE ENGINEERING)

CiteScore: 5.40
Self-citation rate: 0.00%
Articles published: not listed
Review time: >12 weeks

About the journal: Information Visualization is essential reading for researchers and practitioners of information visualization and is of interest to computer scientists and data analysts working on related specialisms. This journal is an international, peer-reviewed journal publishing articles on fundamental research and applications of information visualization. The journal acts as a dedicated forum for the theories, methodologies, techniques and evaluations of information visualization and its applications. The journal is a core vehicle for developing a generic research agenda for the field by identifying and developing the unique and significant aspects of information visualization. Emphasis is placed on interdisciplinary material and on the close connection between theory and practice. This journal is a member of the Committee on Publication Ethics (COPE).