New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation

IF 3.8 · CAS Tier 3 (Computer Science) · JCR Q2 (Computer Science, Information Systems)
Robert Gove, Lucas Cadalzo, Nicholas Leiby, Jedediah M. Singer, Alexander Zaitzeff
{"title":"New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation","authors":"Robert Gove,&nbsp;Lucas Cadalzo,&nbsp;Nicholas Leiby,&nbsp;Jedediah M. Singer,&nbsp;Alexander Zaitzeff","doi":"10.1016/j.visinf.2022.04.003","DOIUrl":null,"url":null,"abstract":"<div><p>We present new guidelines for choosing hyperparameters for t-SNE and an evaluation comparing these guidelines to current ones. These guidelines include a proposed empirically optimum guideline derived from a t-SNE hyperparameter grid search over a large collection of data sets. We also introduce a new method to featurize data sets using graph-based metrics called scagnostics; we use these features to train a neural network that predicts optimal t-SNE hyperparameters for the respective data set. This neural network has the potential to simplify the use of t-SNE by removing guesswork about which hyperparameters will produce the best embedding. We evaluate and compare our neural network-derived and empirically optimum hyperparameters to several other t-SNE hyperparameter guidelines from the literature on 68 data sets. The hyperparameters predicted by our neural network yield embeddings with similar accuracy as the best current t-SNE guidelines. Using our empirically optimum hyperparameters is simpler than following previously published guidelines but yields more accurate embeddings, in some cases by a statistically significant margin. We find that the useful ranges for t-SNE hyperparameters are narrower and include smaller values than previously reported in the literature. 
Importantly, we also quantify the potential for future improvements in this area: using data from a grid search of t-SNE hyperparameters we find that an optimal selection method could improve embedding accuracy by up to two percentage points over the methods examined in this paper.</p></div>","PeriodicalId":36903,"journal":{"name":"Visual Informatics","volume":"6 2","pages":"Pages 87-97"},"PeriodicalIF":3.8000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2468502X22000201/pdfft?md5=d092541f65d22cc8dfb4e8ef46a1293b&pid=1-s2.0-S2468502X22000201-main.pdf","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Visual Informatics","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2468502X22000201","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 13

Abstract

We present new guidelines for choosing hyperparameters for t-SNE and an evaluation comparing these guidelines to current ones. These guidelines include a proposed empirically optimum guideline derived from a t-SNE hyperparameter grid search over a large collection of data sets. We also introduce a new method to featurize data sets using graph-based metrics called scagnostics; we use these features to train a neural network that predicts optimal t-SNE hyperparameters for the respective data set. This neural network has the potential to simplify the use of t-SNE by removing guesswork about which hyperparameters will produce the best embedding. We evaluate and compare our neural network-derived and empirically optimum hyperparameters to several other t-SNE hyperparameter guidelines from the literature on 68 data sets. The hyperparameters predicted by our neural network yield embeddings whose accuracy is similar to that of the best current t-SNE guidelines. Using our empirically optimum hyperparameters is simpler than following previously published guidelines but yields more accurate embeddings, in some cases by a statistically significant margin. We find that the useful ranges for t-SNE hyperparameters are narrower and include smaller values than previously reported in the literature. Importantly, we also quantify the potential for future improvements in this area: using data from a grid search of t-SNE hyperparameters, we find that an optimal selection method could improve embedding accuracy by up to two percentage points over the methods examined in this paper.
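The grid-search-based selection the abstract describes can be sketched as follows. This is an illustrative sketch using scikit-learn, not the authors' actual pipeline: the hyperparameter grid values and the use of k-NN classification accuracy as the "embedding accuracy" score are assumptions.

```python
# Sketch: choose t-SNE hyperparameters by grid search, scoring each
# embedding by how well a 1-NN classifier separates known classes in it.
# (Hypothetical grid and metric; not the paper's exact setup.)
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Small synthetic labeled data set standing in for a real one.
X, y = make_blobs(n_samples=150, centers=3, n_features=10, random_state=0)

best = None
for perplexity in (5, 30):            # assumed grid
    for learning_rate in (100, 200):  # assumed grid
        emb = TSNE(n_components=2, perplexity=perplexity,
                   learning_rate=learning_rate, init="pca",
                   random_state=0).fit_transform(X)
        # "Embedding accuracy": cross-validated 1-NN accuracy in 2-D.
        score = cross_val_score(KNeighborsClassifier(1), emb, y, cv=3).mean()
        if best is None or score > best[0]:
            best = (score, perplexity, learning_rate)

print(f"best score={best[0]:.2f} at perplexity={best[1]}, "
      f"learning_rate={best[2]}")
```

An exhaustive search like this is what the paper's "empirically optimum" guideline is distilled from; in practice one would run it over many data sets and keep the hyperparameter region that wins most often.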

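The abstract's second idea — featurize a data set, then let a learned model predict good t-SNE hyperparameters — can be sketched as below. The featurizer here uses simple summary statistics standing in for the paper's scagnostics, and the training pairs are synthetic placeholders (the `0.1 * n` "optimal perplexity" rule is purely illustrative, not a result from the paper).

```python
# Sketch: train a small neural network to map data-set features to a
# predicted perplexity. Stand-in features and synthetic labels only.
import numpy as np
from sklearn.neural_network import MLPRegressor

def featurize(X):
    """Stand-in featurizer (NOT scagnostics): size, dimension, spread."""
    return np.array([np.log(len(X)), X.shape[1], X.std(), abs(X.mean())])

rng = np.random.default_rng(0)

# Hypothetical training corpus of (features, best-known perplexity) pairs,
# as would come out of a grid search over many real data sets.
feats, targets = [], []
for _ in range(200):
    n, d = int(rng.integers(50, 500)), int(rng.integers(2, 20))
    X = rng.normal(size=(n, d))
    feats.append(featurize(X))
    targets.append(0.1 * n)  # placeholder "optimal" perplexity rule

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                     random_state=0).fit(np.array(feats), targets)

# Predict a perplexity for a new, unseen data set.
pred = model.predict([featurize(rng.normal(size=(300, 5)))])[0]
print(f"predicted perplexity: {pred:.1f}")
```

The appeal of this design is that prediction is a single cheap forward pass, replacing a full grid search once the regressor has been trained.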
Source journal: Visual Informatics (Computer Science: Computer Graphics and Computer-Aided Design)
CiteScore: 6.70 · Self-citation rate: 3.30% · Articles per year: 33 · Review time: 79 days