New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation

IF 3.8 · CAS Tier 3 (Computer Science) · JCR Q2 (Computer Science, Information Systems)
Robert Gove, Lucas Cadalzo, Nicholas Leiby, Jedediah M. Singer, Alexander Zaitzeff
{"title":"New guidance for using t-SNE: Alternative defaults, hyperparameter selection automation, and comparative evaluation","authors":"Robert Gove,&nbsp;Lucas Cadalzo,&nbsp;Nicholas Leiby,&nbsp;Jedediah M. Singer,&nbsp;Alexander Zaitzeff","doi":"10.1016/j.visinf.2022.04.003","DOIUrl":null,"url":null,"abstract":"<div><p>We present new guidelines for choosing hyperparameters for t-SNE and an evaluation comparing these guidelines to current ones. These guidelines include a proposed empirically optimum guideline derived from a t-SNE hyperparameter grid search over a large collection of data sets. We also introduce a new method to featurize data sets using graph-based metrics called scagnostics; we use these features to train a neural network that predicts optimal t-SNE hyperparameters for the respective data set. This neural network has the potential to simplify the use of t-SNE by removing guesswork about which hyperparameters will produce the best embedding. We evaluate and compare our neural network-derived and empirically optimum hyperparameters to several other t-SNE hyperparameter guidelines from the literature on 68 data sets. The hyperparameters predicted by our neural network yield embeddings with similar accuracy as the best current t-SNE guidelines. Using our empirically optimum hyperparameters is simpler than following previously published guidelines but yields more accurate embeddings, in some cases by a statistically significant margin. We find that the useful ranges for t-SNE hyperparameters are narrower and include smaller values than previously reported in the literature. 
Importantly, we also quantify the potential for future improvements in this area: using data from a grid search of t-SNE hyperparameters we find that an optimal selection method could improve embedding accuracy by up to two percentage points over the methods examined in this paper.</p></div>","PeriodicalId":36903,"journal":{"name":"Visual Informatics","volume":"6 2","pages":"Pages 87-97"},"PeriodicalIF":3.8000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2468502X22000201/pdfft?md5=d092541f65d22cc8dfb4e8ef46a1293b&pid=1-s2.0-S2468502X22000201-main.pdf","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Visual Informatics","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2468502X22000201","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 13

Abstract

We present new guidelines for choosing hyperparameters for t-SNE and an evaluation comparing these guidelines to current ones. These guidelines include a proposed empirically optimum guideline derived from a t-SNE hyperparameter grid search over a large collection of data sets. We also introduce a new method to featurize data sets using graph-based metrics called scagnostics; we use these features to train a neural network that predicts optimal t-SNE hyperparameters for the respective data set. This neural network has the potential to simplify the use of t-SNE by removing guesswork about which hyperparameters will produce the best embedding. We evaluate and compare our neural network-derived and empirically optimum hyperparameters to several other t-SNE hyperparameter guidelines from the literature on 68 data sets. The hyperparameters predicted by our neural network yield embeddings whose accuracy is similar to that of the best current t-SNE guidelines. Using our empirically optimum hyperparameters is simpler than following previously published guidelines but yields more accurate embeddings, in some cases by a statistically significant margin. We find that the useful ranges for t-SNE hyperparameters are narrower and include smaller values than previously reported in the literature. Importantly, we also quantify the potential for future improvements in this area: using data from a grid search of t-SNE hyperparameters, we find that an optimal selection method could improve embedding accuracy by up to two percentage points over the methods examined in this paper.
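The grid-search-based selection the abstract describes can be sketched as follows. This is an illustrative sketch using scikit-learn, not the authors' actual pipeline: the hyperparameter grid values and the use of k-NN classification accuracy as the "embedding accuracy" score are assumptions.

```python
# Sketch: choose t-SNE hyperparameters by grid search, scoring each
# embedding by how well a 1-NN classifier separates known classes in it.
# (Hypothetical grid and metric; not the paper's exact setup.)
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Small synthetic labeled data set standing in for a real one.
X, y = make_blobs(n_samples=150, centers=3, n_features=10, random_state=0)

best = None
for perplexity in (5, 30):            # assumed grid
    for learning_rate in (100, 200):  # assumed grid
        emb = TSNE(n_components=2, perplexity=perplexity,
                   learning_rate=learning_rate, init="pca",
                   random_state=0).fit_transform(X)
        # "Embedding accuracy": cross-validated 1-NN accuracy in 2-D.
        score = cross_val_score(KNeighborsClassifier(1), emb, y, cv=3).mean()
        if best is None or score > best[0]:
            best = (score, perplexity, learning_rate)

print(f"best score={best[0]:.2f} at perplexity={best[1]}, "
      f"learning_rate={best[2]}")
```

An exhaustive search like this is what the paper's "empirically optimum" guideline is distilled from; in practice one would run it over many data sets and keep the hyperparameter region that wins most often.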

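The abstract's second idea — featurize a data set, then let a learned model predict good t-SNE hyperparameters — can be sketched as below. The featurizer here uses simple summary statistics standing in for the paper's scagnostics, and the training pairs are synthetic placeholders (the `0.1 * n` "optimal perplexity" rule is purely illustrative, not a result from the paper).

```python
# Sketch: train a small neural network to map data-set features to a
# predicted perplexity. Stand-in features and synthetic labels only.
import numpy as np
from sklearn.neural_network import MLPRegressor

def featurize(X):
    """Stand-in featurizer (NOT scagnostics): size, dimension, spread."""
    return np.array([np.log(len(X)), X.shape[1], X.std(), abs(X.mean())])

rng = np.random.default_rng(0)

# Hypothetical training corpus of (features, best-known perplexity) pairs,
# as would come out of a grid search over many real data sets.
feats, targets = [], []
for _ in range(200):
    n, d = int(rng.integers(50, 500)), int(rng.integers(2, 20))
    X = rng.normal(size=(n, d))
    feats.append(featurize(X))
    targets.append(0.1 * n)  # placeholder "optimal" perplexity rule

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                     random_state=0).fit(np.array(feats), targets)

# Predict a perplexity for a new, unseen data set.
pred = model.predict([featurize(rng.normal(size=(300, 5)))])[0]
print(f"predicted perplexity: {pred:.1f}")
```

The appeal of this design is that prediction is a single cheap forward pass, replacing a full grid search once the regressor has been trained.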
Source journal: Visual Informatics (Computer Science: Computer Graphics and Computer-Aided Design)
CiteScore: 6.70 · Self-citation rate: 3.30% · Articles per year: 33 · Review time: 79 days