COMPARATIVE ANALYSIS OF THE ACCURACY OF DETERMINING THE BRILLOUIN FREQUENCY SHIFT IN EXTREMELY NOISY SPECTRA BY VARIOUS CORRELATION METHODS

A. I. Krivosheev, Yu. A. Konstantinov, F. L. Barkov, V. P. Pervadchuk
Journal: Приборы и техника эксперимента (Instruments and Experimental Techniques)
DOI: https://doi.org/10.31857/s0032816221050062

Abstract

Visualization of multidimensional data is among the most important stages of data analysis. Decisions about later stages of a study are often made from a flat view of the data, based on "rough proportions". Representing multidimensional vectors on a plane while preserving distances is highly legible and persuasive, and it is used successfully with distributional semantics models (Word2Vec, GloVe, NaVec). On the other hand, an inaccurate two-dimensional projection can lead to time wasted searching for multidimensional structures that do not exist. The author set out to evaluate the accuracy of dimensionality reduction methods under two constraints: the high dimensionality arises from vector representations of text documents, and the reduction is aimed at visualization on a plane. Among the many dimensionality reduction methods, there is no separate class of approaches designed specifically for visualization. To measure accuracy, the author used labeled data and quantified how well the labeling is preserved after the dimension is reduced. Twelve dimensionality reduction methods were examined on two labeled data sets in Russian and English. By the Silhouette Coefficient, the most accurate visualization method for text data was UMAP with the Hellinger distance as its metric.
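The evaluation idea in the abstract (score a 2D projection by how well it keeps labeled classes apart) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy 3D points, the two hand-made "projections" that simply drop one coordinate, and the use of Euclidean distance stand in for the author's text embeddings, twelve reduction methods, and Hellinger metric, none of which are given in the abstract.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_coefficient(points, labels, dist=euclidean):
    """Mean silhouette over all points; labels must contain at least two classes."""
    scores = []
    n = len(points)
    for i in range(n):
        # Group distances from point i to every other point by that point's label.
        by_label = {}
        for j in range(n):
            if i != j:
                by_label.setdefault(labels[j], []).append(dist(points[i], points[j]))
        own = by_label.get(labels[i])
        if not own:
            scores.append(0.0)  # usual convention: a singleton class scores 0
            continue
        a = sum(own) / len(own)  # mean distance to the point's own class
        b = min(  # mean distance to the nearest other class
            sum(d) / len(d) for lab, d in by_label.items() if lab != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labeled data: classes differ along x and overlap along y and z.
points3d = [
    (0.0, 0.1, 0.0), (0.2, 0.0, 1.0), (0.1, 0.2, 2.0),
    (5.0, 0.1, 0.1), (5.2, 0.0, 1.1), (5.1, 0.2, 1.9),
]
labels = ["a", "a", "a", "b", "b", "b"]

# Two toy "dimensionality reductions" to the plane (stand-ins, not real methods).
proj_good = [(x, y) for x, y, z in points3d]  # keeps the separating axis
proj_bad = [(y, z) for x, y, z in points3d]   # discards the separating axis

score_good = silhouette_coefficient(proj_good, labels)
score_bad = silhouette_coefficient(proj_bad, labels)
print(f"projection keeping x:  {score_good:.3f}")
print(f"projection dropping x: {score_bad:.3f}")
```

A projection that preserves the class structure scores close to 1, while one that mixes the classes scores near or below 0, which is the sense in which the labeling is "preserved" or lost after reduction.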