Comparing Word Embeddings through Visualisation

Pedro Santos, Nuno Datia, Matilde Pato, J. Sobral
Published in: 2022 26th International Conference Information Visualisation (IV), 2022-07-01
DOI: 10.1109/IV56949.2022.00024 (https://doi.org/10.1109/IV56949.2022.00024)
Citations: 1

Abstract

Asset management is a branch of facilities management that is responsible for the operation and maintenance of assets. The most common means of managing assets and their life-cycle is through requests and work orders. A request reports an occurrence detected by a sensory device, a technician, or non-technical personnel; it points out that something is wrong with a given asset and needs appropriate attention. Depending on the problem, a request can give rise to a work order if the solution is not trivial. Work orders are technical reports that specify the asset needing intervention and give the details of the work to be done or, when the work is unknown from the start, the characteristics of the malfunction. Work orders contain free text that is not restricted to a fixed vocabulary, which makes them difficult to analyse automatically. In this paper, we discuss the application of modern Natural Language Processing techniques to process the work order's description, and we compare two Word Embedding models, Word2Vec and FastText, through semantic similarity tests between the encoded words and through a visualisation of the vector space obtained by dimensionality reduction of the encoded vectors. The results show that the FastText approach performs better with respect to the semantics of the results.
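The semantic similarity test mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the toy vectors, the example words, and the helper names `cosine_similarity`, `most_similar`, and `char_ngrams` are all assumptions made for illustration. The last helper shows the character n-gram decomposition that FastText uses to build word vectors from subword units; the abstract does not say whether this explains the reported result.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_n=3):
    """Rank every other word by cosine similarity to `word`."""
    query = embeddings[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers,
    in the style FastText uses for its subword units."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]


# Hypothetical 3-dimensional vectors; a trained model would supply real ones.
toy_embeddings = {
    "leak": [0.9, 0.1, 0.0],
    "leaking": [0.8, 0.2, 0.1],
    "pump": [0.1, 0.9, 0.2],
    "valve": [0.2, 0.8, 0.3],
}

if __name__ == "__main__":
    for neighbour, score in most_similar("leak", toy_embeddings):
        print(f"{neighbour}: {score:.3f}")
    shared = set(char_ngrams("leak")) & set(char_ngrams("leaking"))
    print(sorted(shared))
```

Running the same nearest-neighbour query against the vectors of each trained model, and comparing the returned neighbours, is one simple way to carry out such a comparison by hand.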