Anomaly Detection in Lexical Definitions via One-Class Classification Techniques

Sawittree Jumpathong, Kanyanut Kriengket, P. Boonkwan, T. Supnithi
Published in: 2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), 2021-12-21
DOI: 10.1109/iSAI-NLP54397.2021.9678166 (https://doi.org/10.1109/iSAI-NLP54397.2021.9678166)

Abstract

Building vocabularies and their definitions takes a long time, because every definition must be approved by experts in vocabulary-building meetings and the definitions themselves are unstructured. To save time, we applied three one-class classification techniques in our experiments (one-class SVMs, isolation forests, and local outlier factors) and measured, by accuracy, how well each method can suggest the status of a word definition. The local outlier factor obtained the highest accuracy when it used vectors produced by USE. It recognizes the boundary of the approved class better; the approved definitions form several clusters, and outliers are scattered among them. We also found that the detected status of a definition sometimes matches the reference status and sometimes contradicts it. As for the patterns of definition writing, approved definitions are always written in logical order: they start with broad or general information, followed by specific details, examples, and references to English terms or examples. Rejected definitions are not always written in logical order, and their patterns vary: Thai translation only, Thai translation with related entries, parts of speech (POS), Thai translation, related entries, and English term references followed by definitions, etc.
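To illustrate why a local outlier factor can cope with several approved clusters, here is a minimal, standard-library-only sketch of the LOF score in novelty mode. It is not the authors' implementation: the 2-D points are hypothetical stand-ins for definition vectors, and the neighbour count `k = 3` and the decision threshold `1.5` are assumptions chosen only for this toy example.

```python
import math

def lof_scores(train, queries, k=3):
    """Local outlier factor of each query point, measured against `train`."""
    def knn(p, exclude=None):
        # k nearest training points to p as (distance, index), nearest first
        ds = sorted((math.dist(p, q), j) for j, q in enumerate(train) if j != exclude)
        return ds[:k]

    # Neighbours and k-distance of every training point (itself excluded)
    nbrs = [knn(p, exclude=i) for i, p in enumerate(train)]
    kdist = [n[-1][0] for n in nbrs]

    def lrd(neigh):
        # Local reachability density: inverse of the mean reachability distance,
        # where reach(p, j) = max(k-distance(j), d(p, j))
        reach = [max(kdist[j], d) for d, j in neigh]
        return len(reach) / sum(reach)

    train_lrd = [lrd(n) for n in nbrs]

    scores = []
    for q in queries:
        neigh = knn(q)
        # LOF = mean density of the neighbours divided by the query's own density
        scores.append(sum(train_lrd[j] for _, j in neigh) / (len(neigh) * lrd(neigh)))
    return scores

# Hypothetical "approved" vectors forming two separate clusters
approved = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]

# Two queries inside a cluster, one lying between the clusters
queries = [(0.5, 0.5), (5, 5), (10.5, 10.5)]

THRESHOLD = 1.5  # assumed cut-off; scores near 1 mean "as dense as the neighbours"
scores = lof_scores(approved, queries, k=3)
statuses = ["approved" if s <= THRESHOLD else "rejected" for s in scores]

for q, s, status in zip(queries, scores, statuses):
    print(q, round(s, 2), status)
```

Because the score compares a point's density only with that of its own neighbours, a point inside either cluster scores close to 1, while the point between the clusters scores well above the threshold. In practice, libraries such as scikit-learn provide `LocalOutlierFactor`, `OneClassSVM`, and `IsolationForest`, which would normally be preferred over a hand-written version.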