Using linguistic patterns in FCA-based approach for automatic acquisition of taxonomies from Malay text

Mohd Zakree Ahmad Nazri, S. Shamsudin, A. Abu Bakar, Tarmizi Abd Ghani
DOI: 10.1109/ITSIM.2008.4631709
Published in: 2008 International Symposium on Information Technology
Publication date: 2008-09-26
URL: https://doi.org/10.1109/ITSIM.2008.4631709
Citations: 4

Abstract

Using linguistic patterns in FCA-based approach for automatic acquisition of taxonomies from Malay text
Previous work has shown that Formal Concept Analysis (FCA) can be used to automatically acquire taxonomies from Indo-European text. The taxonomies are built via FCA using syntactic dependencies as attributes, such as verb/head-object, verb/head-subject and verb/prepositional phrase-complement. This paper discusses the overall process of learning a taxonomy using FCA with the same syntactic dependencies as those used for English, applied to Malay texts. Malay, an Austronesian language, follows the same Subject-Verb-Object sentence structure as English but is syntactically different. The results show lower recall and precision compared to related work in other languages. The poor results are caused by several factors, such as the choice of smoothing technique. The experimental results indicate that the current smoothing technique with FCA does not produce good results. Therefore, in addition to the syntactic dependencies, we used linguistic patterns such as Hearst’s patterns to find similarities between terms. We compare the results of our technique against the cosine measure used in the FCA-based taxonomy learning approach. The proposed technique attains both higher precision and higher recall than the previous technique.
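The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy formal context, the attribute names (such as `tempah_obj`), the Malay example sentence, and the single "seperti" ("such as") pattern are all assumptions invented for illustration; the paper's actual corpus, dependency extraction, smoothing, and pattern set are not given on this page. The sketch enumerates formal concepts by brute force, computes cosine similarity over binary attribute sets, and extracts hyponym/hypernym pairs with one Hearst-style pattern.

```python
import re
from itertools import combinations
from math import sqrt

# Toy formal context, invented for illustration (not data from the paper):
# each Malay term maps to the verb/object attributes it is assumed to occur with.
CONTEXT = {
    "hotel":    {"tempah_obj"},
    "apartmen": {"tempah_obj", "sewa_obj"},
    "kereta":   {"tempah_obj", "sewa_obj", "pandu_obj"},
    "basikal":  {"tempah_obj", "sewa_obj", "pandu_obj", "tunggang_obj"},
}
ALL_ATTRS = frozenset().union(*CONTEXT.values())


def extent(attrs):
    """Terms that carry every attribute in attrs."""
    wanted = set(attrs)
    return frozenset(t for t, a in CONTEXT.items() if wanted <= a)


def intent(terms):
    """Attributes shared by every term in terms."""
    shared = set(ALL_ATTRS)
    for t in terms:
        shared &= CONTEXT[t]
    return frozenset(shared)


def formal_concepts():
    """Brute-force enumeration of (extent, intent) pairs; fine for toy contexts only."""
    concepts = set()
    for r in range(len(ALL_ATTRS) + 1):
        for attrs in combinations(sorted(ALL_ATTRS), r):
            ext = extent(attrs)
            concepts.add((ext, intent(ext)))
    return concepts


def cosine(t1, t2):
    """Cosine similarity of two terms represented as binary attribute vectors."""
    a, b = CONTEXT[t1], CONTEXT[t2]
    return len(a & b) / sqrt(len(a) * len(b))


# Hypothetical Hearst-style pattern for Malay: "<hypernym> seperti <x>, <y> dan <z>".
HEARST_SEPERTI = re.compile(r"(\w+) seperti (\w+(?:(?:, | dan )\w+)*)")


def hearst_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the assumed 'seperti' pattern."""
    pairs = []
    for m in HEARST_SEPERTI.finditer(sentence):
        hypernym = m.group(1)
        for hyponym in re.split(r", | dan ", m.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    for ext, itt in sorted(formal_concepts(), key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(itt))
    print(cosine("hotel", "apartmen"))
    print(hearst_pairs("kenderaan seperti kereta, basikal dan bas"))
```

In this toy context the concepts form a simple chain ordered by extent size, which is the kind of hierarchy an FCA-based approach reads a taxonomy from. A real pipeline would need a Malay parser to produce the attributes and a far more careful pattern matcher than a single regular expression.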