Adaptation, Comparison, and Improvement of Metaheuristic Algorithms to the Part-of-Speech Tagging Problem

IF 0.9 Q3 ENGINEERING, MULTIDISCIPLINARY
Miguel Alexis Solano-Jiménez, Jose Julio Tobar-Cifuentes, Luz Marina Sierra-Martínez, C. Cobos-Lozada

Abstract

Part-of-Speech Tagging (POST) is a complex preprocessing task in Natural Language Processing applications. Tagging has been approached with both statistical and rule-based methods. More recently, metaheuristic algorithms have gained attention and have been applied with good results in a wide variety of knowledge areas. In this research, they were therefore applied to the POST problem: assigning the best sequence of tags (roles) to the words of a sentence based on statistical information. The process was carried out in two cycles, each comprising four phases, which allowed the following metaheuristic algorithms to be adapted to the tagging problem: Particle Swarm Optimization, Jaya, Random-Restart Hill Climbing, and a memetic algorithm that uses Global-Best Harmony Search as the global optimizer and Hill Climbing as the local optimizer. While consolidating each algorithm, preliminary experiments were carried out (using cross-validation) to tune its parameters and then evaluate it on the datasets of the complete tagged corpora: IULA (Spanish), Brown (English), and Nasa Yuwe (Nasa). The results obtained by the proposed taggers were compared, and the Friedman and Wilcoxon statistical tests were applied, confirming that the proposed memetic tagger, GBHS Tagger, obtained better precision. The proposed taggers make an important contribution to POST for traditional languages (English and Spanish), non-traditional languages (Nasa Yuwe), and their application areas.

1 Universidad del Cauca (Popayán-Cauca, Colombia). miguelsolano@unicauca.edu.co. ORCID: 0000-0003-1936-3488
2 Universidad del Cauca (Popayán-Cauca, Colombia). josej@unicauca.edu.co. ORCID: 0000-0002-5436-0816
3 Ph. D. Universidad del Cauca (Popayán-Cauca, Colombia). lsierra@unicauca.edu.co. ORCID: 0000-0003-3847-3324
4 Ph. D. Universidad del Cauca (Popayán-Cauca, Colombia). ccobos@unicauca.edu.co. ORCID: 0000-0002-6263-1911

Revista Facultad de Ingeniería (Rev. Fac. Ing.) Vol. 29 (54), e11762. 2020. Tunja-Boyacá, Colombia. L-ISSN: 0121-1129, e-ISSN: 2357-5328, DOI: https://doi.org/10.19053/01211129.v29.n54.2020.11762
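To make the tagging formulation in the abstract more concrete, the following is a minimal sketch of Random-Restart Hill Climbing applied to tag-sequence search. It is not the authors' implementation: the abstract does not state the objective function, so the add-one-smoothed log-likelihood used here is an assumption, and the toy corpus and the names `fitness`, `hill_climb`, and `rrhc_tag` are hypothetical.

```python
import math
import random
from collections import Counter

# Toy tagged corpus (hypothetical data, for illustration only).
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
TAGS = sorted({tag for sent in CORPUS for _, tag in sent})
VOCAB = len({word for sent in CORPUS for word, _ in sent})

# Statistical information: word/tag counts and tag-to-tag transition counts.
emission = Counter((w, t) for sent in CORPUS for w, t in sent)
tag_count = Counter(t for sent in CORPUS for _, t in sent)
transition = Counter()
for sent in CORPUS:
    seq = ["<s>"] + [t for _, t in sent]
    transition.update(zip(seq, seq[1:]))
prev_count = Counter()
for (prev, _), n in transition.items():
    prev_count[prev] += n


def fitness(words, tags):
    """Assumed objective: add-one-smoothed log-likelihood of a tag sequence."""
    score, prev = 0.0, "<s>"
    for w, t in zip(words, tags):
        score += math.log((emission[(w, t)] + 1) / (tag_count[t] + VOCAB + 1))
        score += math.log((transition[(prev, t)] + 1) / (prev_count[prev] + len(TAGS)))
        prev = t
    return score


def hill_climb(words, tags, max_steps=100):
    """Local search: change one tag at a time while the fitness improves."""
    best = fitness(words, tags)
    for _ in range(max_steps):
        improved = False
        for i in range(len(words)):
            for t in TAGS:
                if t == tags[i]:
                    continue
                candidate = tags[:i] + [t] + tags[i + 1:]
                score = fitness(words, candidate)
                if score > best:
                    tags, best, improved = candidate, score, True
        if not improved:
            break  # local optimum reached
    return tags, best


def rrhc_tag(words, restarts=10, seed=0):
    """Random-restart wrapper: keep the best local optimum over several starts."""
    rng = random.Random(seed)
    best_tags, best_score = None, -math.inf
    for _ in range(restarts):
        start = [rng.choice(TAGS) for _ in words]
        tags, score = hill_climb(words, start)
        if score > best_score:
            best_tags, best_score = tags, score
    return best_tags, best_score


if __name__ == "__main__":
    print(rrhc_tag(["the", "dog", "sleeps"])[0])  # ['DET', 'NOUN', 'VERB']
```

In this sketch a candidate solution is simply a list of tags, one per word, and a neighbor differs in a single position. The other algorithms named in the abstract could in principle reuse the same solution encoding and objective while replacing only the search strategy, although how the authors actually encode solutions is not stated here.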