Efficient Adaptive Convolutional Model Based on Label Embedding for Text Classification Using Low Resource Languages

V. K. Agbesi, Chen Wenyu, Abush S. Ameneshewa, E. Odame, Koffi Dumor, Judith Ayekai Browne
{"title":"Efficient Adaptive Convolutional Model Based on Label Embedding for Text Classification Using Low Resource Languages","authors":"V. K. Agbesi, Chen Wenyu, Abush S. Ameneshewa, E. Odame, Koffi Dumor, Judith Ayekai Browne","doi":"10.1145/3596947.3596962","DOIUrl":null,"url":null,"abstract":"Text classification technology has been efficiently deployed in numerous organizational applications, including subject tagging, intent, event detection, spam filtering, and email routing. This also helps organizations streamline processes, enhance data-driven operations, and evaluate and analyze textual resources quickly and economically. This progress results from numerous studies on high-resource language-based text classification tasks. However, research in low-resource languages, including Ewe, Arabic, Filipino, and Kazakh, lags behind other high-resource languages like English. Also, the most difficult aspect of text classification using low-resource languages is identifying the optimal set of filters for its feature extraction. This is due to their complex morphology, linguistic diversity, multilingualism, and syntax. Studies that have explored these problems failed to efficiently use label information to better the performance of their methods. As a result, the label information for these languages needs to be adequately utilized to enhance classification results. To solve this problem, this study proposes an efficient adaptive convolutional model based on label embedding (EAdaCLE) to efficiently represent label information and utilize the learned label representations for various text classification tasks. EAdaCLE has adaptively engineered convolutional filters trained on inputs based on label embeddings generated in the same network as the text vectors. EAdaCLE ensures the adaptability of adaptive convolution and completely obtains label data as a supporting function to enhance the classification results. Extensive experiments indicate that our technique is more reliable than other methods on four low-resource public datasets.","PeriodicalId":183071,"journal":{"name":"Proceedings of the 2023 7th International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 7th International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3596947.3596962","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Text classification technology has been deployed efficiently in numerous organizational applications, including subject tagging, intent detection, event detection, spam filtering, and email routing. It helps organizations streamline processes, enhance data-driven operations, and evaluate and analyze textual resources quickly and economically. This progress results from numerous studies on text classification for high-resource languages. However, research in low-resource languages, including Ewe, Arabic, Filipino, and Kazakh, lags behind that in high-resource languages such as English. Moreover, the most difficult aspect of text classification in low-resource languages is identifying the optimal set of filters for feature extraction, owing to their complex morphology, linguistic diversity, multilingualism, and syntax. Studies that have explored these problems failed to use label information efficiently to improve the performance of their methods. As a result, the label information available for these languages needs to be utilized adequately to enhance classification results. To solve this problem, this study proposes an efficient adaptive convolutional model based on label embedding (EAdaCLE) that represents label information efficiently and uses the learned label representations for various text classification tasks. EAdaCLE adaptively generates convolutional filters conditioned on label embeddings learned in the same network as the text vectors. EAdaCLE preserves the adaptability of adaptive convolution and fully exploits label information as a supporting signal to enhance classification results. Extensive experiments on four public low-resource datasets indicate that our technique is more reliable than competing methods.
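The abstract describes convolutional filters that are generated from label embeddings learned in the same network as the text vectors. The sketch below is a minimal, hypothetical PyTorch illustration of that idea only; the class name `LabelAdaptiveConv`, the filter-generation layer, all dimensions, and the max-over-time pooling are assumptions for illustration, not the authors' published EAdaCLE architecture.

```python
# Hypothetical sketch of label-embedding-driven adaptive convolution.
# Names and dimensions are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAdaptiveConv(nn.Module):
    """Text classifier whose convolutional filters are produced from label
    embeddings that are trained jointly with the word embeddings."""
    def __init__(self, vocab_size, num_labels, embed_dim=128, kernel_size=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Label embeddings live in the same space as the text vectors.
        self.label_emb = nn.Embedding(num_labels, embed_dim)
        # Maps each label embedding to one convolutional filter, so the
        # filters adapt as the label representations are learned.
        self.filter_gen = nn.Linear(embed_dim, kernel_size * embed_dim)
        self.kernel_size = kernel_size
        self.embed_dim = embed_dim

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        x = self.word_emb(token_ids).transpose(1, 2)       # (B, D, T)
        # One filter per label, generated from its embedding.
        filters = self.filter_gen(self.label_emb.weight)   # (L, K*D)
        filters = filters.view(-1, self.embed_dim, self.kernel_size)  # (L, D, K)
        # Each output channel scores the text against one label's filter.
        feats = F.conv1d(x, filters, padding=self.kernel_size // 2)   # (B, L, T)
        return feats.max(dim=2).values                      # max-pool over time -> (B, L)

# Minimal usage with toy dimensions.
model = LabelAdaptiveConv(vocab_size=5000, num_labels=4)
tokens = torch.randint(1, 5000, (2, 20))   # batch of 2 sentences, 20 tokens each
print(model(tokens).shape)                  # torch.Size([2, 4])
```

In this sketch the logits are produced directly by label-conditioned filters rather than by a separate classification head, which is one plausible way to let label information act as a supporting signal during feature extraction.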