How to Segment Turkish Words for Neural Text Classification?

Abdullah Al Nahas, Aysenur Kulunk, Burak Gözütok, S. Kalkan, Hakki Yagiz Erdinc
Published in: 2020 International Conference on INnovations in Intelligent SysTems and Applications (INISTA)
Publication date: 2020-08-01
DOI: 10.1109/INISTA49547.2020.9194661
URL: https://doi.org/10.1109/INISTA49547.2020.9194661
Citations: 0

Abstract

Neural text classifiers for agglutinative languages often suffer from large training vocabularies and high out-of-vocabulary rates at test time. The natural language processing community has developed and used numerous word segmentation procedures to alleviate these problems. However, their effect on the performance of neural classifiers of Turkish documents requires further investigation. In this empirical study, we carry out an extensive series of experiments to investigate how the choice of word segmentation procedure affects the performance of three different neural text classifiers on Turkish documents across multiple domains. Our experiments show that the choice of word segmentation procedure is another hyperparameter that needs tuning. This choice may depend on the domain and the neural architecture.
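The out-of-vocabulary problem described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two toy Turkish sentences are invented, and overlapping character trigrams serve only as a crude stand-in for a subword segmentation procedure. The abstract does not say which procedures or classifiers the study evaluates, so this is not the authors' setup.

```python
def char_ngrams(word, n=3):
    """Split a word into overlapping character n-grams.

    Hypothetical stand-in for a subword segmenter; the word is wrapped in
    boundary markers so prefixes and suffixes get distinct n-grams.
    """
    padded = f"<{word}>"
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def word_tokens(text):
    """Whitespace tokenization: every inflected surface form is its own token."""
    return text.split()


def subword_tokens(text, n=3):
    """Segment every whitespace token into character n-grams."""
    return [gram for word in word_tokens(text) for gram in char_ngrams(word, n)]


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    if not test_tokens:
        return 0.0
    vocab = set(train_tokens)
    missing = sum(1 for token in test_tokens if token not in vocab)
    return missing / len(test_tokens)


# Invented toy data: the test set reuses stems seen in training but with
# suffix combinations that the training set does not contain.
train_text = "evde kaldım evlerde kaldık kitap okudum"
test_text = "evlerde kaldım kitaplar okuduk"

word_oov = oov_rate(word_tokens(train_text), word_tokens(test_text))
subword_oov = oov_rate(subword_tokens(train_text), subword_tokens(test_text))

print(f"word-level OOV rate:    {word_oov:.2f}")
print(f"subword-level OOV rate: {subword_oov:.2f}")
```

On this toy data the unseen inflected forms count as wholly unknown at the word level, while most of their trigrams have already been seen in training. Whether a lower out-of-vocabulary rate translates into better classification accuracy is exactly the empirical question the study raises, which is why the segmentation procedure is treated as a hyperparameter to tune.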