Discovering of Personal Name Prefix Patterns in Thai Researcher Corpus and Its Application

Nongnuch Ketui, Nattapong Tongtep, T. Theeramunkong
Published in: 2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2020-06-01
DOI: 10.1109/ecti-con49241.2020.9158214 (https://doi.org/10.1109/ecti-con49241.2020.9158214)

Abstract

In information extraction, a person’s name is one of the important named entities to extract, and it supports tasks such as question answering and summarization. However, the boundary of a person’s name remains ambiguous because names are written in several patterns across online public data sources such as news, events, and researcher corpora. Name prefixes can serve as clue words or phrases for extracting, identifying, and unifying person names. This paper proposes a name prefix discovery framework that collects an integrated researcher corpus from various data sources and extracts name prefix patterns. The framework has four main functions: collecting data from the data sources, tagging entities, preprocessing the researchers’ names, and finding personal name prefix patterns. In this work, data are gathered from six sources, and the focus is on ten entities related to the research domain. Preprocessing uses three sub-processes to produce the researcher’s name. As a result, 408 personal name prefixes are extracted. In addition, an API for extracting a person’s or researcher’s name is implemented with the Flask Python framework. The output of this work can support researcher name identification in the integrated researcher corpus.
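
To make the role of name prefixes as clue words concrete, the following is a minimal Python sketch of prefix-based name splitting. It is not the authors' implementation: the function name `split_name_prefix` and the list `ASSUMED_PREFIXES` are hypothetical, and the list holds only a few common Thai titles chosen for illustration, not the 408 prefixes reported in the abstract. The sketch assumes that prefixes appear at the start of the string and may be stacked (for example an academic rank followed by a degree title).

```python
# Hypothetical prefix list: a few common Thai titles used for illustration only.
# The 408 prefixes extracted in the paper are not reproduced here.
ASSUMED_PREFIXES = [
    "ศาสตราจารย์", "รองศาสตราจารย์", "ผู้ช่วยศาสตราจารย์",
    "ดร.", "ศ.", "รศ.", "ผศ.",
    "นางสาว", "นาง", "นาย",
]


def split_name_prefix(text, prefixes=ASSUMED_PREFIXES):
    """Strip stacked name prefixes from the start of `text`.

    Returns (found_prefixes, remaining_name). Longer prefixes are tried
    first so that "นางสาว" is not cut short as "นาง".
    """
    ordered = sorted(prefixes, key=len, reverse=True)
    found = []
    rest = text.strip()
    matched = True
    while matched:
        matched = False
        for prefix in ordered:
            if rest.startswith(prefix):
                found.append(prefix)
                # Drop the prefix and any whitespace that follows it.
                rest = rest[len(prefix):].lstrip()
                matched = True
                break
    return found, rest


if __name__ == "__main__":
    for raw in ["ผศ.ดร.สมชาย ใจดี", "นางสาวสมหญิง รักเรียน", "Somchai Jaidee"]:
        print(split_name_prefix(raw))
```

A plain longest-match scan like this will also strip a given name that happens to begin with a prefix string, so a real system would likely need the tagging and preprocessing steps the abstract describes. A function of this shape could be wrapped in a Flask route to serve it as an API, although the endpoints of the authors' API are not described in the abstract.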