Discovering of Personal Name Prefix Patterns in Thai Researcher Corpus and Its Application

2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON). Pub Date: 2020-06-01. DOI: 10.1109/ecti-con49241.2020.9158214

Nongnuch Ketui, Nattapong Tongtep, T. Theeramunkong



Abstract

In information extraction, a person’s name is one of the important named entities to extract, and it is used in tasks such as question answering and summarization. However, the boundary of a person’s name remains ambiguous because names are written in several patterns across online public data sources such as news, events, and researcher corpora. Name prefixes can serve as clue words or phrases for extracting, identifying, and unifying person names. This paper proposes a name prefix discovery framework that collects an integrated researcher corpus from various data sources and extracts name prefix patterns. The framework has four main functions: collecting data from the data sources, tagging entities, preprocessing researcher names, and finding personal name prefix patterns. The work gathers six data sources and focuses on ten entities related to the research domain. Preprocessing uses three sub-processes to produce the researcher’s name. In total, 408 personal name prefixes are extracted. In addition, an API for extracting a person’s or researcher’s name is implemented with the Flask framework for Python. The output of this work can support researcher name identification in the integrated researcher corpus.
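The abstract does not spell out how prefix patterns are found, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, purely for illustration, that the name proper is the last two whitespace-separated tokens of each string and that everything before them is a prefix. The sample names, the function names (`split_prefix`, `discover_prefixes`, `strip_prefix`), and the two-token rule are all hypothetical.

```python
from collections import Counter

# Hypothetical sample strings; these are NOT data from the paper.
SAMPLE_NAMES = [
    "ผศ.ดร. สมชาย ใจดี",
    "ดร. สมหญิง รักเรียน",
    "Asst. Prof. Dr. Somchai Jaidee",
    "Dr. Somying Rakrian",
    "ดร. วิชัย มั่นคง",
]


def split_prefix(name, name_tokens=2):
    """Split a string into (prefix, name proper).

    Assumption for this sketch: the last `name_tokens` whitespace-separated
    tokens are the given and family name; anything before them is the prefix.
    """
    tokens = name.split()
    if len(tokens) <= name_tokens:
        return "", " ".join(tokens)
    return " ".join(tokens[:-name_tokens]), " ".join(tokens[-name_tokens:])


def discover_prefixes(names):
    """Count how often each candidate prefix occurs in the corpus."""
    counts = Counter()
    for name in names:
        prefix, _ = split_prefix(name)
        if prefix:
            counts[prefix] += 1
    return counts


def strip_prefix(name, known_prefixes):
    """Remove a known prefix from the start of a name, trying longest first.

    Longest-first matching keeps a compound prefix such as "ผศ.ดร." from
    being shadowed by the shorter "ดร.".
    """
    for prefix in sorted(known_prefixes, key=len, reverse=True):
        if name.startswith(prefix):
            return name[len(prefix):].strip()
    return name.strip()


if __name__ == "__main__":
    prefixes = discover_prefixes(SAMPLE_NAMES)
    for prefix, count in prefixes.most_common():
        print(count, prefix)
    # A prefix written without a following space is still removed.
    print(strip_prefix("ผศ.ดร.สมชาย ใจดี", prefixes))
```

Once a prefix list has been discovered, a function like `strip_prefix` could sit behind a small web endpoint, which is the role the abstract assigns to the Flask API. The endpoint itself is omitted here because the abstract gives no details of its interface.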


