Integrated use of KOS and deep learning for data set annotation in tourism domain

IF 1.7, Region 3 (Management), JCR Q2, INFORMATION SCIENCE & LIBRARY SCIENCE
G. Aracri, A. Folino, Stefano Silvestri
Journal of Documentation, Journal Article, published 2023-05-02
DOI: 10.1108/jd-02-2023-0019 (https://doi.org/10.1108/jd-02-2023-0019)
Citations: 0

Abstract

Purpose: This paper proposes a methodology for enriching and tailoring a knowledge organization system (KOS) to support the information extraction (IE) task in the analysis of documents in the tourism domain. In particular, the KOS is used to develop a named entity recognition (NER) system.

Design/methodology/approach: First, a method is presented for improving and customizing an available thesaurus by leveraging documents related to tourism in Italy. The resulting thesaurus is then used to create an annotated NER corpus, combining distant supervision, deep learning and light human supervision.

Findings: The study shows that a customized KOS can effectively support IE tasks when applied to documents belonging to the same domains and types used for its construction. Moreover, the KOS supports and eases the annotation task under the proposed methodology, making it possible to annotate a corpus with a fraction of the effort required for manual annotation.

Originality/value: The paper explores an alternative use of a KOS, proposing an innovative NER corpus annotation methodology. Moreover, the KOS and the annotated NER data set will be made publicly available.
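
The abstract does not describe the annotation pipeline in detail, so the following is only a minimal sketch of what thesaurus-based distant supervision for NER could look like, not the authors' method. It assumes a greedy longest-match lookup of thesaurus terms over whitespace tokens and emits BIO tags; the thesaurus entries, the entity labels (ACCOMMODATION, SERVICE, ATTRACTION) and the function names are all hypothetical. In the workflow the abstract outlines, such automatically produced labels would then be refined by a deep learning model and light human supervision.

```python
from typing import Dict, List, Tuple


def build_gazetteer(thesaurus: Dict[str, str]) -> Dict[Tuple[str, ...], str]:
    """Map each lower-cased, whitespace-tokenized term to its entity label."""
    return {tuple(term.lower().split()): label for term, label in thesaurus.items()}


def distant_label(tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest match against the gazetteer."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(key) for key in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical thesaurus entries (term -> entity label), for illustration only.
THESAURUS = {
    "agriturismo": "ACCOMMODATION",
    "guided tour": "SERVICE",
    "national park": "ATTRACTION",
}

tokens = "The agriturismo offers a guided tour of the national park".split()
tags = distant_label(tokens, build_gazetteer(THESAURUS))
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Exact string matching of this kind misses inflected or misspelled terms and cannot resolve ambiguous ones, which is one reason a distantly supervised corpus usually needs a later correction step.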
Source journal
Journal of Documentation (INFORMATION SCIENCE & LIBRARY SCIENCE)
CiteScore: 4.20
Self-citation rate: 14.30%
Articles published: 72
About the journal: The scope of the Journal of Documentation is broadly information sciences, encompassing all of the academic and professional disciplines which deal with recorded information. These include, but are certainly not limited to:
■ Information science, librarianship and related disciplines
■ Information and knowledge management
■ Information and knowledge organisation
■ Information seeking and retrieval, and human information behaviour
■ Information and digital literacies