ANNOTATING UNSTRUCTURED TEXTS FOR ENHANCING SEMANTIC ANALYSIS PROCESSES

Tiago Fraga, Orlando Belo, Anabela Barros
IADIS International Journal on Computer Science and Information Systems. Published 2023-10-09. DOI: 10.33965/ijcsis_2023180103 (https://doi.org/10.33965/ijcsis_2023180103)

Abstract

Annotation is a powerful instrument for enhancing the knowledge contained in texts. When developing a text analysis process, we often make notes to identify and characterize concepts and relationships, or to highlight aspects of the text that could go unnoticed by some of its readers. In addition, text annotation can enrich the semantics of texts, giving them more value through the introduction of comments, explanations, and references, among many other things. Today, most text annotation processes are carried out with the help of computational tools, whose functionalities simplify the most elementary annotation tasks and substantially reduce annotation time. The annotation of old, unstructured texts is highly relevant for all those who want to study and acquire knowledge about their contents. Annotating these texts makes them more accessible to people who are not experts in the domain or in the era in which they were produced. In this work we develop a specific annotation system, supported by natural language processing and machine learning tools, to reveal the knowledge contained in the Book of Properties ("Tombo da Mitra"), a codex containing the inventory of the properties of the Archbishop's Table of Braga (Portugal) in the 17th century. This codex contains a large number and a wide variety of elements, including names, nicknames, settlements, professions, and types of land and buildings, among many others. All these elements are very important for studying the geography, culture, economy, architecture, religion, and Portuguese language up to the 17th century. Annotating the Book of Properties makes it possible to maintain a tag database that indexes the most relevant information contained in the book and makes its knowledge accessible to a wider range of people.