Use of text mining techniques for unsupervised organization of digital procedural acts

Alfredo Silveira Araújo Neto, Marcos Negreiros
DOI: 10.22456/2175-2745.83581 (https://doi.org/10.22456/2175-2745.83581)
Journal: Research initiative, treatment action : RITA, pp. 74-102
Published: 2018-11-19
Publication type: Journal Article
Citations: 1

Abstract

Rapid advances in technologies for capturing and storing digital data have allowed organizations to accumulate extremely large volumes of information, a large proportion of which is unstructured data in the form of text. However, retrieving useful information from these large repositories remains a very challenging activity. In this context, data mining is presented as an automated discovery process that acts on large databases and enables the extraction of knowledge from raw text documents. Among the many sources of textual documents are the electronic diaries of justice, which are intended to officially make public all acts of the Judiciary. Although publication in digital form has removed imperfections associated with dissemination in print, applying data mining methods could make the analysis of their contents faster. Accordingly, this article presents a tool capable of automatically grouping and categorizing digital procedural acts, based on an evaluation of text mining techniques applied to the task of determining groups. In addition, the strategy for defining group descriptors, which is usually based on the most frequent words in the documents, was evaluated and remodeled to use the most regularly identified concepts in the texts instead of words.
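The difference between word-based and concept-based group descriptors can be illustrated with a minimal sketch. It is not the authors' tool: the abstract does not name the algorithms used, so the stop-word list, the word-to-concept map (CONCEPTS), the sample group of acts (GROUP), and the function names below are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list (assumption, not from the article).
STOP_WORDS = {"the", "of", "to", "a", "an", "and", "in", "for", "is", "by"}

# Hypothetical word-to-concept map (assumption, not from the article).
CONCEPTS = {
    "appeal": "APPEAL", "appeals": "APPEAL", "appellant": "APPEAL",
    "summons": "NOTIFICATION", "notified": "NOTIFICATION",
    "notice": "NOTIFICATION",
}

# Invented sample texts standing in for one group of procedural acts.
GROUP = [
    "The appellant filed an appeal against the decision.",
    "Appeals court notified the parties of the appeal hearing.",
    "Notice of appeal served; summons issued to the defendant.",
]


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def group_descriptors(docs, k=2, concept_map=None):
    """Return the k most frequent terms in a group of documents.

    Without concept_map, terms are raw words. With concept_map, each
    word is replaced by its concept and unmapped words are ignored.
    """
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        if concept_map is not None:
            tokens = [concept_map[t] for t in tokens if t in concept_map]
        counts.update(tokens)
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    print(group_descriptors(GROUP, k=2))
    print(group_descriptors(GROUP, k=2, concept_map=CONCEPTS))
```

In this toy group, word variants such as "appeals" and "appellant" are counted separately under the word-based strategy, whereas the concept-based strategy merges them under one label, which is the motivation the abstract gives for remodeling the descriptors.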