Using a Workflow Management Platform in Textual Data Management

IF 1.3 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Data Intelligence Pub Date : 2022-03-07 DOI:10.1162/dint_a_00139

T. Doan, S. Bingert, R. Yahyapour

Data Intelligence, volume 4, pages 398-408


Abstract

This paper briefly introduces the workflow management platform Flowable and describes how it is used for textual-data management. Flowable is relatively new: its first release was on 13 October 2016. Despite its short time on the market, it has quickly gained attention, with about 4,600 stars on GitHub at the time of writing. The focus of our project is to build a platform for large-scale text analysis that draws on many different text resources. So far, we have connected to four text resources and obtained more than one million works. Some resources are dynamic, meaning that they may add new data or modify existing data. It is therefore necessary to keep our copies of both the metadata and the raw data up to date with the resources. In addition, to comply with the FAIR principles, each work is assigned a persistent identifier (PID) and indexed for searching. In the last step, we run some standard analyses on the data to enhance our search engine and to generate a knowledge graph. End users can use our platform to search the data or to access the knowledge graph. They can also submit their own analysis code to the system. The code is executed on a high-performance computing (HPC) cluster, and users receive the results later. In this case, Flowable can use PIDs to identify and manage digital objects, which facilitates communication with the HPC system. The whole process can be expressed as a workflow. A workflow that includes error handling and notification has been created and deployed, and its execution can be triggered manually or at predefined time intervals. According to our evaluation, the Flowable platform is powerful and flexible. Further use of the platform is already planned or implemented in many of our projects.
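The update step described in the abstract (detect changed works, assign a PID, index, report errors) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the Flowable API; the names `Work`, `Store`, `sync_resource`, and the `pid:` prefix are hypothetical and chosen only for illustration, and the in-memory dictionaries stand in for a real PID service and search index.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class Work:
    source_id: str  # identifier used by the text resource (hypothetical)
    metadata: dict
    raw_text: str

    def fingerprint(self) -> str:
        # Hash of metadata and raw text, used to detect upstream changes.
        payload = repr(sorted(self.metadata.items())) + self.raw_text
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Store:
    pids: dict = field(default_factory=dict)          # source_id -> PID
    fingerprints: dict = field(default_factory=dict)  # source_id -> hash
    index: dict = field(default_factory=dict)         # PID -> Work


def sync_resource(works, store, notify):
    """One workflow run: refresh changed works, assign PIDs, index, report errors."""
    updated = 0
    for work in works:
        try:
            fp = work.fingerprint()
            if store.fingerprints.get(work.source_id) == fp:
                continue  # unchanged since the last run
            # Assign a PID once; reuse it when the work is later modified.
            pid = store.pids.setdefault(work.source_id, f"pid:{uuid.uuid4()}")
            store.index[pid] = work
            store.fingerprints[work.source_id] = fp
            updated += 1
        except Exception as exc:  # error-handling branch of the workflow
            notify(f"failed to process {work.source_id}: {exc}")
    return updated
```

A run could be started manually by calling `sync_resource(works, store, print)`, or repeatedly by a scheduler. In Flowable, such steps would more likely be modelled as tasks in a process definition with a timer-based start, but that mapping is an assumption here rather than something the abstract specifies.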


