A Text and Data Analytics Approach to Enrich the Quality of Unstructured Research Information

J. Chem. Inf. Comput. Sci. | Pub Date: 2019-10-30 | Pages: 84-95 | DOI: 10.5539/cis.v12n4p84

Otmane Azeroual


Citations: 1

Abstract



As research information becomes more accessible, demands are growing on research information systems (RIS), which are expected to generate and process knowledge automatically. The quality of the RIS data entries drawn from the individual information sources also causes problems. When the data in a RIS is structured, users can read it and filter it to meet their information and knowledge needs without difficulty. For unstructured content, the technique that allows text databases and text sources to be analyzed, and knowledge to be extracted from previously unseen texts, is called text mining or text data mining, and it builds on the principles of data mining. Text mining makes it possible to classify large, heterogeneous sources of research information automatically and to assign them to specific topics. Research information has always played a major role in higher education and academic institutions, although it has usually been available in RIS in unstructured form, and it grows faster than structured data. As a result, RIS staff at universities can lose time searching, which can lead to poor decision-making. For this reason, the paper proposes a new approach to obtaining structured research information from heterogeneous information systems. The approach is part of a broader approach to the semantic integration of unstructured data, illustrated using a RIS. The purpose of the paper is to investigate text and data mining methods in the context of RIS and to develop a quality improvement model that helps universities and academic institutions using RIS to enrich unstructured research information.
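
The abstract's claim that text mining can assign heterogeneous research information to specific topics can be illustrated with a minimal sketch. It is not the method of the paper: the topic names, the seed keywords, the sample records, and the helper functions `tokenize`, `cosine`, and `assign_topic` are all assumptions made for illustration, and plain bag-of-words cosine similarity stands in for whatever classifier a real RIS would use.

```python
import math
import re
from collections import Counter

# Hypothetical topic seeds; illustrative only, not taken from the paper.
TOPIC_SEEDS = {
    "publications": "journal article paper conference proceedings citation",
    "projects": "project funding grant third-party duration partner",
    "patents": "patent invention application filing inventor",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_topic(text, seeds=TOPIC_SEEDS):
    """Return the best-matching topic, or 'unassigned' if no term overlaps."""
    vec = Counter(tokenize(text))
    scores = {
        topic: cosine(vec, Counter(tokenize(words)))
        for topic, words in seeds.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unassigned"


# Made-up unstructured entries standing in for RIS free-text fields.
records = [
    "Grant funding for a three-year project with an industry partner",
    "Journal article accepted, paper cited in conference proceedings",
    "Patent application filing by the inventor",
]
labels = [assign_topic(r) for r in records]
for record, label in zip(records, labels):
    print(f"{label}: {record}")
```

Each free-text entry receives a topic label that can be stored as a structured field, which is the kind of enrichment the abstract describes. A production system would likely need stemming, stop-word removal, and a trained model instead of hand-picked seed words.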

