Towards an Efficient Big Data Indexing Approach under an Uncertain Environment

Q3 Computer Science

International Journal of Intelligent Systems and Applications in Engineering, Pub Date: 2022-04-08, DOI: 10.5815/ijisa.2022.02.01

Asma Omri, Mohamed Nazih Omri


Citations: 1

Abstract



It is generally accepted that data production has grown spectacularly for several years, driven by the proliferation of new technologies such as mobile devices, smart meters, social networks, cloud computing and sensors, and this data explosion is expected to continue and even accelerate. To find all of the documents that answer a request, an information retrieval system must determine whether the terms of each document match those of the user's request. Most systems assume that the terms extracted from documents are certain and precise; for some data, however, this assumption is hard to maintain. The main objective of this article is to propose a new model for indexing data services in an uncertain environment, that is, one in which the data a service contains may be untrustworthy or may contradict another data source because of failures in the collection or integration mechanisms. The intelligence of the proposed solution comes from an efficient fuzzy module capable of reasoning over uncertain and imprecise data. Concretely, the approach is organized around two main phases: (i) the first phase processes the uncertain data in a textual document, and (ii) the second phase defines a new method of uncertain syntactic indexing. We carried out a series of experiments on several standard test collections to evaluate the solution and compare it with approaches from the literature, using the standard performance measures precision, recall and F-measure. The results show that our solution outperforms the main approaches proposed in the literature.
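The abstract does not specify how the fuzzy module represents uncertainty, so what follows is only a minimal sketch of one classical option, a fuzzy-set inverted index, and not the authors' model. It assumes that each (term, document) pair carries a hypothetical membership degree in [0, 1] expressing how certain the extraction is, that repeated evidence for the same pair is merged with the fuzzy union (max), and that a conjunctive query is scored with the min t-norm (Zadeh's AND). The class `FuzzyInvertedIndex` and its methods are illustrative names.

```python
from collections import defaultdict


class FuzzyInvertedIndex:
    """Hypothetical fuzzy inverted index: term -> {doc_id: membership degree}."""

    def __init__(self) -> None:
        # Each degree in [0, 1] says how certain we are that the term
        # really belongs to the document (illustrative assumption).
        self.postings: defaultdict[str, dict[str, float]] = defaultdict(dict)

    def add(self, doc_id: str, term: str, degree: float) -> None:
        if not 0.0 <= degree <= 1.0:
            raise ValueError("membership degree must lie in [0, 1]")
        current = self.postings[term].get(doc_id, 0.0)
        # Merge repeated evidence with the fuzzy union (max).
        self.postings[term][doc_id] = max(current, degree)

    def degree(self, term: str, doc_id: str) -> float:
        # .get avoids creating empty posting lists for unknown terms.
        return self.postings.get(term, {}).get(doc_id, 0.0)

    def query_and(self, terms: list[str]) -> dict[str, float]:
        """Rank documents containing every term, scored with the min t-norm."""
        if not terms:
            return {}
        candidates = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            candidates &= set(self.postings.get(term, {}))
        scores = {d: min(self.degree(t, d) for t in terms) for d in candidates}
        # Highest certainty first.
        return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


index = FuzzyInvertedIndex()
index.add("d1", "big", 0.9)
index.add("d1", "data", 0.6)
index.add("d2", "big", 0.4)
index.add("d2", "data", 0.8)
print(index.query_and(["big", "data"]))  # {'d1': 0.6, 'd2': 0.4}
```

Using min for AND means a document ranks only as high as its least certain query term, which is one simple way to keep untrustworthy terms from inflating a match; other t-norms, such as the product, would be equally valid choices.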
The proposed approach thus provides an efficient Big Data indexing solution for uncertain environments that improves precision, recall and F-measure. In the experiments, the proposed uncertain model achieved the best precision, 0.395, and the best recall, 0.254, on the KDD database.
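Precision, recall and F-measure are the evaluation measures named in the abstract. The sketch below uses the standard set-based definitions for a single query: P = |retrieved ∩ relevant| / |retrieved|, R = |retrieved ∩ relevant| / |relevant| and F_β = (1 + β²)·P·R / (β²·P + R). The abstract does not say how scores were averaged across queries or collections, so the per-query scope and the function name `precision_recall_f` are assumptions.

```python
def precision_recall_f(
    retrieved: set[str], relevant: set[str], beta: float = 1.0
) -> tuple[float, float, float]:
    """Return set-based (precision, recall, F_beta) for one query."""
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0  # avoid division by zero
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Toy example with made-up document identifiers.
p, r, f = precision_recall_f({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d7"})
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

With β = 1 this is the harmonic mean of P and R. The best precision (0.395) and best recall (0.254) are quoted separately in the abstract, so they are not combined into an F-measure here.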


Source journal: International Journal of Intelligent Systems and Applications in Engineering (Computer Science-Computer Graphics and Computer-Aided Design)

CiteScore: 1.30

Self-citation rate: 0.00%

Articles published: