Towards an Efficient Big Data Indexing Approach under an Uncertain Environment
Asma Omri, Mohamed Nazih Omri
International Journal of Intelligent Systems and Applications in Engineering, published 2022-04-08
DOI: 10.5815/ijisa.2022.02.01
It is generally accepted that data production has grown spectacularly in recent years. This growth is driven by the proliferation of technologies such as mobile devices, smart meters, social networks, cloud computing, and sensors, and it is expected to continue and even accelerate. To find all the documents that answer a query, an information retrieval system must determine whether the terms of each document match those of the user's query. Most systems assume that the terms extracted from documents are certain and precise, but for some data this assumption is difficult to apply. The main objective of this article is to propose a new model for indexing data services in an uncertain environment. In such an environment the data may be untrustworthy, or may contradict another data source, because of failures in collection or integration mechanisms. The intelligence of the proposed solution comes from an efficient fuzzy module capable of reasoning over uncertain and imprecise data. Concretely, the approach is built around two main phases: (i) the first processes the uncertain data in a textual document, and (ii) the second applies a new method of uncertain syntactic indexing. We evaluated the solution through a series of experiments on several standard test collections, comparing it with approaches from the literature and using the standard performance measures precision, recall, and F-measure. The results show that the proposed approach provides an efficient Big Data indexing solution for uncertain environments and improves precision, recall, and F-measure over the main approaches proposed in the literature. The proposed uncertain model obtained its best precision, 0.395, and its best recall, 0.254, on the KDD database.
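The abstract does not specify how the fuzzy module scores uncertain terms or how the evaluation runs are set up. As a rough illustration only, the Python sketch below assumes a hypothetical uncertain index in which each document term carries a certainty degree in [0, 1]. It scores a conjunctive query with the min t-norm of classical fuzzy-set retrieval, a standard technique swapped in here and not the authors' model. It then computes set-based precision, recall, and F-measure, the three measures the abstract reports. All names (`match_degree`, `retrieve`, `precision_recall_f`, the sample index and the 0.5 threshold) are hypothetical.

```python
"""Hypothetical sketch: fuzzy matching over an uncertain term index, plus
set-based precision, recall and F-measure. Not the authors' implementation."""

from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical uncertain index: document id -> {term: certainty degree in [0, 1]}.
# A degree of 1.0 means the term is certainly present; absent terms count as 0.0.
UncertainIndex = Dict[str, Dict[str, float]]


def match_degree(doc_terms: Dict[str, float], query: Iterable[str]) -> float:
    """Degree to which a document satisfies a conjunctive (AND) query,
    computed with the min t-norm of classical fuzzy-set retrieval."""
    return min((doc_terms.get(term, 0.0) for term in query), default=0.0)


def retrieve(index: UncertainIndex, query: List[str], threshold: float) -> Set[str]:
    """Return the ids of documents whose match degree reaches the threshold."""
    return {doc for doc, terms in index.items()
            if match_degree(terms, query) >= threshold}


def precision_recall_f(retrieved: Set[str], relevant: Set[str],
                       beta: float = 1.0) -> Tuple[float, float, float]:
    """Set-based precision, recall and F_beta (beta = 1 gives the usual F1)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    index = {
        "d1": {"big": 0.9, "data": 1.0, "index": 0.8},
        "d2": {"big": 0.4, "data": 0.7},
        "d3": {"data": 0.9, "index": 0.6},
    }
    retrieved = retrieve(index, ["data", "index"], threshold=0.5)
    p, r, f = precision_recall_f(retrieved, relevant={"d1", "d2"})
    print(sorted(retrieved), p, r, f)  # prints ['d1', 'd3'] 0.5 0.5 0.5
```

The min t-norm makes a document only as relevant as its least certain query term, which is one simple way to keep unreliable terms from inflating a match; a real system could instead use a different t-norm or a weighted aggregation.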