Doaa N. Mhawi, Haider W. Oleiwi, N. Saeed, Heba L. Al-Taie
{"title":"An Efficient Information Retrieval System Using Evolutionary Algorithms","authors":"Doaa N. Mhawi, Haider W. Oleiwi, N. Saeed, Heba L. Al-Taie","doi":"10.3390/network2040034","DOIUrl":null,"url":null,"abstract":"When it comes to web search, information retrieval (IR) represents a critical technique as web pages have been increasingly growing. However, web users face major problems; unrelated user query retrieved documents (i.e., low precision), a lack of relevant document retrieval (i.e., low recall), acceptable retrieval time, and minimum storage space. This paper proposed a novel advanced document-indexing method (ADIM) with an integrated evolutionary algorithm. The proposed IRS includes three main stages; the first stage (i.e., the advanced documents indexing method) is preprocessing, which consists of two steps: dataset documents reading and advanced documents indexing method (ADIM), resulting in a set of two tables. The second stage is the query searching algorithm to produce a set of words or keywords and the related documents retrieving. The third stage (i.e., the searching algorithm) consists of two steps. The modified genetic algorithm (MGA) proposed new fitness functions using a cross-point operator with dynamic length chromosomes with the adaptive function of the culture algorithm (CA). The proposed system ranks the most relevant documents to the user query by adding a simple parameter (∝) to the fitness function to guarantee the convergence solution, retrieving the most relevant user’s document by integrating MGA with the CA algorithm to achieve the best accuracy. This system was simulated using a free dataset called WebKb containing Worldwide Webpages of computer science departments at multiple universities. The dataset is composed of 8280 HTML-programed semi-structured documents. Experimental results and evaluation measurements showed 100% average precision with 98.5236% average recall for 50 test queries, while the average response time was 00.46.74.78 milliseconds with 18.8 MB memory space for document indexing. The proposed work outperforms all the literature, comparatively, representing a remarkable leap in the studied field.","PeriodicalId":48520,"journal":{"name":"Network Neuroscience","volume":"10 1","pages":"583-605"},"PeriodicalIF":3.6000,"publicationDate":"2022-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Network Neuroscience","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3390/network2040034","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"NEUROSCIENCES","Score":null,"Total":0}
引用次数: 5
Abstract
When it comes to web search, information retrieval (IR) represents a critical technique as web pages have been increasingly growing. However, web users face major problems; unrelated user query retrieved documents (i.e., low precision), a lack of relevant document retrieval (i.e., low recall), acceptable retrieval time, and minimum storage space. This paper proposed a novel advanced document-indexing method (ADIM) with an integrated evolutionary algorithm. The proposed IRS includes three main stages; the first stage (i.e., the advanced documents indexing method) is preprocessing, which consists of two steps: dataset documents reading and advanced documents indexing method (ADIM), resulting in a set of two tables. The second stage is the query searching algorithm to produce a set of words or keywords and the related documents retrieving. The third stage (i.e., the searching algorithm) consists of two steps. The modified genetic algorithm (MGA) proposed new fitness functions using a cross-point operator with dynamic length chromosomes with the adaptive function of the culture algorithm (CA). The proposed system ranks the most relevant documents to the user query by adding a simple parameter (∝) to the fitness function to guarantee the convergence solution, retrieving the most relevant user’s document by integrating MGA with the CA algorithm to achieve the best accuracy. This system was simulated using a free dataset called WebKb containing Worldwide Webpages of computer science departments at multiple universities. The dataset is composed of 8280 HTML-programed semi-structured documents. Experimental results and evaluation measurements showed 100% average precision with 98.5236% average recall for 50 test queries, while the average response time was 00.46.74.78 milliseconds with 18.8 MB memory space for document indexing. The proposed work outperforms all the literature, comparatively, representing a remarkable leap in the studied field.