Title: A New Filter Approach Based on Effective Ranges for Classification of Gene Expression Data.
Authors: Derya Turfan, Bulent Altunkaynak, Özgür Yeniay
Journal: Big Data, pages 312-330
DOI: 10.1089/big.2022.0086 (https://doi.org/10.1089/big.2022.0086)
Publication date: 2024-08-01 (Epub 2023/9/4)
Publication type: Journal Article
Impact factor: 2.6
JCR: Q2 (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)
Region: 4 (Computer Science)
Open access: No
Platform: Semanticscholar
Citations: 0
Abstract
Over the years, many studies have sought to reduce and eliminate the effects of diseases on human health. Gene expression data sets play a critical role in diagnosing and treating diseases. These data sets consist of thousands of genes but only a small number of samples. This creates the curse of dimensionality and makes such data sets difficult to analyze. Feature selection is one of the most effective strategies for addressing this problem. It is a preprocessing step that improves classification accuracy by selecting the most relevant and informative features. In this article, we propose a new statistically based filter method for feature selection, named the Effective Range-based Feature Selection Algorithm (FSAER). As an extension of the earlier Effective Range based Gene Selection (ERGS) and Improved Feature Selection based on Effective Range (IFSER) algorithms, our method combines the advantages of both while also taking the disjoint area into account. To illustrate the efficacy of the proposed algorithm, experiments were conducted on six benchmark gene expression data sets. The results of FSAER and other filter methods were compared in terms of classification accuracy to demonstrate the effectiveness of the proposed method. Support vector machines, the naive Bayes classifier, and the k-nearest neighbor algorithm were used as classifiers.
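The abstract does not give the FSAER scoring formula, so the following is only a minimal sketch of the general idea behind effective-range filters such as ERGS: build a per-class interval for each feature and prefer features whose class intervals overlap least. The interval form (class mean plus or minus a prior-weighted multiple of the standard deviation), the width factor `GAMMA`, the pairwise overlap measure, and all function names are assumptions chosen for illustration; the disjoint-area term that FSAER adds is not modeled here.

```python
from itertools import combinations
from statistics import mean, pstdev

GAMMA = 1.732  # assumed width factor; not taken from the abstract


def effective_ranges(values, labels, gamma=GAMMA):
    """Return {class: (lower, upper)} intervals for a single feature."""
    n = len(labels)
    ranges = {}
    for cls in sorted(set(labels)):
        xs = [v for v, y in zip(values, labels) if y == cls]
        prior = len(xs) / n
        # Assumed form: frequent classes get a narrower interval.
        half_width = (1.0 - prior) * gamma * pstdev(xs)
        mu = mean(xs)
        ranges[cls] = (mu - half_width, mu + half_width)
    return ranges


def overlap_score(values, labels, gamma=GAMMA):
    """Pairwise class-interval overlap divided by the total span (lower is better)."""
    ranges = list(effective_ranges(values, labels, gamma).values())
    span = max(hi for _, hi in ranges) - min(lo for lo, _ in ranges)
    if span == 0:
        return 1.0  # constant feature: classes cannot be told apart
    overlap = sum(
        max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
        for (a_lo, a_hi), (b_lo, b_hi) in combinations(ranges, 2)
    )
    return overlap / span


def rank_features(X, y, gamma=GAMMA):
    """Return feature indices ordered from least to most class overlap."""
    columns = list(zip(*X))
    scores = [overlap_score(col, y, gamma) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i])


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0],
     [5.0, 4.5], [5.2, 3.5], [4.8, 4.0]]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y))  # feature 0 is ranked ahead of feature 1
```

In a filter pipeline of the kind the abstract describes, the top-ranked features would then be passed to a classifier such as a support vector machine, naive Bayes, or k-nearest neighbor, and the resulting accuracies compared across filter methods.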
Big Data: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; COMPUTER SCIENCE, THEORY & METHODS
CiteScore
9.10
Self-citation rate
2.20%
Articles published
60
About the journal:
Big Data is the leading peer-reviewed journal covering the challenges and opportunities in collecting, analyzing, and disseminating vast amounts of data. The Journal addresses questions surrounding this powerful and growing field of data science and facilitates the efforts of researchers, business managers, analysts, developers, data scientists, physicists, statisticians, infrastructure developers, academics, and policymakers to improve operations, profitability, and communications within their businesses and institutions.
Spanning a broad array of disciplines focusing on novel big data technologies, policies, and innovations, the Journal brings together the community to address current challenges and enforce effective efforts to organize, store, disseminate, protect, manipulate, and, most importantly, find the most effective strategies to make this incredible amount of information work to benefit society, industry, academia, and government.
Big Data coverage includes:
Big data industry standards,
New technologies being developed specifically for big data,
Data acquisition, cleaning, distribution, and best practices,
Data protection, privacy, and policy,
Business interests from research to product,
The changing role of business intelligence,
Visualization and design principles of big data infrastructures,
Physical interfaces and robotics,
Social networking advantages for Facebook, Twitter, Amazon, Google, etc.,
Opportunities around big data and how companies can harness it to their advantage.