基于 k-means 和 SVM 算法的混合方法，用于选择适当的部门风险评估方法

IF 3.5 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

PeerJ Computer Science Pub Date : 2024-07-26 DOI:10.7717/peerj-cs.2198

Fatih Topaloglu

{"title":"基于 k-means 和 SVM 算法的混合方法，用于选择适当的部门风险评估方法","authors":"Fatih Topaloglu","doi":"10.7717/peerj-cs.2198","DOIUrl":null,"url":null,"abstract":"Every work environment contains different types of risks and interactions between risks. Therefore, the method to be used when making a risk assessment is very important. When determining which risk assessment method (RAM) to use, there are many factors such as the types of risks in the work environment, the interactions of these risks with each other, and their distance from the employees. Although there are many RAMs available, there is no RAM that will suit all workplaces and which method to choose is the biggest question. There is no internationally accepted scale or trend on this subject. In the study, 26 sectors, 10 different RAMs and 10 criteria were determined. A hybrid approach has been designed to determine the most suitable RAMs for sectors by using k-means clustering and support vector machine (SVM) classification algorithms, which are machine learning (ML) algorithms. First, the data set was divided into subsets with the k-means algorithm. Then, the SVM algorithm was run on all subsets with different characteristics. Finally, the results of all subsets were combined to obtain the result of the entire dataset. Thus, instead of the threshold value determined for a single and large cluster affecting the entire cluster and being made mandatory for all of them, a flexible structure was created by determining separate threshold values for each sub-cluster according to their characteristics. In this way, machine support was provided by selecting the most suitable RAMs for the sectors and eliminating the administrative and software problems in the selection phase from the manpower. The first comparison result of the proposed method was found to be the hybrid method: 96.63%, k-means: 90.63 and SVM: 94.68%. In the second comparison made with five different ML algorithms, the results of the artificial neural networks (ANN): 87.44%, naive bayes (NB): 91.29%, decision trees (DT): 89.25%, random forest (RF): 81.23% and k-nearest neighbours (KNN): 85.43% were found.","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"67 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A hybrid approach based on k-means and SVM algorithms in selection of appropriate risk assessment methods for sectors\",\"authors\":\"Fatih Topaloglu\",\"doi\":\"10.7717/peerj-cs.2198\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Every work environment contains different types of risks and interactions between risks. Therefore, the method to be used when making a risk assessment is very important. When determining which risk assessment method (RAM) to use, there are many factors such as the types of risks in the work environment, the interactions of these risks with each other, and their distance from the employees. Although there are many RAMs available, there is no RAM that will suit all workplaces and which method to choose is the biggest question. There is no internationally accepted scale or trend on this subject. In the study, 26 sectors, 10 different RAMs and 10 criteria were determined. A hybrid approach has been designed to determine the most suitable RAMs for sectors by using k-means clustering and support vector machine (SVM) classification algorithms, which are machine learning (ML) algorithms. First, the data set was divided into subsets with the k-means algorithm. Then, the SVM algorithm was run on all subsets with different characteristics. Finally, the results of all subsets were combined to obtain the result of the entire dataset. Thus, instead of the threshold value determined for a single and large cluster affecting the entire cluster and being made mandatory for all of them, a flexible structure was created by determining separate threshold values for each sub-cluster according to their characteristics. In this way, machine support was provided by selecting the most suitable RAMs for the sectors and eliminating the administrative and software problems in the selection phase from the manpower. The first comparison result of the proposed method was found to be the hybrid method: 96.63%, k-means: 90.63 and SVM: 94.68%. In the second comparison made with five different ML algorithms, the results of the artificial neural networks (ANN): 87.44%, naive bayes (NB): 91.29%, decision trees (DT): 89.25%, random forest (RF): 81.23% and k-nearest neighbours (KNN): 85.43% were found.\",\"PeriodicalId\":54224,\"journal\":{\"name\":\"PeerJ Computer Science\",\"volume\":\"67 1\",\"pages\":\"\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2024-07-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PeerJ Computer Science\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.7717/peerj-cs.2198\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.7717/peerj-cs.2198","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

每个工作环境都包含不同类型的风险和风险之间的相互作用。因此，进行风险评估时使用的方法非常重要。在确定使用哪种风险评估方法（RAM）时，需要考虑很多因素，如工作环境中的风险类型、这些风险之间的相互作用以及它们与员工之间的距离。虽然有许多 RAM 可供使用，但没有一种 RAM 适合所有工作场所，选择哪种方法是最大的问题。在这个问题上，还没有一个国际公认的尺度或趋势。在这项研究中，确定了 26 个部门、10 种不同的记录和档案管理方法以及 10 项标准。我们设计了一种混合方法，通过使用 k-means 聚类和支持向量机（SVM）分类算法，即机器学习（ML）算法，来确定各部门最合适的记录和档案管理方法。首先，使用 k-means 算法将数据集划分为若干子集。然后，在所有具有不同特征的子集中运行 SVM 算法。最后，合并所有子集的结果，得出整个数据集的结果。这样，就不再是为单一的大集群确定的阈值会影响整个集群，也不再是所有集群都必须使用的阈值，而是根据每个子集群的特点为其确定单独的阈值，从而创建了一个灵活的结构。这样，通过为各部门选择最合适的记录和档案管理员，提供了机器支持，并从人力方面消除了选择阶段的行政和软件问题。建议方法的第一次比较结果是：混合方法：96.63%，k-means：90.63%，SVM：94.68%。在与五种不同 ML 算法的第二次比较中，人工神经网络（ANN）的结果是：87.44%；奈夫贝叶斯（NB）的结果是：91.29%；决策树（DT）的结果是：89.25%；随机森林（RF）的结果是：94.68%：89.25%，随机森林 (RF)81.23% 和 k-nearest neighbours (KNN): 85.43%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A hybrid approach based on k-means and SVM algorithms in selection of appropriate risk assessment methods for sectors

Every work environment contains different types of risks and interactions between risks. Therefore, the method to be used when making a risk assessment is very important. When determining which risk assessment method (RAM) to use, there are many factors such as the types of risks in the work environment, the interactions of these risks with each other, and their distance from the employees. Although there are many RAMs available, there is no RAM that will suit all workplaces and which method to choose is the biggest question. There is no internationally accepted scale or trend on this subject. In the study, 26 sectors, 10 different RAMs and 10 criteria were determined. A hybrid approach has been designed to determine the most suitable RAMs for sectors by using k-means clustering and support vector machine (SVM) classification algorithms, which are machine learning (ML) algorithms. First, the data set was divided into subsets with the k-means algorithm. Then, the SVM algorithm was run on all subsets with different characteristics. Finally, the results of all subsets were combined to obtain the result of the entire dataset. Thus, instead of the threshold value determined for a single and large cluster affecting the entire cluster and being made mandatory for all of them, a flexible structure was created by determining separate threshold values for each sub-cluster according to their characteristics. In this way, machine support was provided by selecting the most suitable RAMs for the sectors and eliminating the administrative and software problems in the selection phase from the manpower. The first comparison result of the proposed method was found to be the hybrid method: 96.63%, k-means: 90.63 and SVM: 94.68%. In the second comparison made with five different ML algorithms, the results of the artificial neural networks (ANN): 87.44%, naive bayes (NB): 91.29%, decision trees (DT): 89.25%, random forest (RF): 81.23% and k-nearest neighbours (KNN): 85.43% were found.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

PeerJ Computer Science Computer Science-General Computer Science

CiteScore

6.10

自引率

5.30%

发文量

332

审稿时长

10 weeks

期刊介绍： PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.