LLM-based assessment of HTTPS cybersecurity awareness: Dataset from Moroccan web users and webmasters

Impact factor (IF): 1.4; JCR quartile: Q3; category: Multidisciplinary Sciences

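Data in Brief, Volume 62, Article 112024. Published: 2025-09-04. DOI: 10.1016/j.dib.2025.112024. URL: https://www.sciencedirect.com/science/article/pii/S2352340925007462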

Abdelhadi Zineddine, Abdeslam Rehaimi, Mohamed Zaoui, Yousra Belfaik, Yassine Sadqi, Said Safi

Citations: 0

Abstract

Cybersecurity awareness plays a fundamental role in protecting digital communications, particularly in the deployment and use of the HTTPS protocol. While previous studies have explored website security practices, few available datasets empirically assess both the awareness levels and the implementation behaviors of web users and website administrators. This dataset addresses this gap by analyzing the cybersecurity awareness and HTTPS-related behaviors of 440 Moroccan volunteer participants, including web users and webmasters. Data were collected via a structured Google Forms survey disseminated through web development and cybersecurity communities on online platforms such as Facebook, WhatsApp and LinkedIn.

Responses to multiple-choice questions (MCQs) and free-text entries (the latter categorized using the GPT-4o large language model (LLM)) were pre-processed and score-encoded according to a predefined mapping scheme. Participants' awareness levels were classified as Low, Moderate, or High based on their total scores. To identify behavioral patterns, the unsupervised KMeans clustering algorithm was applied separately to the user and webmaster groups. Principal Component Analysis (PCA) and LLM-based interpretation provided insights into awareness profiles and cybersecurity risk behaviors.
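The scoring, banding and clustering steps described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline. The question IDs, answer options, point values and the 40%/70% cut-offs for Low/Moderate/High are hypothetical placeholders, because the dataset's actual mapping scheme is not reproduced here. The GPT-4o categorization of free-text answers is assumed to have already mapped each answer onto one of the listed options. A hand-written Lloyd's k-means stands in for whatever KMeans implementation the study used, and PCA is left out.

```python
import math
import random

# HYPOTHETICAL mapping scheme: question ID -> {answer option: score}.
# The real dataset defines its own questions and point values.
SCORE_MAP = {
    "q_https_padlock": {"never": 0, "sometimes": 1, "always": 2},
    "q_cert_warning": {"ignore": 0, "depends": 1, "leave_site": 2},
    "q_cert_renewal": {"unknown": 0, "manual": 1, "automated": 2},
}
# Highest attainable total score under the mapping above.
MAX_SCORE = sum(max(options.values()) for options in SCORE_MAP.values())


def encode(response):
    """Score-encode one participant's answers as a numeric vector."""
    return [SCORE_MAP[q][response[q]] for q in SCORE_MAP]


def awareness_level(total, low_cut=0.4, high_cut=0.7):
    """Bin a total score into Low / Moderate / High.

    The cut-offs are HYPOTHETICAL fractions of MAX_SCORE."""
    ratio = total / MAX_SCORE
    if ratio < low_cut:
        return "Low"
    if ratio < high_cut:
        return "Moderate"
    return "High"


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on lists of numbers; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k randomly chosen participants.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Toy example with four made-up respondents (not real dataset rows).
respondents = [
    {"q_https_padlock": "always", "q_cert_warning": "leave_site", "q_cert_renewal": "automated"},
    {"q_https_padlock": "never", "q_cert_warning": "ignore", "q_cert_renewal": "unknown"},
    {"q_https_padlock": "sometimes", "q_cert_warning": "depends", "q_cert_renewal": "manual"},
    {"q_https_padlock": "always", "q_cert_warning": "depends", "q_cert_renewal": "automated"},
]
vectors = [encode(r) for r in respondents]
levels = [awareness_level(sum(v)) for v in vectors]
labels, centroids = kmeans(vectors, k=2)
print(levels)  # ['High', 'Low', 'Moderate', 'High']
```

In this sketch, clustering the user and webmaster groups separately amounts to calling `kmeans` once on each group's vectors. Lloyd's algorithm only finds a local optimum, so cluster assignments can depend on the seed. In the toy data, for example, the middle respondent may land with either neighbouring group.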

The dataset includes raw survey responses, score-encoded data, clustering outputs, and LLM-generated awareness assessment reports. It serves both as supplementary material for a novel hybrid cybersecurity assessment methodology for HTTPS deployment presented in [1], and as a standalone resource for researchers and practitioners examining HTTPS usage, certificate management, and behavioral risk profiling. This dataset is a valuable asset for empirical research and practical improvements in cybersecurity awareness within role-based and regional web ecosystems.

Source journal

Data in Brief (Multidisciplinary Sciences)

CiteScore: 3.10
Self-citation rate: 0.00%
Articles published: 996
Review time: 70 days

Journal description: Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.

Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.