LLM-based assessment of HTTPS cybersecurity awareness: Dataset from Moroccan web users and webmasters
Authors: Abdelhadi Zineddine, Abdeslam Rehaimi, Mohamed Zaoui, Yousra Belfaik, Yassine Sadqi, Said Safi
Journal: Data in Brief, volume 62, Article 112024 (published 2025-09-04)
DOI: 10.1016/j.dib.2025.112024
URL: https://www.sciencedirect.com/science/article/pii/S2352340925007462
Cybersecurity awareness plays a fundamental role in protecting digital communications, particularly in the deployment and use of the HTTPS protocol. While previous studies have explored website security practices, few available datasets empirically assess both the awareness levels and the implementation behaviors of web users and website administrators. This dataset addresses that gap by analyzing the cybersecurity awareness and HTTPS-related behaviors of 440 voluntary Moroccan participants, including web users and webmasters. Data were collected via a structured Google Forms survey, disseminated through web development and cybersecurity communities on online platforms such as Facebook, WhatsApp, and LinkedIn.
The responses collected from multiple-choice questions (MCQs) and free-text entries (the latter categorized using the GPT-4o large language model (LLM)) were pre-processed and score-encoded according to a predefined mapping scheme. Participants' awareness levels were classified as Low, Moderate, or High based on their total scores. To identify behavioral patterns, the unsupervised KMeans clustering algorithm was applied separately to the user and webmaster groups. Principal Component Analysis (PCA) and LLM-based interpretation provided insights into awareness profiles and cybersecurity risk behaviors.
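The pipeline described above (score encoding, level classification, clustering) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the question names, the answer-to-score mapping, the 40% and 70% cut-offs, and the toy responses are hypothetical and are not taken from the dataset. A plain Lloyd's iteration stands in for whatever KMeans implementation the authors used, and the PCA step is omitted.

```python
import random

# Hypothetical answer-to-score mapping; the dataset's real scheme is not given here.
SCORE_MAP = {
    "checks_padlock": {"Always": 2, "Sometimes": 1, "Never": 0},
    "ignores_cert_warning": {"Never": 2, "Sometimes": 1, "Always": 0},
    "knows_https": {"Yes": 2, "Partly": 1, "No": 0},
}


def encode(response):
    """Turn one respondent's answers into a list of numeric scores."""
    return [SCORE_MAP[q][response[q]] for q in SCORE_MAP]


def awareness_level(total, max_total, low_cut=0.4, high_cut=0.7):
    """Bucket a total score; the cut-offs are illustrative assumptions."""
    ratio = total / max_total
    if ratio < low_cut:
        return "Low"
    if ratio < high_cut:
        return "Moderate"
    return "High"


def sq_dist(p, q):
    """Squared Euclidean distance between two score vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy responses, invented for this example only.
responses = [
    {"checks_padlock": "Always", "ignores_cert_warning": "Never", "knows_https": "Yes"},
    {"checks_padlock": "Always", "ignores_cert_warning": "Sometimes", "knows_https": "Yes"},
    {"checks_padlock": "Never", "ignores_cert_warning": "Always", "knows_https": "No"},
    {"checks_padlock": "Sometimes", "ignores_cert_warning": "Always", "knows_https": "No"},
]
encoded = [encode(r) for r in responses]
max_total = 2 * len(SCORE_MAP)  # every hypothetical question scores 0 to 2
levels = [awareness_level(sum(row), max_total) for row in encoded]
labels, centroids = kmeans(encoded, k=2)
```

In this sketch the awareness level depends only on the total score, while the clustering works on the full per-question vector, so two respondents with the same level can still land in different behavioral clusters. Running the clustering once per role, as the abstract describes, would mean calling `kmeans` separately on the encoded user rows and the encoded webmaster rows.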
The dataset includes raw survey responses, score-encoded data, clustering outputs, and LLM-generated awareness assessment reports. It serves both as supplementary material for a novel hybrid cybersecurity assessment methodology for HTTPS deployment presented in [1], and as a standalone resource for researchers and practitioners examining HTTPS usage, certificate management, and behavioral risk profiling. This dataset is a valuable asset for empirical research and practical improvements in cybersecurity awareness within role-based and regional web ecosystems.
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.