Classifying user-created passwords using machine learning and natural language processing techniques
Authors: Binh Le Thanh Thai, Tsubasa Takii, Hidema Tanaka
Journal: Internet of Things, volume 36, Article 101854 (publication date 2026-03-01; Epub 2025-12-13)
Impact factor: 7.6; JCR Q1 (Computer Science, Information Systems)
DOI: 10.1016/j.iot.2025.101854
URL: https://www.sciencedirect.com/science/article/pii/S2542660525003683
Passwords are the dominant authentication method. However, evaluating the strength of user-created passwords remains a significant challenge due to the influence of external factors such as language, culture, and keyboard layout. In this paper, we address the problem of classifying user-created passwords into predefined groups, rather than directly evaluating their strength. First, we assess classifiers built from eight machine learning (ML) algorithms and four Natural Language Processing (NLP) techniques to identify the best combination of ML algorithm and feature extraction method. This experiment shows that the classifier combining Bag-of-Words and Logistic Regression is the most effective for classifying user-created passwords. We then propose a hierarchical classification model to improve this classifier's performance. Experimental results show that the proposed model achieves an accuracy of 97.81% and a recall of 99.66% for weak passwords.
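The abstract names Bag-of-Words and Logistic Regression as the best-performing combination but gives no implementation detail. The following is only a minimal sketch of that pairing, not the authors' method: it assumes character-level tokens, a binary weak/not-weak split, and plain batch gradient descent, and the passwords and labels are hypothetical toy data invented for illustration. All function names are likewise illustrative.

```python
import math
from collections import Counter


def bag_of_chars(password, vocab):
    """Bag-of-Words over characters: count each vocabulary symbol (assumed tokenization)."""
    counts = Counter(password)
    return [counts.get(ch, 0) for ch in vocab]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit binary logistic regression with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample drives the log-loss gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(password, vocab, w, b):
    """Return 1 (weak) if the estimated probability is at least 0.5, else 0."""
    x = bag_of_chars(password, vocab)
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data (not from the paper): 1 = weak, 0 = not weak.
train = [("123456", 1), ("111111", 1), ("password", 1), ("qwerty", 1),
         ("T7#kLp!9xQ", 0), ("v$9Rz@2mW!", 0), ("G4^hN8*bY%", 0), ("Zx3!Qp7#Lm", 0)]
vocab = sorted({ch for pw, _ in train for ch in pw})
X = [bag_of_chars(pw, vocab) for pw, _ in train]
y = [label for _, label in train]
w, b = train_logreg(X, y)
preds = [predict(pw, vocab, w, b) for pw, _ in train]
```

Calling `recall(y, preds)` on held-out data, rather than on the training set used here, would give the kind of weak-password recall figure the abstract reports. A hierarchical variant could chain several such binary classifiers, though the abstract does not say how the proposed hierarchy is structured.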
About the journal:
Internet of Things; Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross collaboration between researchers, engineers and practitioners in the field of IoT & Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT.
The journal places a high priority on timely publication and aims to provide a home for high-quality work.
The journal is also interested in publishing topical Special Issues on any aspect of the IoT.