Research on a data mining algorithm based on BERTopic for medication rules in Traditional Chinese Medicine prescriptions
Hongchen Li, Xinyi Lu, Yujia Wu, Jie Luo
Medicine Advances, published 2023-11-27
DOI: 10.1002/med4.39 (https://doi.org/10.1002/med4.39)
Citations: 0
Abstract
A data mining algorithm based on BERTopic is proposed to provide new insights into the analysis of medication rules in Traditional Chinese Medicine (TCM) prescriptions.

Using the BERTopic algorithm, collected TCM prescriptions for corneal diseases are converted to embeddings by a transformer based on the Bidirectional Encoder Representations from Transformers (BERT) pre-trained model. Then, Uniform Manifold Approximation and Projection (UMAP) is applied to reduce the dimensionality of the prescription embeddings. Subsequently, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) is used to cluster the reduced embeddings. Finally, class-based term frequency–inverse document frequency (c-TF-IDF) is used to generate several main drug combinations from the clustered results.

The most frequently used drugs included Buddleja officinalis, Bidens pilosa, Angelica sinensis, Eriocaulon buergerianum, and Raw Rehmannia glutinosa. The most frequent drug combinations were “Eriocaulon buergerianum, Raw Rehmannia glutinosa, Prunella vulgaris, Notopterygium incisum”, “Lycii Fructus, Bidens pilosa, Buddleja officinalis”, and “Kochiae Fructus, Cortex Dictamni.”

The proposed data mining algorithm based on BERTopic demonstrated promising outcomes in the analysis of TCM prescription medication rules. The method is simple and efficient, offering a novel avenue for this kind of analysis.
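The last step of the pipeline, class-based TF-IDF, can be illustrated with a minimal sketch. It assumes the embedding, dimensionality reduction, and clustering steps have already assigned each prescription to a cluster, and it uses the weighting commonly documented for BERTopic, tf(t, c) · log(1 + A / f(t)), where A is the average number of drug tokens per cluster and f(t) is the frequency of drug t across all clusters. The function names `c_tf_idf` and `top_drugs`, the placeholder drug names, and the toy clusters are all hypothetical; they are not the authors' code or data.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """Score drugs per cluster with a class-based TF-IDF weighting.

    clusters: dict mapping a cluster label to a list of prescriptions,
    where each prescription is a list of drug names.
    """
    # Treat all prescriptions in a cluster as one document and count drugs.
    tf = {
        label: Counter(drug for rx in prescriptions for drug in rx)
        for label, prescriptions in clusters.items()
    }
    # f(t): frequency of each drug across all clusters.
    total = Counter()
    for counts in tf.values():
        total.update(counts)
    # A: average number of drug tokens per cluster.
    avg_tokens = sum(total.values()) / len(tf)

    scores = {}
    for label, counts in tf.items():
        n_tokens = sum(counts.values())
        scores[label] = {
            # Normalised term frequency times the class-based IDF term.
            drug: (count / n_tokens) * math.log(1 + avg_tokens / total[drug])
            for drug, count in counts.items()
        }
    return scores


def top_drugs(scores, k=2):
    """Return the k highest-scoring drugs per cluster (ties broken by name)."""
    return {
        label: [
            drug
            for drug, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        ]
        for label, s in scores.items()
    }


# Hypothetical toy clusters with placeholder drug names (not the paper's data).
toy_clusters = {
    0: [["herb_a", "herb_b", "herb_x"], ["herb_a", "herb_b"]],
    1: [["herb_c", "herb_d", "herb_x"], ["herb_c", "herb_d", "herb_x"]],
}

toy_scores = c_tf_idf(toy_clusters)
toy_top = top_drugs(toy_scores, k=2)
print(toy_top)
```

In this toy input, `herb_x` appears in both clusters, so its inverse-frequency term is smaller and it ranks below the drugs that are specific to one cluster. This is the property that lets the weighting surface cluster-specific drug combinations rather than drugs that are common everywhere.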