Ransom J Wyse, David C Samuels, Sandra Sanchez-Roige, Lori Schirle, Bethany A Rhoten, Seo Yoon Lee, Alvin D Jeffery
Current Addiction Reports, vol. 13(1), p. 34. Published 2026-01-01 (Epub 2026-04-11). DOI: 10.1007/s40429-026-00733-3 (https://doi.org/10.1007/s40429-026-00733-3). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13070045/pdf/
Natural Language Processing for Substance Use Disorder Information Extraction: A Systematic Literature Review.
Purpose of review: To examine the use of natural language processing (NLP) for substance use disorder (SUD) information extraction.
Recent findings: 623 studies were reviewed, of which 35 met inclusion criteria. One paper (2.9%) was alcohol-related, 12 (34.3%) were opioid-related, 6 (17.1%) were tobacco-related, and 16 (45.7%) included multiple SUDs. Of the three types of NLP categorized for this analysis, 65.7% followed a Rule-Based approach, 37.1% followed a Machine-Learning approach, and 11.4% followed a Deep-Learning approach. NLP methods were categorized into three groups, with 43% as "Most common use" (e.g., concept extraction), 20-35% as "Regular use" (e.g., regular expressions), and <10% as "Rare use" (e.g., sentiment analysis). Various software applications were used in each included paper, with Python leading (10 papers), followed by cTAKES (9 papers), NegEx (6 papers), R (4 papers), and others. Multiple evaluation metrics were used in each included paper; papers on multiple SUDs (6 papers) compared F1 scores and ROC AUC, followed by papers on tobacco (4 papers), opioids (3 papers), and alcohol (1 paper), each with acceptable-to-outstanding ROC AUC scores (≥ 0.7) and good-to-excellent F1 scores (≥ 0.7).
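As an illustration of what a Rule-Based approach with regular expressions and negation handling can look like, the following is a minimal sketch in Python. It is not taken from any of the reviewed papers and is not the NegEx algorithm; the term list, the negation cues, the 40-character window, and the function name `extract_mentions` are all assumptions chosen for the example.

```python
import re

# Hypothetical term list; a real study would use a curated lexicon.
OPIOID_TERMS = re.compile(
    r"\b(opioid use disorder|heroin|oxycodone|fentanyl)\b", re.IGNORECASE
)
# Hypothetical negation cues, looked for in a short window before each match.
NEGATION_CUES = re.compile(
    r"\b(no|denies|denied|without|negative for)\b", re.IGNORECASE
)


def extract_mentions(note, window=40):
    """Return (term, negated) pairs for each term match in a clinical note."""
    results = []
    for match in OPIOID_TERMS.finditer(note):
        start = max(0, match.start() - window)
        preceding = note[start:match.start()]
        # Keep only the current sentence so cues from earlier sentences
        # do not leak into this mention.
        preceding = re.split(r"[.;\n]", preceding)[-1]
        negated = NEGATION_CUES.search(preceding) is not None
        results.append((match.group(0).lower(), negated))
    return results


note = "Patient denies heroin use. Reports taking oxycodone daily for two years."
print(extract_mentions(note))  # [('heroin', True), ('oxycodone', False)]
```

A fixed character window is a crude stand-in for the scope rules that dedicated negation tools apply, so this sketch would miss cues that follow the term or sit further away.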
Summary: Most papers included in this systematic review encompassed multiple SUDs following Rule-Based approaches, "Most common use" NLP methods (e.g., concept extraction), and familiar software applications (e.g., Python). Evaluation metrics for SUD papers utilizing NLP included common performance metrics, with ROC AUC and F1 scores achieving acceptable-to-outstanding discrimination between classes and good-to-excellent balance between precision and recall, respectively. The future direction of NLP for SUD information extraction could make use of Machine- or Deep-Learning approaches, advanced methods including regular expressions or sentiment analysis, and/or advanced software packages designed specifically for NLP endeavors, to better inform public health research and clinical decision making.
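To make the two metrics named above concrete, here is a small Python sketch that computes F1 (the harmonic mean of precision and recall) and ROC AUC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one). The labels, scores, and the 0.5 decision threshold are invented for illustration and do not come from the reviewed papers.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented example: 1 = note documents the SUD, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.889
```

In this toy case the classifier would fall just short of the 0.7 F1 threshold quoted in the abstract while clearing the 0.7 ROC AUC threshold, which shows that the two metrics can disagree because F1 depends on the chosen decision threshold and ROC AUC does not.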
About the journal:
This journal focuses on the prevention, assessment and diagnosis, and treatment of addiction. Designed for physicians and other mental health professionals who need to keep up-to-date with the latest research, Current Addiction Reports offers expert reviews on the most recent and important research in addiction. We accomplish this by appointing leaders in the field to serve as Section Editors in key subject areas and disciplines, such as alcohol; tobacco; stimulants, cannabis, and club drugs; behavioral addictions; gender disparities in addiction; comorbid psychiatric disorders and addiction; and substance abuse disorders and HIV. Section Editors, in turn, select the most pressing topics as well as experts to evaluate the latest research, report on any controversial discoveries or hypotheses of interest, and ultimately bring readers up-to-date on the topic. Articles represent interdisciplinary endeavors with research from fields such as psychiatry, psychology, pharmacology, epidemiology, and neuroscience. Additionally, an international Editorial Board, representing a range of disciplines within addiction medicine, ensures that the journal content includes current, emerging research and suggests articles of special interest to their country or region.