Authors: Azadeh Aletaha, Leila Nemati-Anaraki, AbbasAli Keshtkar, Shahram Sedghi, Abdalsamad Keramatfar, Anna Korolyova
Journal: Medical Journal of the Islamic Republic of Iran
Published: 2023-09-04
DOI: 10.47176/mjiri.37.95
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10657257/pdf/
A Scoping Review of Adopted Information Extraction Methods for RCTs.
Background: Randomized controlled trials (RCTs) provide the strongest evidence for therapeutic interventions and their effects on groups of subjects. However, the large volume of unstructured information in trial reports makes it challenging and time-consuming to identify important concepts and valid evidence and to make decisions based on them. This study aims to explore methods for automating or semi-automating information extraction from RCT reports.
Methods: We systematically searched PubMed, the ACM Digital Library, and Web of Science for relevant articles published between January 1, 2010, and 2022. We focused on published natural language processing (NLP), machine learning, and deep learning methods that automate or semi-automate the extraction of key information elements from RCT reports.
Results: A total of 26 publications were included, all addressing the automatic extraction of key RCT characteristics using PICO and related frameworks such as PIBOSO and PECODR. Of these, 14 (53.8%) extracted key characteristics based on PICO, PIBOSO, or PECODR, while 12 (46.2%) discussed information extraction methods for RCT reports. Commonly reported approaches included word/phrase matching; machine learning algorithms such as Naïve Bayes for binary classification and support vector machines (SVMs) for data classification; BERT networks for feature extraction; conditional random fields (CRFs); automation that does not depend on machine learning; and other machine learning or deep learning approaches.
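Binary classification with Naïve Bayes, one of the approaches listed above, can be illustrated with a minimal pure-Python sketch. It assumes a hypothetical task of deciding whether a sentence describes a trial's intervention (the "I" in PICO); the training sentences, the labels, and the names `BinaryNaiveBayes` and `tokenize` are illustrative assumptions, not data or code from any reviewed study. It uses a multinomial model with add-one (Laplace) smoothing and simply ignores words never seen in training.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


class BinaryNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing for two labels (True/False)."""

    def fit(self, sentences: list[str], labels: list[bool]) -> "BinaryNaiveBayes":
        # Assumes both labels occur at least once in the training data.
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def log_score(self, sentence: str, label: bool) -> float:
        # log P(label) + sum over tokens of log P(word | label).
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = self.total_words[label] + len(self.vocab)
        for word in tokenize(sentence):
            if word in self.vocab:  # skip words unseen in training
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def predict(self, sentence: str) -> bool:
        # True means "this sentence describes the intervention".
        return self.log_score(sentence, True) > self.log_score(sentence, False)


# Hypothetical toy training set: True = intervention sentence.
TRAIN = [
    ("Patients received 50 mg of drug A daily for 12 weeks.", True),
    ("The intervention group was given aspirin twice daily.", True),
    ("Participants were randomized to receive placebo or metformin.", True),
    ("We enrolled 200 adults with type 2 diabetes.", False),
    ("The primary outcome was mortality at one year.", False),
    ("Baseline characteristics were similar between groups.", False),
]

clf = BinaryNaiveBayes().fit([s for s, _ in TRAIN], [y for _, y in TRAIN])
print(clf.predict("Subjects received metformin daily."))  # True
print(clf.predict("The primary outcome was mortality."))  # False
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied. Real systems of this kind are trained on large annotated corpora and usually add richer features (sentence position, section headings, n-grams) than this sketch does.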
Conclusion: The lack of publicly available software and the limited access to existing tools make it difficult to determine which information extraction system is most effective. However, deep learning models such as Transformers, including BERT language models, have shown better performance in natural language processing.