Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents

IF 3.8; Zone 3 (Medicine); JCR Q2, CHEMISTRY, MEDICINAL

Chemical Research in Toxicology. Pub Date: 2023-07-24. DOI: 10.1021/acs.chemrestox.3c00028

Magnus Gray, Joshua Xu, Weida Tong, and Leihong Wu*

Chemical Research in Toxicology, 2023, 36 (8), 1290–1299. https://pubs.acs.org/doi/10.1021/acs.chemrestox.3c00028

Citations: 1

Abstract

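The US Food and Drug Administration (FDA) regulatory process often involves several reviewers, each focusing on the sets of information related to their own area of review. Accordingly, manufacturers that provide submission packages to regulatory agencies are instructed to organize the contents using a structure that allows the information to be easily allocated, retrieved, and reviewed. However, this practice is not always followed correctly. As a result, some documents are not well structured, with similar information spread across different sections, which hinders efficient access to and review of all of the relevant data as a whole. To improve this common situation, we evaluated an artificial intelligence (AI)-based natural language processing (NLP) methodology, Bidirectional Encoder Representations from Transformers (BERT), for automatically classifying free-text information into standardized sections, supporting a holistic review of drug safety and efficacy. Specifically, FDA labeling documents were used in this study as a proof of concept, and the labeling section structure defined by the Physician Labeling Rule (PLR) was used to classify labels in the development of the model. The model was subsequently evaluated on texts from both well-structured labeling documents (i.e., PLR-based labeling) and less- or differently structured documents (i.e., non-PLR and Summary of Product Characteristics [SmPC] labeling). In the training process, the model yielded 96% and 88% accuracy for the binary and multiclass tasks, respectively. The testing accuracies observed for the PLR, non-PLR, and SmPC testing data sets were 95%, 88%, and 88% for the binary model and 82%, 73%, and 68% for the multiclass model, respectively. Our study demonstrated that automatically classifying free texts into standardized sections with AI language models could be an advanced regulatory science approach for supporting the review process by effectively processing unformatted documents.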
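The evaluation design described in the abstract (one classifier, scored separately on PLR, non-PLR, and SmPC texts) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the section names, the example sentences, and the keyword rule, which is only a toy stand-in for the fine-tuned BERT model and is not the authors' method. The sketch shows how per-source accuracy is computed, not how the reported figures were obtained.

```python
from collections import defaultdict


def keyword_classifier(text: str) -> str:
    """Toy stand-in for a fine-tuned BERT model (hypothetical keyword rule)."""
    lowered = text.lower()
    # Check "contraindicated" first so it is never mistaken for "indicated for".
    if "contraindicated" in lowered:
        return "CONTRAINDICATIONS"
    if "indicated for" in lowered:
        return "INDICATIONS AND USAGE"
    return "ADVERSE REACTIONS"


def accuracy_by_source(examples, classify):
    """Return {source: fraction of texts whose predicted section equals the label}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for source, text, label in examples:
        total[source] += 1
        if classify(text) == label:
            correct[source] += 1
    return {source: correct[source] / total[source] for source in total}


# Invented examples for illustration only; this is not data from the study.
EXAMPLES = [
    ("PLR", "Drug X is indicated for the treatment of hypertension.",
     "INDICATIONS AND USAGE"),
    ("PLR", "The most common adverse reactions were headache and nausea.",
     "ADVERSE REACTIONS"),
    ("non-PLR", "Drug X is contraindicated in patients with hepatic failure.",
     "CONTRAINDICATIONS"),
    ("non-PLR", "Do not use in patients with known hypersensitivity.",
     "CONTRAINDICATIONS"),
    ("SmPC", "Headache was reported in 5% of patients.",
     "ADVERSE REACTIONS"),
]

if __name__ == "__main__":
    for source, acc in accuracy_by_source(EXAMPLES, keyword_classifier).items():
        print(f"{source}: {acc:.0%}")
```

Reporting accuracy per source rather than pooled makes visible the pattern the abstract describes, in which a model developed on PLR-structured text tends to score lower on less- or differently structured documents.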

Source journal

Chemical Research in Toxicology (Medicine: Toxicology)

CiteScore: 7.90

Self-citation rate: 7.30%

Articles published: 215

Review time: 3.5 months

About the journal: Chemical Research in Toxicology publishes Articles, Rapid Reports, Chemical Profiles, Reviews, Perspectives, Letters to the Editor, and ToxWatch on a wide range of topics in toxicology that inform a chemical and molecular understanding of, and a capacity to predict, biological outcomes on the basis of structures and processes. The overarching goal of the activities reported in the journal is to provide the knowledge and innovative approaches needed to promote intelligent solutions for human safety and ecosystem preservation. The journal emphasizes insight into mechanisms of toxicity over phenomenological observations. It upholds rigorous chemical, physical, and mathematical standards for the characterization and application of modern techniques.