Maximum entropy modeling for mining patient medication status from free text.

Proceedings. AMIA Symposium. Pub Date: 2002-01-01

Serguei V Pakhomov, Alexander Ruggieri, Christopher G Chute

Pages: 587-91. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2244576/pdf/procamiasymp00001-0628.pdf



Maximum entropy modeling for mining patient medication status from free text.

Using a classification scheme for patient medication status, we sought to recognize and categorize medications mentioned in the unrestricted text of clinical documents generated in clinical practice. The categories refer to the patient's status with respect to the medication, such as discontinuation, start or initiation, and continuation of a given medication. This categorization is performed with a machine learning technique, Maximum Entropy (ME), that is well suited to incorporating the heterogeneous sources of information necessary for classifying a patient's medication status. We use hand-labeled training data to generate ME models and test five different training feature sets. Our results show that the best-performing feature set combines the following: the two words preceding and the two words following the mention of the drug, the subject of the sentence in which the drug mention occurs, the two words following the subject, and a binary feature vector of lexicalized semantic cues indicative of medication status or its change. The average predictive power of a model trained on these features is approximately 89%.
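The feature set described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue lexicon `CUES`, the label names, the toy sentences, and the assumption that the subject position is supplied by an external parser are all hypothetical. The classifier is a plain maximum entropy (multinomial logistic) model trained by stochastic gradient ascent, chosen here only because it needs no external libraries.

```python
import math
from collections import defaultdict

# Hypothetical cue lexicon; the abstract does not list the actual cues.
CUES = ("start", "begin", "continue", "stop", "discontinue")
# Hypothetical label names for the three statuses named in the abstract.
LABELS = ["start", "continue", "discontinue"]

def extract_features(tokens, drug_idx, subj_idx):
    """Return binary (string-valued) features for one drug mention.

    tokens: lowercased sentence tokens; drug_idx: index of the drug mention;
    subj_idx: index of the sentence subject (assumed to come from a parser).
    """
    def tok(i):
        return tokens[i] if 0 <= i < len(tokens) else "<pad>"

    feats = [
        f"w-2={tok(drug_idx - 2)}", f"w-1={tok(drug_idx - 1)}",  # two words before
        f"w+1={tok(drug_idx + 1)}", f"w+2={tok(drug_idx + 2)}",  # two words after
        f"subj={tok(subj_idx)}",                                  # sentence subject
        f"s+1={tok(subj_idx + 1)}", f"s+2={tok(subj_idx + 2)}",  # two words after subject
    ]
    # Binary cue vector: one active feature per cue word found in the sentence.
    feats += [f"cue={c}" for c in CUES if c in tokens]
    return feats

def predict_proba(weights, labels, feats):
    """p(y | x) is proportional to exp(sum of weights of active (feature, y) pairs)."""
    scores = [sum(weights[(f, y)] for f in feats) for y in labels]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_maxent(data, labels, epochs=100, lr=0.1):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in data:
            probs = predict_proba(weights, labels, feats)
            for y, p in zip(labels, probs):
                grad = (1.0 if y == gold else 0.0) - p  # observed minus expected
                for f in feats:
                    weights[(f, y)] += lr * grad
    return weights

def classify(weights, labels, feats):
    probs = predict_proba(weights, labels, feats)
    return labels[probs.index(max(probs))]

# Toy hand-labeled examples (invented for illustration only).
raw = [
    ("patient will start lisinopril 10 mg daily", 3, 0, "start"),
    ("she should stop taking warfarin before surgery", 4, 0, "discontinue"),
    ("he will continue metformin as prescribed", 3, 0, "continue"),
]
train = [(extract_features(s.split(), d, j), y) for s, d, j, y in raw]
weights = train_maxent(train, LABELS)

for feats, gold in train:
    print(gold, "->", classify(weights, LABELS, feats))
```

Encoding every feature as a string paired with a label keeps the heterogeneous sources (context words, subject, cue flags) in one weight table, which is the property the abstract cites as the reason for choosing Maximum Entropy.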
