Serguei V Pakhomov, Alexander Ruggieri, Christopher G Chute
Proceedings. AMIA Symposium, 2002. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2244576/pdf/procamiasymp00001-0628.pdf
Maximum entropy modeling for mining patient medication status from free text.
Using a classification scheme of patient medication status, we sought to recognize and categorize medications mentioned in the unrestricted text of clinical documents generated in clinical practice. The categories describe the patient's status with respect to a given medication, such as discontinuation, start or initiation, and continuation. This categorization is performed with a machine learning technique, Maximum Entropy (ME), that is well suited to incorporating the heterogeneous sources of information necessary for classifying a patient's medication status. We use hand-labeled training data to generate ME models and test five different training feature sets. Our results show that the best-performing feature set combines the following: the two words preceding and following the mention of the drug, the subject of the sentence in which the drug mention occurs, the two words following the subject, and a binary feature vector of lexicalized semantic cues indicative of medication status or its change. The average predictive power of a model trained on these features is approximately 89%.
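To make the feature description concrete, the following is a minimal sketch of a Maximum Entropy classifier over two of the feature types named in the abstract: the context words around the drug mention and binary semantic cues. It is not the authors' implementation. The cue lexicon `CUES`, the toy sentences in `TRAIN`, the label names, the reading of the context window as two words on each side, and the choice of plain gradient ascent for training are all assumptions made for illustration, and the sentence-subject features are omitted because they would require a parser.

```python
import math
from collections import defaultdict

# Hypothetical cue lexicon; the abstract does not list the actual cues.
CUES = {"start", "started", "begin", "continue", "continued",
        "stop", "stopped", "discontinue", "discontinued"}

def extract_features(tokens, drug_index, window=2):
    """Return string-valued indicator features for one drug mention."""
    feats = []
    for offset in range(1, window + 1):
        left, right = drug_index - offset, drug_index + offset
        # Pad with boundary symbols when the window leaves the sentence.
        feats.append(f"w-{offset}=" + (tokens[left].lower() if left >= 0 else "<s>"))
        feats.append(f"w+{offset}=" + (tokens[right].lower() if right < len(tokens) else "</s>"))
    # Binary cue features: one indicator per cue word present in the sentence.
    feats.extend("cue=" + t.lower() for t in tokens if t.lower() in CUES)
    return feats

def predict_proba(weights, classes, feats):
    """p(c | x) = exp(sum of w[f, c]) / Z, the Maximum Entropy model form."""
    scores = {c: sum(weights.get((f, c), 0.0) for f in feats) for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train_maxent(examples, labels, epochs=200, lr=0.5):
    """Fit weights by batch gradient ascent on the conditional log-likelihood."""
    classes = sorted(set(labels))
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in zip(examples, labels):
            probs = predict_proba(weights, classes, feats)
            for c in classes:
                # Observed feature count minus expected count under the model.
                delta = (1.0 if c == gold else 0.0) - probs[c]
                for f in feats:
                    grad[(f, c)] += delta
        for key, g in grad.items():
            weights[key] += lr * g / len(examples)
    return weights, classes

def classify(weights, classes, feats):
    """Return the most probable medication status label."""
    probs = predict_proba(weights, classes, feats)
    return max(probs, key=probs.get)

# Invented toy data: (tokens, index of the drug mention, status label).
TRAIN = [
    ("Patient was started on lisinopril 10 mg daily".split(), 4, "start"),
    ("We will begin metformin today".split(), 3, "start"),
    ("She should continue aspirin as before".split(), 3, "continue"),
    ("He continued warfarin without problems".split(), 2, "continue"),
    ("Patient stopped atenolol last week".split(), 2, "discontinue"),
    ("We discontinued simvastatin due to myalgia".split(), 2, "discontinue"),
]

if __name__ == "__main__":
    X = [extract_features(toks, i) for toks, i, _ in TRAIN]
    y = [label for _, _, label in TRAIN]
    weights, classes = train_maxent(X, y)
    query = "Patient stopped lisinopril yesterday".split()
    print(classify(weights, classes, extract_features(query, 2)))
```

Because every feature is a simple indicator string, further information sources (for example, features derived from the sentence subject) can be added by appending more strings in `extract_features`, which is the property that makes ME convenient for heterogeneous features.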