Inferring appropriate eligibility criteria in clinical trial protocols without labeled data
Angelo C. Restificar, S. Ananiadou
Data and Text Mining in Bioinformatics, 2012-10-29
DOI: 10.1145/2390068.2390074 (https://doi.org/10.1145/2390068.2390074)
We consider the user task of designing clinical trial protocols and propose a method that outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol, which itself contains a set of eligibility criteria. Given a small set of sample documents D', |D'| << |D|, that a user has initially identified as relevant (e.g., via a user query interface), our scoring method automatically suggests eligibility criteria from D by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. We view a document as a mixture of latent topics, and our method exploits this view in a three-step procedure. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA) [3]. Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value: the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value, or score. Results from our experiments indicate that our proposed method is 8 and 9 times better (for inclusion and exclusion criteria, respectively) than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments, we were able to automatically construct eligibility criteria that are on average 75% and 70% similar (for inclusion and exclusion criteria, respectively) to the correct eligibility criteria.
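The scoring step can be illustrated with a minimal sketch. It assumes one reading of "probability-weighted sum of the topic proportions": each candidate criterion's score is the sum over topics of (probability the criterion belongs to the topic) times (the topic's proportion in the sample documents). The abstract does not give the exact formula, so this is an interpretation rather than the authors' implementation. The function names, the three-topic setup, and all numeric values are hypothetical; in the described method the proportions would come from LDA and the probabilities from the logistic regression models.

```python
from typing import Dict, List, Tuple


def expected_score(topic_probs: List[float], topic_proportions: List[float]) -> float:
    """Probability-weighted sum of topic proportions for one candidate criterion.

    topic_probs[k]       -- assumed P(criterion belongs to topic k), e.g. from a
                            per-topic logistic regression model (hypothetical input)
    topic_proportions[k] -- assumed proportion of topic k in the sample documents,
                            e.g. from LDA (hypothetical input)
    """
    if len(topic_probs) != len(topic_proportions):
        raise ValueError("one probability per topic is required")
    return sum(p * theta for p, theta in zip(topic_probs, topic_proportions))


def rank_criteria(
    candidates: Dict[str, List[float]], topic_proportions: List[float]
) -> List[Tuple[str, float]]:
    """Return (criterion, score) pairs sorted from highest to lowest score."""
    scored = [
        (text, expected_score(probs, topic_proportions))
        for text, probs in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic proportions for the sample documents: topic 0 dominates.
theta = [0.7, 0.2, 0.1]

# Hypothetical per-topic membership probabilities for three candidate criteria.
candidates = {
    "criterion A": [0.9, 0.05, 0.05],
    "criterion B": [0.1, 0.8, 0.1],
    "criterion C": [0.1, 0.1, 0.8],
}

ranking = rank_criteria(candidates, theta)
for text, score in ranking:
    print(f"{text}: {score:.3f}")
```

With these made-up numbers, the criterion most strongly associated with the dominant topic ranks first, which matches the intuition stated in the abstract. Note that probabilities produced by separate per-topic classifiers need not sum to one across topics; the sketch does not normalize them, and whether the original method does is not stated.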