UIMA based solution in pharma text
Aditya Rao, Thomas Joseph, V. Saipradeep, Rajgopal Srinivasan
2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2015-11-09
DOI: 10.1109/BIBM.2015.7359958 (https://doi.org/10.1109/BIBM.2015.7359958)
Background: Text processing of unstructured biomedical text has become crucial to pharma companies, with regard to both legacy and topical documentation. The Apache Unstructured Information Management Architecture (UIMA) framework addresses general information extraction requirements. In this poster we present two use cases of UIMA for specific unstructured biomedical information extraction tasks in pharma companies. The first use case requires extracting the values of specific fields from legacy clinical study documents. These fields are diverse; examples include study duration, study population, study arm, completion date and co-morbidity. The second use case deals with accurate propagation of drug label information to digital channels such as drug-specific websites. Because of the increased importance of such websites and mobile applications, pharma companies are looking at text-processing solutions to keep the information in these channels accurate and up to date.
Implementation: Both use cases were implemented using the UIMA framework. The implementation comprises core UIMA modules and custom in-house modules built specifically for each use case. Key custom modules include document clustering, section identification, named entity recognition and relation identification. For the first use case, a total of 70 fields were extracted from clinical study reports. These included study phase, study type, study duration, study start date and drug dosage. For the second use case, content was first extracted from drug websites, and fields such as target dosage, dosage regimen and study duration were then extracted from that content. The field values were evaluated for accuracy against the label information.
Conclusion: Both implementations were successful, with a high degree of precision and recall. The second use case has moved from proof-of-concept to pilot phase. While there is a requirement for comprehensive knowledge management solutions dealing with exploration and management of biomedical text within the big data umbrella in pharma, we have seen that there are also small, specific problems within the industry that can benefit from bespoke text-processing solutions built around frameworks such as UIMA.
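The staged design described under Implementation (section identification followed by field extraction, with extracted values then checked against a reference) can be illustrated with a minimal sketch. This is not the authors' code and does not use the UIMA API: it is plain Python standing in for a chain of annotators that share one document object, loosely analogous to a UIMA CAS. The section headings, field names, regular expressions, sample text and gold values are all hypothetical, chosen only to make the example runnable.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for a shared analysis object: text plus annotations from each stage."""
    text: str
    sections: dict = field(default_factory=dict)  # section name -> section text
    fields: dict = field(default_factory=dict)    # field name -> extracted value


def identify_sections(doc):
    """Section identification: split on hypothetical 'HEADING:' lines."""
    current = None
    for line in doc.text.splitlines():
        heading = re.match(r"^([A-Z][A-Z ]+):\s*$", line)
        if heading:
            current = heading.group(1).strip().lower()
            doc.sections[current] = ""
        elif current is not None:
            doc.sections[current] += line + "\n"
    return doc


# Hypothetical rules: field name -> (section to search, regex with one capture group).
FIELD_PATTERNS = {
    "study_phase": ("design", r"\bPhase\s+(I{1,3}|IV)\b"),
    "study_duration": ("design", r"(\d+)\s*weeks?\b"),
    "drug_dosage": ("treatment", r"(\d+(?:\.\d+)?\s*mg)\b"),
}


def extract_fields(doc):
    """Field extraction: look for each field only inside its expected section."""
    for name, (section, pattern) in FIELD_PATTERNS.items():
        match = re.search(pattern, doc.sections.get(section, ""))
        if match:
            doc.fields[name] = match.group(1)
    return doc


def run_pipeline(text, stages=(identify_sections, extract_fields)):
    """Run the stages in order, each one enriching the same document object."""
    doc = Document(text)
    for stage in stages:
        doc = stage(doc)
    return doc


def precision_recall(extracted, gold):
    """Compare extracted field values with reference values (exact match)."""
    correct = sum(1 for name, value in extracted.items() if gold.get(name) == value)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented sample text and reference values, for illustration only.
SAMPLE = """DESIGN:
A Phase II randomized study lasting 12 weeks.
TREATMENT:
Patients received 50 mg once daily.
"""
GOLD = {
    "study_phase": "II",
    "study_duration": "12",
    "drug_dosage": "50 mg",
    "study_type": "randomized",  # no rule covers this field, so recall drops
}

doc = run_pipeline(SAMPLE)
precision, recall = precision_recall(doc.fields, GOLD)
print(doc.fields, precision, recall)
```

Restricting each pattern to its expected section is one plausible reason to run section identification before extraction: it reduces false matches from unrelated parts of a long report. A real system would replace the regular expressions with the named entity recognition and relation identification modules the abstract mentions.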