Wutthipong Kongburan, P. Padungweang, Worarat Krathu, Jonathan H. Chan
2016 Eighth International Conference on Advanced Computational Intelligence (ICACI), published 2016-04-11. DOI: 10.1109/ICACI.2016.7449819 (https://doi.org/10.1109/ICACI.2016.7449819)
Semi-automatic construction of thyroid cancer intervention corpus from biomedical abstracts
Thyroid cancer is a common endocrine tumor whose incidence is rising steadily worldwide. New findings on the disease and its treatment are disseminated mostly through biomedical publications, such as those indexed in PubMed. Unfortunately, this information is locked in unstructured text, and more than two thousand articles are added each year. Text mining plays an important role in information extraction because it can uncover hidden value in large volumes of text in a reasonable time. A common preliminary task in text mining is Named Entity Recognition (NER), which requires a gold standard corpus, since NER performance depends on a trustworthy corpus. However, constructing a gold standard corpus is laborious and time-consuming. To obtain a reasonably practical corpus in limited time, this paper proposes a semi-automatic approach to constructing a thyroid cancer intervention corpus. The experimental results indicate that the proposed method can produce a thyroid cancer intervention corpus that is reasonable in terms of both performance and overfitting avoidance.
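The abstract does not describe the construction procedure itself. As an illustration only, one common way to make NER corpus building semi-automatic is to pre-annotate text with a term dictionary and let a human annotator correct the resulting tags. The sketch below assumes that setup; the seed terms, the `INTERVENTION` label, and the function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical seed dictionary of intervention terms (illustrative only,
# not taken from the paper).
SEED_TERMS = ["total thyroidectomy", "radioactive iodine", "levothyroxine"]


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def pre_annotate(text, terms=SEED_TERMS, label="INTERVENTION"):
    """Assign BIO tags by case-insensitive dictionary lookup.

    The output is a draft annotation meant for manual review,
    not a finished gold standard.
    """
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    # Tokenize each term, drop empty ones, and try longer terms first
    # so multi-word terms win over shorter overlapping ones.
    candidates = (tokenize(t.lower()) for t in terms)
    term_tokens = sorted((c for c in candidates if c), key=len, reverse=True)

    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for term in term_tokens:
            n = len(term)
            if lowered[i:i + n] == term:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no term starts here; move to the next token
    return list(zip(tokens, tags))


if __name__ == "__main__":
    sentence = "Patients received radioactive iodine after total thyroidectomy."
    for token, tag in pre_annotate(sentence):
        print(f"{token}\t{tag}")
```

In a workflow like this, the dictionary pass supplies cheap draft labels, and annotator time goes to fixing missed or spurious spans rather than labeling every token from scratch.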