Semi-automatic construction of thyroid cancer intervention corpus from biomedical abstracts

Wutthipong Kongburan, P. Padungweang, Worarat Krathu, Jonathan H. Chan
Published in: 2016 Eighth International Conference on Advanced Computational Intelligence (ICACI)
Publication date: 2016-04-11
DOI: 10.1109/ICACI.2016.7449819 (https://doi.org/10.1109/ICACI.2016.7449819)
Citations: 6

Abstract

Thyroid cancer is a common endocrine tumor whose incidence is steadily increasing worldwide. The latest discoveries about the disease and its treatment are mostly disseminated through biomedical publications such as those indexed in PubMed. Unfortunately, this information is distributed as unstructured text, with over two thousand articles added annually. Text mining technology plays an important role in information extraction, since it can uncover hidden value in a vast amount of text in a reasonable time. In general, a preliminary task of text mining is Named Entity Recognition (NER). NER requires a gold-standard corpus, since its capability depends on a trustworthy corpus. However, constructing a gold-standard corpus is a laborious and time-consuming process. To obtain a reasonably practical corpus in a limited time, this paper proposes a semi-automatic approach to constructing a thyroid cancer intervention corpus. The experimental results demonstrate that the proposed method can construct a thyroid cancer intervention corpus that is reasonable in terms of both performance and avoidance of overfitting.