Learning to extract chemical names based on random text generation and incomplete dictionary

Su Yan, W. Spangler, Ying Chen
DOI: 10.1145/2350176.2350180 (https://doi.org/10.1145/2350176.2350180)
Venue: Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference), volume 126 1, pages 21-25
Published: 2012-08-12
Publication type: Journal Article
Citations: 6

Abstract

Automatically extracting chemical names from text has significant value for biomedical and life science research. A major barrier in this task is the difficulty of obtaining a sizable, good-quality training set to train a reliable entity extraction model. Leveraging well-studied random text generation techniques based on formal grammars, we explore the idea of automatically creating training sets for the task of chemical named entity extraction. Assuming the availability of an incomplete list of chemical names, we are able to generate well-controlled, random, yet realistic chemical-like training documents. Compared to state-of-the-art models learned from manually labeled data and rule-based systems using real-world data, our solutions show comparable or better results with the least human effort.
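The abstract does not give the grammars or the generation procedure, so the following is only an illustration of the general idea, not the authors' method. It is a minimal Python sketch assuming a hand-written toy context-free grammar (`GRAMMAR`), a small hypothetical dictionary (`CHEM_DICT`), and BIO tags as the labeling scheme; none of these names or choices come from the paper. Randomly expanding the grammar and filling each `<CHEM>` slot from the dictionary yields sentences whose chemical-name tokens are labeled as a side effect of generation, so no manual annotation is needed.

```python
import random

# Hypothetical toy grammar (not from the paper): each nonterminal maps to a list
# of alternative productions. "<CHEM>" is a slot filled from the dictionary.
GRAMMAR = {
    "<S>": [["<NP>", "<VP>", "."]],
    "<NP>": [
        ["<CHEM>"],
        ["a", "solution", "of", "<CHEM>"],
        ["<CHEM>", "and", "<NP>"],  # recursive rule, bounded by max_depth
    ],
    "<VP>": [
        ["inhibits", "<NP>"],
        ["was", "dissolved", "in", "<NP>"],
        ["reacts", "with", "<NP>"],
    ],
}

# Hypothetical, deliberately incomplete dictionary of known chemical names.
CHEM_DICT = ["sodium chloride", "benzene", "acetylsalicylic acid", "2-methylpropan-1-ol"]


def expand(symbol, rng, depth=0, max_depth=4):
    """Randomly expand `symbol` into a list of (token, BIO label) pairs."""
    if symbol == "<CHEM>":
        # Sample a known name; its first token is B-CHEM, the rest I-CHEM.
        tokens = rng.choice(CHEM_DICT).split()
        return [(tok, "B-CHEM" if i == 0 else "I-CHEM") for i, tok in enumerate(tokens)]
    if symbol not in GRAMMAR:
        return [(symbol, "O")]  # ordinary terminal word
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the shortest production so expansion terminates.
        options = [min(options, key=len)]
    pairs = []
    for sym in rng.choice(options):
        pairs.extend(expand(sym, rng, depth + 1, max_depth))
    return pairs


def generate_corpus(n_sentences, seed=0):
    """Generate n automatically labeled sentences; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    return [expand("<S>", rng) for _ in range(n_sentences)]


if __name__ == "__main__":
    for sentence in generate_corpus(3, seed=42):
        print(" ".join(f"{tok}/{label}" for tok, label in sentence))
```

Each call to `generate_corpus(n, seed)` returns `n` token/label sequences of the kind a sequence tagger (for example a CRF) could be trained on. Because every name in this sketch is drawn from the dictionary, it does not by itself produce the novel chemical-like names the abstract describes; how the authors achieve that is not stated here.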