Learning to extract chemical names based on random text generation and incomplete dictionary

Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference); Pub Date: 2012-08-12; DOI: 10.1145/2350176.2350180

Su Yan, W. Spangler, Ying Chen

Citations: 6

Abstract

Automatically extracting chemical names from text has significant value to biomedical and life science research. A major barrier to this task is the difficulty of obtaining a sizable, good-quality training set with which to train a reliable entity extraction model. Leveraging well-studied random text generation techniques based on formal grammars, we explore the idea of automatically creating training sets for the task of chemical named entity extraction. Assuming the availability of an incomplete list of chemical names, we are able to generate well-controlled, random, yet realistic chemical-like training documents. Compared to state-of-the-art models learned from manually labeled data and rule-based systems using real-world data, our solutions show comparable or better results with the least human effort.
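To make the idea in the abstract concrete, here is a minimal sketch of grammar-based generation of labeled training text. It is not the authors' system: the toy grammar, the four-entry `CHEMICAL_NAMES` list, the `CHEM` placeholder symbol, and the BIO label scheme are all assumptions chosen for illustration. The sketch only shows how a formal grammar plus an incomplete dictionary can yield sentences whose chemical-name tokens are labeled automatically, with no manual annotation.

```python
import random

# Hypothetical, deliberately incomplete dictionary of chemical names.
CHEMICAL_NAMES = ["acetic acid", "sodium chloride", "benzene", "ethanol"]

# Toy context-free grammar (an assumption, not the grammar used in the paper).
# Each nonterminal maps to a list of alternative productions.
# "CHEM" is a special symbol that is filled from the dictionary.
GRAMMAR = {
    "S": [["NP", "VP", "."]],
    "NP": [["the", "N"], ["CHEM"], ["a", "N", "of", "CHEM"]],
    "VP": [["V", "NP"], ["V", "NP", "with", "CHEM"]],
    "N": [["solution"], ["mixture"], ["sample"]],
    "V": [["reacts with"], ["contains"], ["was added to"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of (token, label) pairs."""
    if symbol == "CHEM":
        # Draw a name from the dictionary and label its tokens B-CHEM / I-CHEM.
        tokens = rng.choice(CHEMICAL_NAMES).split()
        return [
            (tok, "B-CHEM" if i == 0 else "I-CHEM")
            for i, tok in enumerate(tokens)
        ]
    if symbol not in GRAMMAR:
        # Terminal: every token is outside any chemical name.
        return [(tok, "O") for tok in symbol.split()]
    pairs = []
    for child in rng.choice(GRAMMAR[symbol]):
        pairs.extend(expand(child, rng))
    return pairs


def generate_sentence(seed=None):
    """Generate one random labeled sentence; a fixed seed makes it repeatable."""
    return expand("S", random.Random(seed))


if __name__ == "__main__":
    for seed in range(3):
        for token, label in generate_sentence(seed):
            print(f"{token}\t{label}")
        print()
```

Token/label pairs in this form could be fed to a sequence labeler. Because the labels come from the generation step itself, the only human input in this sketch is the grammar and the name list.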
