Learning to extract chemical names based on random text generation and incomplete dictionary
Authors: Su Yan, W. Spangler, Ying Chen
Venue: Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference), volume 126 1, pages 21-25
Published: 2012-08-12
DOI: https://doi.org/10.1145/2350176.2350180
Automatically extracting chemical names from text has significant value to biomedical and life science research. A major barrier in this task is the difficulty of obtaining a sizable, good-quality training set with which to train a reliable entity extraction model. Leveraging well-studied random text generation techniques based on formal grammars, we explore the idea of automatically creating training sets for chemical named entity extraction. Assuming the availability of an incomplete list of chemical names, we are able to generate well-controlled, random, yet realistic chemical-like training documents. Evaluated on real-world data against state-of-the-art models learned from manually labeled data and against rule-based systems, our solutions show comparable or better results with minimal human effort.
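To make the idea concrete, here is a minimal sketch of grammar-based generation of labeled training text. It is not the authors' grammar or pipeline: the toy grammar `GRAMMAR`, the dictionary `CHEMICALS`, the `CHEM` slot symbol, and the BIO tag names are all assumptions chosen for illustration. A context-free grammar is expanded at random, each `CHEM` slot is filled from the (incomplete) dictionary, and because the generator knows where it placed each name, the token labels come for free.

```python
import random

# Hypothetical incomplete dictionary of chemical names (illustrative only).
CHEMICALS = ["acetic acid", "benzene", "sodium chloride", "2-methylpropane"]

# Toy context-free grammar (an assumption, not the paper's grammar).
# "CHEM" is a slot that is filled from the dictionary above.
GRAMMAR = {
    "S": [["NP", "VP", "."]],
    "NP": [["the", "N"], ["CHEM"], ["a", "N", "of", "CHEM"]],
    "VP": [["V", "NP"], ["V", "NP", "with", "CHEM"]],
    "N": [["solution"], ["mixture"], ["sample"]],
    "V": [["reacts with"], ["contains"], ["was treated with"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of (token, tag) pairs."""
    if symbol == "CHEM":
        # Fill the slot with a dictionary entry and label it with BIO tags.
        words = rng.choice(CHEMICALS).split()
        return [(w, "B-CHEM" if i == 0 else "I-CHEM") for i, w in enumerate(words)]
    if symbol not in GRAMMAR:
        # Terminal: every word outside a chemical name is tagged "O".
        return [(w, "O") for w in symbol.split()]
    pairs = []
    for child in rng.choice(GRAMMAR[symbol]):
        pairs.extend(expand(child, rng))
    return pairs


def generate_corpus(n_sentences, seed=0):
    """Generate n_sentences labeled sentences; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return [expand("S", rng) for _ in range(n_sentences)]


if __name__ == "__main__":
    for sentence in generate_corpus(3, seed=1):
        print(" ".join(f"{token}/{tag}" for token, tag in sentence))
```

Sentences produced this way could be fed to any sequence labeler that accepts token/tag pairs. A realistic setup would need a much richer grammar and dictionary so that the model learns the contexts and shapes of chemical names rather than memorizing the listed entries.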