Recognizing scientific artifacts in biomedical literature.

Biomedical informatics insights. Pub Date: 2013-04-02. Print Date: 2013-01-01. DOI: 10.4137/BII.S11572

Tudor Groza, Hamed Hassanzadeh, Jane Hunter

Volume 6, pages 15-27. Publication type: Journal Article.

Citations: 10

Abstract


Today's search engines and digital libraries offer little or no support for discovering the scientific artifacts (hypotheses, supporting or contradicting statements, or findings) that form the core of written scientific communication. Consequently, we currently have no means of identifying central themes within a domain, or of detecting gaps between accepted knowledge and newly emerging knowledge, as a way of tracking the evolution of hypotheses from incipient phases to maturity or decline. We present a hybrid machine learning approach that uses an ensemble of four classifiers to recognize scientific artifacts (i.e., hypotheses, background, motivation, objectives, and findings) within biomedical research publications, as a preliminary step toward the general goal of automatically creating argumentative discourse networks that span multiple publications. The performance achieved by the classifiers ranges from 15.30% to 78.39%, depending on the target class. The set of features used for classification has led to promising results. Furthermore, the features are used strictly in a local, publication scope, i.e., without aggregating corpus-wide statistics, which increases the versatility of the ensemble of classifiers and enables its direct applicability without the need for re-training.
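The idea of an ensemble of four classifiers working on publication-local features can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifiers, the features, or the way the ensemble combines predictions, so the cue-word sets, the four toy rule-based classifiers, and the majority vote below are hypothetical stand-ins, not the authors' method. The one property taken from the abstract is that features are computed from the single publication being processed, with no corpus-wide statistics.

```python
from collections import Counter
from typing import Callable, Dict, List

Features = Dict[str, float]
Classifier = Callable[[Features], str]

# Hypothetical cue words; the abstract does not list the real feature set.
HYPOTHESIS_CUES = {"hypothesize", "hypothesise", "hypothesis", "postulate"}
FINDING_CUES = {"found", "showed", "demonstrate", "demonstrated", "results"}
OBJECTIVE_CUES = {"aim", "aims", "objective", "goal"}


def local_features(sentences: List[str], index: int) -> Features:
    """Features for one sentence, using only the publication it belongs to."""
    words = {w.strip(".,;:()").lower() for w in sentences[index].split()}
    return {
        # Relative position of the sentence within this publication.
        "position": index / max(len(sentences) - 1, 1),
        "hypothesis_cue": float(bool(words & HYPOTHESIS_CUES)),
        "finding_cue": float(bool(words & FINDING_CUES)),
        "objective_cue": float(bool(words & OBJECTIVE_CUES)),
    }


# Four toy classifiers (assumed); each maps features to one label.
def cue_classifier(f: Features) -> str:
    if f["hypothesis_cue"]:
        return "hypothesis"
    if f["finding_cue"]:
        return "finding"
    if f["objective_cue"]:
        return "objective"
    return "background"


def position_classifier(f: Features) -> str:
    return "background" if f["position"] < 0.5 else "finding"


def hypothesis_detector(f: Features) -> str:
    return "hypothesis" if f["hypothesis_cue"] else "background"


def finding_detector(f: Features) -> str:
    return "finding" if f["finding_cue"] else "background"


CLASSIFIERS: List[Classifier] = [
    cue_classifier,
    position_classifier,
    hypothesis_detector,
    finding_detector,
]


def ensemble_predict(classifiers: List[Classifier], features: Features) -> str:
    """Combine predictions by majority vote (an assumed combination rule)."""
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


def classify_publication(sentences: List[str]) -> List[str]:
    """Label every sentence of one publication, with no corpus-wide state."""
    return [
        ensemble_predict(CLASSIFIERS, local_features(sentences, i))
        for i in range(len(sentences))
    ]
```

Because `local_features` reads only the sentences of the publication passed in, `classify_publication` can be applied to a new document without recomputing any global statistics, which is the kind of versatility the abstract attributes to a local feature scope.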

