Automatically classifying sentences in full-text biomedical articles into introduction, methods, results and discussion.

Summit on Translational Bioinformatics. Pub Date: 2009-03-01

Shashank Agarwal, Hong Yu


Abstract

Biomedical texts can typically be represented by four rhetorical categories: introduction, methods, results and discussion (IMRAD). Classifying sentences into these categories can benefit many other text-mining tasks. Although many studies have automatically classified sentences in MEDLINE abstracts into the IMRAD categories, few have explored the classification of sentences that appear in full-text biomedical articles. We explored different approaches to automatically classify a sentence in a full-text biomedical article into the IMRAD categories. Our best system is a support vector machine classifier that achieved 81.30% accuracy, which is significantly higher than the accuracy of the baseline systems.
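To make the task concrete, the following is a minimal sketch of sentence-level IMRAD classification with a linear support vector machine. It is not the authors' system: the abstract does not describe their features, kernel, or training data, so the bag-of-words features, the one-vs-rest setup, the bias-free hinge-loss training loop, the hyperparameters, and the toy sentences below are all assumptions made for illustration. The sketch does not reproduce the reported accuracy.

```python
import re
from collections import Counter

LABELS = ["introduction", "methods", "results", "discussion"]

# Invented toy sentences (assumption): not taken from the paper's corpus.
TRAIN = [
    ("Previous studies have rarely addressed this problem", "introduction"),
    ("Little is known about the underlying mechanism", "introduction"),
    ("Cells were incubated at 37 degrees for two hours", "methods"),
    ("Samples were centrifuged and stored overnight", "methods"),
    ("Expression increased significantly compared with controls", "results"),
    ("Figure 2 shows a threefold reduction in binding", "results"),
    ("Our findings suggest a possible regulatory role", "discussion"),
    ("These data may explain earlier conflicting reports", "discussion"),
]


def featurize(sentence):
    """Bag-of-words counts; tokens under 3 letters are dropped as a crude stop-word filter."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(t for t in tokens if len(t) >= 3)


def dot(weights, features):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in features.items())


def train_binary_svm(examples, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    w = {}
    for _ in range(epochs):
        for feats, y in examples:
            violated = y * dot(w, feats) < 1.0  # margin check with current weights
            for tok in w:                       # gradient step on the L2 penalty
                w[tok] *= 1.0 - lr * lam
            if violated:                        # gradient step on the hinge term
                for tok, cnt in feats.items():
                    w[tok] = w.get(tok, 0.0) + lr * y * cnt
    return w


def train_one_vs_rest(labeled_sentences):
    """Train one binary SVM per IMRAD label (that label vs. all others)."""
    data = [(featurize(s), label) for s, label in labeled_sentences]
    return {
        label: train_binary_svm([(f, 1 if l == label else -1) for f, l in data])
        for label in LABELS
    }


def classify(models, sentence):
    """Return the label whose binary SVM gives the highest score."""
    feats = featurize(sentence)
    return max(LABELS, key=lambda label: dot(models[label], feats))


models = train_one_vs_rest(TRAIN)
print(classify(models, "Samples were incubated overnight"))
```

A realistic system would train on many labeled sentences from full-text articles and would likely use richer features than raw word counts. The sketch only shows how a sentence is mapped to features and how the highest-scoring category is chosen.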

