Extraction of Lexical Bundles used in Natural Language Processing Articles

2019 International Conference on Advanced Computer Science and Information Systems (ICACSIS). Pub Date: 2019-10-01. DOI: 10.1109/ICACSIS47736.2019.8979950

Chooi-Ling Goh, Y. Lepage

Citations: 2

Abstract

Lexical bundles are indispensable for fluent academic writing. They may not constitute complete structural units, but they occur very frequently in academic conversations, conference presentations, and scientific articles. This paper shows how to collect a large database of lexical bundles from articles in the Natural Language Processing (NLP) domain. We first collect highly frequent N-grams from the ACL-ARC collection of NLP articles and then classify them as true or false lexical bundles using machine learning models trained on a set of manually checked bundles. In a verification experiment, our best model achieves an accuracy of 76%. Using this model, we extract more than 18,000 lexical bundles from the ACL-ARC corpus, which we publicly release.
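The first step described in the abstract, collecting highly frequent N-grams as candidate bundles, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `frequent_ngrams`, the whitespace tokenization, the N-gram length range, and the frequency threshold are all assumptions chosen for illustration, and the toy corpus stands in for ACL-ARC. The classification step is not shown.

```python
from collections import Counter


def frequent_ngrams(sentences, n_min=3, n_max=5, min_count=2):
    """Count word N-grams inside each sentence and keep the frequent ones.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    counts = Counter()
    for sentence in sentences:
        # Naive lowercase whitespace tokenization (an assumption).
        tokens = sentence.lower().split()
        for n in range(n_min, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    # Keep only N-grams that reach the frequency threshold.
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy stand-in for a corpus of NLP articles.
toy_corpus = [
    "in this paper we propose a new method",
    "in this paper we describe a parser",
    "the results show that in this paper we improve accuracy",
]

candidates = frequent_ngrams(toy_corpus)
for gram, count in sorted(candidates.items()):
    print(count, gram)
```

In a pipeline like the one the abstract outlines, the surviving candidates would then be passed to a classifier that separates true lexical bundles from false ones.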
