自然语言处理文章中词汇束的提取

2019 International Conference on Advanced Computer Science and information Systems (ICACSIS) Pub Date : 2019-10-01 DOI:10.1109/ICACSIS47736.2019.8979950

Chooi-Ling Goh, Y. Lepage

{"title":"自然语言处理文章中词汇束的提取","authors":"Chooi-Ling Goh, Y. Lepage","doi":"10.1109/ICACSIS47736.2019.8979950","DOIUrl":null,"url":null,"abstract":"Lexical bundles are indispensable for fluent academic writing. They might not constitute complete structural units but they occur very frequently in academic conversations, conference presentations and scientific articles. This paper shows how to collect a large database of lexical bundles from articles in the Natural Language Processing (NLP) domain. We first collect highly frequent N-grams from the ACL-ARC collection of NLP articles and then classify them into true or false lexical bundles using machine learning models trained from a set of manually checked bundles. In a verification experiment, our best model achieves an accuracy of 76 %. Using this model, we extract more than 18,000 lexical bundles from the ACL-ARC corpus, which we publicly release.","PeriodicalId":165090,"journal":{"name":"2019 International Conference on Advanced Computer Science and information Systems (ICACSIS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Extraction of Lexical Bundles used in Natural Language Processing Articles\",\"authors\":\"Chooi-Ling Goh, Y. Lepage\",\"doi\":\"10.1109/ICACSIS47736.2019.8979950\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Lexical bundles are indispensable for fluent academic writing. They might not constitute complete structural units but they occur very frequently in academic conversations, conference presentations and scientific articles. This paper shows how to collect a large database of lexical bundles from articles in the Natural Language Processing (NLP) domain. We first collect highly frequent N-grams from the ACL-ARC collection of NLP articles and then classify them into true or false lexical bundles using machine learning models trained from a set of manually checked bundles. In a verification experiment, our best model achieves an accuracy of 76 %. Using this model, we extract more than 18,000 lexical bundles from the ACL-ARC corpus, which we publicly release.\",\"PeriodicalId\":165090,\"journal\":{\"name\":\"2019 International Conference on Advanced Computer Science and information Systems (ICACSIS)\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 International Conference on Advanced Computer Science and information Systems (ICACSIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICACSIS47736.2019.8979950\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Advanced Computer Science and information Systems (ICACSIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICACSIS47736.2019.8979950","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

词汇束对于流畅的学术写作是不可或缺的。它们可能不构成完整的结构单元，但它们经常出现在学术对话、会议报告和科学文章中。本文介绍了如何从自然语言处理(NLP)领域的文章中收集一个大型词汇束数据库。我们首先从NLP文章的ACL-ARC集合中收集高度频繁的n -gram，然后使用从一组手动检查的束中训练的机器学习模型将它们分类为真或假词汇束。在验证实验中，我们的最佳模型达到了76%的准确率。使用这个模型，我们从公开发布的ACL-ARC语料库中提取了超过18,000个词汇包。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Extraction of Lexical Bundles used in Natural Language Processing Articles

Lexical bundles are indispensable for fluent academic writing. They might not constitute complete structural units but they occur very frequently in academic conversations, conference presentations and scientific articles. This paper shows how to collect a large database of lexical bundles from articles in the Natural Language Processing (NLP) domain. We first collect highly frequent N-grams from the ACL-ARC collection of NLP articles and then classify them into true or false lexical bundles using machine learning models trained from a set of manually checked bundles. In a verification experiment, our best model achieves an accuracy of 76 %. Using this model, we extract more than 18,000 lexical bundles from the ACL-ARC corpus, which we publicly release.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 International Conference on Advanced Computer Science and information Systems (ICACSIS)

自引率

0.00%

发文量