Interpretable Natural Language Segmentation Based on Link Grammar

2020 Science and Artificial Intelligence conference (S.A.I.ence). Pub Date: 2020-11-14. DOI: 10.1109/S.A.I.ence50533.2020.9303220

Vignav Ramesh, A. Kolonin


Citations: 3

Abstract



Natural language segmentation (NLS), or text segmentation, refers to the process of dividing written text into meaningful units. Sentence segmentation, a subfield of text segmentation, is the problem of dividing a string of natural language text into its component sentences. Current methods of sentence segmentation are often either hardcoded—they require manual implementation of fixed grammar and segmentation rules—or require extensive training on labeled corpora and are not explainable—they are "black box" algorithms that cannot be understood by humans. In this paper, we present a novel explainable sentence segmentation method capable of separating bodies of text into grammatically valid sentences solely based on the grammatical relationships between individual words or tokens. The proposed NLS architecture can both automate the input query parsing and semantic query execution components of voice-activated question answering and information retrieval systems as well as enable automatic summarization, entity extraction, sentiment identification, and a variety of other natural language processing (NLP) algorithms that operate at the sentential level.
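To make the idea of grammar-driven segmentation concrete, here is a minimal sketch in Python. It is not the authors' implementation: the lexicon, the accepted category patterns, and the names `toy_is_valid_sentence` and `segment` are all hypothetical stand-ins. In a real system the validity check would be delegated to a Link Grammar parser; here a tiny pattern table plays that role so the example stays self-contained. The segmenter greedily takes the longest prefix of the remaining tokens that the checker accepts as a complete sentence.

```python
from typing import Callable, List, Sequence

# Toy lexicon (hypothetical): word -> coarse category.
# A real system would consult a Link Grammar dictionary instead.
TOY_LEXICON = {
    "the": "D", "a": "D",
    "dog": "N", "cat": "N", "mouse": "N",
    "chased": "V", "saw": "V", "sleeps": "V",
}

# Category sequences the toy grammar accepts as a complete sentence.
VALID_PATTERNS = {
    ("N", "V"),
    ("D", "N", "V"),
    ("D", "N", "V", "D", "N"),
}


def toy_is_valid_sentence(tokens: Sequence[str]) -> bool:
    """Stand-in for a grammar check: True if the category sequence is accepted."""
    categories = tuple(TOY_LEXICON.get(t.lower(), "?") for t in tokens)
    return categories in VALID_PATTERNS


def segment(
    tokens: Sequence[str],
    is_valid: Callable[[Sequence[str]], bool] = toy_is_valid_sentence,
    max_len: int = 12,
) -> List[List[str]]:
    """Greedily split an unpunctuated token stream into valid sentences."""
    sentences: List[List[str]] = []
    start = 0
    while start < len(tokens):
        end_limit = min(len(tokens), start + max_len)
        # Try the longest candidate first, shrinking until one is accepted.
        for end in range(end_limit, start, -1):
            if is_valid(tokens[start:end]):
                sentences.append(list(tokens[start:end]))
                start = end
                break
        else:
            # No valid prefix found: emit the token alone and move on.
            sentences.append([tokens[start]])
            start += 1
    return sentences


if __name__ == "__main__":
    stream = "the dog chased a cat the mouse sleeps".split()
    for sentence in segment(stream):
        print(" ".join(sentence))
```

Because every boundary follows from an explicit accept/reject decision by the grammar check, each split can be traced back to the rule that licensed it, which is the sense of "explainable" the abstract appeals to. The greedy longest-prefix strategy is a simplification chosen for brevity and can mis-segment ambiguous input.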
