Document Processing: Methods for Semantic Text Similarity Analysis

A. Qurashi, Violeta Holmes, Anju P. Johnson
Published in: 2020 International Conference on INnovations in Intelligent SysTems and Applications (INISTA)
Publication date: 2020-08-01
DOI: 10.1109/INISTA49547.2020.9194665
URL: https://doi.org/10.1109/INISTA49547.2020.9194665
Citations: 22

Abstract

Measuring and analysing text similarity between documents is a growing application of Natural Language Processing. This paper presents the results of using different techniques for semantic text similarity measurement in documents used for safety-critical systems. The research objective is to measure the degree of semantic equivalence of multi-word sentences for the rules and procedures contained in documents on railway safety. These documents, which contain unstructured data in different formats, must be preprocessed and cleaned before a set of Natural Language Processing toolkits and the Jaccard and cosine similarity metrics are applied. The results demonstrate that it is feasible to automate the identification of equivalent rules and procedures, and to measure the similarity of disparate safety-critical documents, using Natural Language Processing and similarity measurement techniques.
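The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not describe their preprocessing or toolkits, so the regex tokenizer, the lowercasing step, and the use of raw term counts as vector weights are all assumptions made here for illustration, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Assumed preprocessing: lowercase, then keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity: size of the token-set intersection over the union."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # convention chosen for this sketch: two empty texts match
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between raw term-frequency vectors."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical rule sentences, not taken from the paper's documents.
    rule_1 = "The driver must stop the train at a red signal."
    rule_2 = "The driver shall halt the train when the signal is red."
    print(f"Jaccard: {jaccard_similarity(rule_1, rule_2):.3f}")
    print(f"Cosine:  {cosine_similarity(rule_1, rule_2):.3f}")
```

Both functions compare surface tokens only, so synonyms such as "stop" and "halt" count as different words. Capturing semantic equivalence in that sense would require additional steps such as stemming, lemmatization, or synonym resolution, which this sketch leaves out.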