Relation recognition among named entities from a crime corpus using a web-based semantic similarity measurement

Priyanka Das, A. Das
Published in: 2017 Third International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN)
Publication date: 2017-11-01
DOI: 10.1109/ICRCICN.2017.8234525 (https://doi.org/10.1109/ICRCICN.2017.8234525)
Citations: 1

Abstract

The present work proposes an unsupervised approach for recognising relations between named entities in a large corpus on crime in Indian states and union territories. First, named entities are identified in the extracted crime corpus, and certain pairs of entities that facilitate crime analysis are selected. Each entity pair, together with its intermediate context words, is then represented as a shallow parse tree that serves as a relation instance. From the parse trees, only the head words (for each entity pair), which reflect the main meaning of the phrases, are considered when measuring semantic similarity. The measurement uses a web search engine to retrieve the page counts of those words and of their conjunctions. The page counts are used to compute the Simpson coefficient between the pairs, and based on this similarity score an agglomerative hierarchical clustering technique groups entity pairs that share the same relationship into clusters. Each resulting cluster is characterised by the most frequent head word in the group. The proposed method is a simple similarity-measure technique for relation extraction from crime data and is reported to provide better accuracy than other existing methods.
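The similarity and clustering steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the Simpson (overlap) coefficient in its usual page-count form, H(P AND Q) / min(H(P), H(Q)), and average-linkage merging with a stopping threshold; the abstract specifies neither the linkage nor the stopping rule. The page counts, the head words, and the names `PAGE_COUNTS`, `hits`, `simpson` and `cluster` are all hypothetical. No real search engine is queried, and for brevity the head words are clustered directly, each standing in for one entity-pair relation instance.

```python
from itertools import combinations

# Hypothetical page counts, invented for illustration (not from the paper).
# Single words map to H(word); sorted word pairs map to H(word1 AND word2).
PAGE_COUNTS = {
    "murdered": 1000,
    "killed": 4000,
    "arrested": 3000,
    "detained": 1500,
    ("killed", "murdered"): 800,
    ("arrested", "detained"): 900,
    ("arrested", "murdered"): 100,
    ("arrested", "killed"): 200,
    ("detained", "murdered"): 30,
    ("detained", "killed"): 60,
}


def hits(*words):
    """Look up a page count; a real system would query a search engine here."""
    key = words[0] if len(words) == 1 else tuple(sorted(words))
    return PAGE_COUNTS.get(key, 0)


def simpson(p, q):
    """Simpson (overlap) coefficient: H(p AND q) / min(H(p), H(q))."""
    if p == q:
        return 1.0
    smaller = min(hits(p), hits(q))
    return hits(p, q) / smaller if smaller else 0.0


def cluster(words, threshold):
    """Average-linkage agglomerative clustering (assumed linkage).

    Repeatedly merges the two most similar clusters and stops once the
    best available similarity falls below `threshold`.
    """
    clusters = [[w] for w in words]

    def linkage(a, b):
        return sum(simpson(x, y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations guarantees i < j
    return clusters


if __name__ == "__main__":
    head_words = ["murdered", "killed", "arrested", "detained"]
    print(cluster(head_words, threshold=0.5))
```

With these made-up counts, the two head words related to killing end up in one cluster and the two related to custody in another. Labelling each cluster with its most frequent head word, as the abstract describes, would be a further counting step over the entity pairs assigned to it.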