Y. Priyadarshana, L. Ranathunga, C. Amalraj, I. Perera
HelaNER: A Novel Approach for Nested Named Entity Boundary Detection
IEEE EUROCON 2021 - 19th International Conference on Smart Technologies
Published: 2021-07-06
DOI: https://doi.org/10.1109/EUROCON52738.2021.9535565
Citations: 1
Abstract
Named entity recognition (NER) is a prominent task that identifies text spans and assigns them to specific types. Named entity (NE) boundary detection is an emerging research area within NER. Although only limited work has been conducted on nested NE boundary detection, flat NE boundary detection can be considered mature. Nested NE boundary detection is important in information extraction, information retrieval, event extraction, sentiment analysis, and related tasks. Meanwhile, the spread of unhealthy religious statements through social media has become a burden on the wellbeing of society. The prime objective of this research is to implement a novel system for nested NE boundary detection in the Sinhala language, focusing on unhealthy religious statements in social media. A literature survey was conducted to analyze existing NE type and boundary detection approaches and systems. In addition, the linguistic structures and patterns of Sinhala hate speech were identified. A corpus of more than 100,000 Sinhala hate speech items was extracted, preprocessed, and annotated by an expert panel. A deep neural approach was then applied to capture the complexity indexes, matrices, and other related elements of the corpus. Next, a novel approach called "boundary bubbles" was applied to capture word representation, head word detection, entity mention nugget identification, and region classification for NE boundary detection. Experiments show that the proposed approach achieves state-of-the-art performance over the existing baselines.
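The distinction between flat and nested NE boundaries can be made concrete with a small sketch. This is only an illustration of the problem setting, not the "boundary bubbles" method, whose details the abstract does not give. The `Span` type, the helper functions, and the English example tokens and labels are all hypothetical and chosen for readability.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """A hypothetical entity annotation over a token sequence."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str


def nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner lies fully inside outer."""
    pairs = []
    for outer in spans:
        for inner in spans:
            same_boundaries = (outer.start, outer.end) == (inner.start, inner.end)
            contains = outer.start <= inner.start and inner.end <= outer.end
            if contains and not same_boundaries:
                pairs.append((outer, inner))
    return pairs


def is_flat(spans: List[Span]) -> bool:
    """A flat annotation has no two spans that overlap."""
    ordered = sorted(spans)  # sorts by start, then end, then label
    return all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))


# Illustrative example (assumed, not from the paper's corpus):
# an organisation mention that contains a location mention.
tokens = ["University", "of", "Colombo", "students"]
spans = [Span(0, 3, "ORG"), Span(2, 3, "LOC")]

for outer, inner in nested_pairs(spans):
    print(tokens[outer.start:outer.end], "contains", tokens[inner.start:inner.end])
```

A flat tagger can emit only one of the two spans above, since each token receives a single label; a nested boundary detector has to recover both the outer and the inner boundary.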