Information Extraction from Indonesian Crime News with Named Entity Recognition
Roy Rachman Sedik, A. Romadhony
2023 15th International Conference on Knowledge and Smart Technology (KST), 2023-02-21
DOI: 10.1109/KST57286.2023.10086789 (https://doi.org/10.1109/KST57286.2023.10086789)
Citations: 0
Abstract
Information extraction in the crime domain is the process of extracting information related to a crime event. A prior study of crime information extraction from Indonesian text used features from part-of-speech tagging and dependency parsing. However, that approach produced some misclassifications, especially in location and date/time extraction, mainly because the system was unable to identify several named entities. In this study, we propose a system that extracts crime information from Indonesian online news by using named entity recognition, with a focus on extracting the crime location and time. We use a Support Vector Machine (SVM) to classify the crime type. We evaluate the proposed system by comparing its output with gold labels. The test results show that crime type classification has an overall performance of 92%, crime location extraction has an F1 score of 90.8%, and crime date extraction has an F1 score of 94.1%. Based on our analysis, further improvement is needed, especially for crime location extraction. Identifying the various date and time formats also needs further exploration.
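As an illustration of how extraction output can be scored against gold labels with an F1 score, here is a minimal Python sketch. It assumes exact matching of (article, entity) pairs, which the abstract does not specify. The function name `precision_recall_f1` and the sample data are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted items against gold items using exact set matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (article_id, location) pairs; not data from the paper.
gold_locations = {(1, "Bandung"), (2, "Jakarta Selatan"), (3, "Surabaya")}
predicted_locations = {(1, "Bandung"), (2, "Jakarta"), (3, "Surabaya")}

precision, recall, f1 = precision_recall_f1(predicted_locations, gold_locations)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under exact matching, a partially correct location such as "Jakarta" for "Jakarta Selatan" counts as both a false positive and a false negative. A more lenient partial-match criterion would give different scores.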