Designing Sensitive Personal Information Detection and Classification Model for Amharic Text
A. Genetu, Tesfa Tegegne
2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA), published 2021-11-22
DOI: https://doi.org/10.1109/ict4da53266.2021.9672227
Sensitive information is a class of content that should not be disclosed to the public and whose disclosure can harm its owner. Protecting it first requires detecting whether a text contains sensitive information and then classifying the domain it belongs to for further analysis. To the best of our knowledge, no such work has been attempted for Amharic text. Models developed for other languages cannot be applied directly to Amharic because of differences in morphology, grammar and semantics. To address these gaps, we propose a model for detecting and classifying sensitive personal information in Amharic text. We experimented with three deep learning algorithms, Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BI-LSTM) and Convolutional Neural Network (CNN), using 7.31K Amharic sentences for sensitivity detection and 6.697K for domain classification. The accuracy of LSTM, BI-LSTM and CNN was 82%, 90% and 87% respectively for sensitivity classification, and 88%, 93% and 89% respectively for domain classification.
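The two tasks named in the abstract, sensitivity detection and domain classification, can be chained into a small pipeline. The sketch below is illustrative only and is not the authors' implementation. It assumes that domain classification runs only on sentences already flagged as sensitive, which the abstract does not state. The names `classify`, `Prediction`, `toy_detect` and `toy_categorize`, the domain labels "health" and "finance", and the trigger tokens are all hypothetical placeholders standing in for trained models such as the BI-LSTM.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Stand-ins for trained models (e.g. an LSTM, BI-LSTM or CNN classifier).
# Any callable with these signatures would fit; the type names are illustrative.
SensitivityModel = Callable[[str], bool]
DomainModel = Callable[[str], str]


@dataclass
class Prediction:
    sentence: str
    sensitive: bool
    domain: Optional[str]  # None when the sentence is not flagged as sensitive


def classify(sentence: str,
             detect: SensitivityModel,
             categorize: DomainModel) -> Prediction:
    """Stage 1: sensitivity detection. Stage 2: domain classification.

    Assumption: stage 2 runs only on sentences flagged in stage 1.
    """
    if not detect(sentence):
        return Prediction(sentence, False, None)
    return Prediction(sentence, True, categorize(sentence))


# Toy rule-based stand-ins so the sketch runs without a trained network.
# The trigger tokens and domain labels are hypothetical, not from the paper.
def toy_detect(sentence: str) -> bool:
    return "PATIENT" in sentence or "ACCOUNT" in sentence


def toy_categorize(sentence: str) -> str:
    return "health" if "PATIENT" in sentence else "finance"
```

Calling `classify("PATIENT record", toy_detect, toy_categorize)` returns a `Prediction` with `sensitive=True` and `domain="health"`, while a sentence containing neither token comes back with `sensitive=False` and `domain=None`. Keeping the two stages as separate callables means each one can be swapped for a different architecture without touching the other.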