Docflow: Supervised Multi-Method Document Anonymization Engine
Gabriele Morabito, Valeria Lukaj, Armando Ruggeri, M. Fazio, Maria Annunziata Astone, M. Villari
2023 IEEE Symposium on Computers and Communications (ISCC), published 2023-07-09
DOI: 10.1109/ISCC58397.2023.10218224 (https://doi.org/10.1109/ISCC58397.2023.10218224)
Document anonymization has been the subject of several studies and debates. By document anonymization, we mean the process of replacing sensitive data in order to preserve the confidentiality of documents without altering their content. In this work, we introduce Docflow, an open-source document anonymization engine that anonymizes documents according to specific filters chosen by the user. We applied Docflow to a set of legal documents and analyzed its processing performance. Given a Markdown input file, Docflow redacts information according to the user's choices while preserving the document content. Docflow will be integrated with NLP algorithms that generate the Markdown source file from documents already processed in different formats, always with human supervision in the loop.
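To make the idea of user-chosen filters applied to a Markdown file concrete, the following is a minimal sketch in Python. It is not Docflow's implementation or API: the filter names, the regular expressions, the `anonymize` function, and the `[EMAIL]`/`[DATE]` placeholder format are all assumptions made for illustration.

```python
import re
from typing import Iterable

# Hypothetical filters: names and patterns are illustrative, not Docflow's.
FILTERS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def anonymize(markdown: str, selected: Iterable[str]) -> str:
    """Replace each match of every selected filter with a [NAME] placeholder."""
    for name in selected:
        markdown = FILTERS[name].sub(f"[{name.upper()}]", markdown)
    return markdown


# Assumed sample input: a tiny Markdown document with two sensitive items.
sample = "# Ruling\n\nFiled on 2023-07-09 by mario.rossi@example.com."

# Only the filters the user selects are applied; the Markdown structure
# (here, the heading) is left untouched.
redacted = anonymize(sample, ["email", "date"])
print(redacted)
```

Selecting only `["date"]` would leave the e-mail address in place, which mirrors the idea that the user decides which categories of information are redacted. A real engine would need far more robust detection than two regular expressions, along with the human supervision the abstract describes.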