Quantitative Information Extraction from Humanitarian Documents
Daniele Liberatore, Kyriaki Kalimeri, Derya Sever, Yelena Mejova
arXiv - CS - Computers and Society, 2024-08-09
DOI: arxiv-2408.04941 (https://doi.org/arxiv-2408.04941)
Abstract
Humanitarian action is accompanied by a mass of reports, summaries, news, and
other documents. To guide its activities, important information must be quickly
extracted from such free-text resources. Quantities, such as the number of
people affected, amount of aid distributed, or the extent of infrastructure
damage, are central to emergency response and anticipatory action. In this
work, we contribute an annotated dataset for the humanitarian domain that supports
extraction of such quantitative information alongside its important context,
including the units it refers to, any modifiers, and the relevant event. Further,
we develop a custom Natural Language Processing pipeline to extract the
quantities alongside their units, and evaluate it against baselines and
recent approaches from the literature. The proposed model achieves a consistent
improvement in performance, especially on documents pertaining to the Dominican Republic
and select African countries. We make the dataset and code available to the
research community to continue the improvement of NLP tools for the
humanitarian domain.
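
To make the extraction task described above concrete, the following is a minimal rule-based sketch of pulling a quantity, its unit, and an optional modifier out of free text. It is not the authors' pipeline: the names `Quantity` and `extract_quantities`, the modifier list, and the regular expression are all assumptions chosen for illustration. A pattern this simple will over-match (for example, it treats a year followed by any word as a quantity and unit), which is one reason a dedicated NLP pipeline is needed.

```python
import re
from typing import List, NamedTuple, Optional


class Quantity(NamedTuple):
    """One extracted quantity (hypothetical structure, not from the paper)."""
    modifier: Optional[str]  # e.g. "at least", "about"; None if absent
    value: float             # numeric value after applying any scale word
    unit: str                # the word immediately following the number


# Illustrative, non-exhaustive lists (assumptions for this sketch).
_MODIFIERS = r"at least|at most|more than|less than|about|approximately|nearly|over|around"
_SCALES = {"thousand": 1e3, "million": 1e6, "billion": 1e9}

_PATTERN = re.compile(
    rf"\b(?:(?P<modifier>{_MODIFIERS})\s+)?"             # optional modifier
    r"(?P<number>\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)"      # 12,500 or 3.2 or 40
    r"(?:\s+(?P<scale>thousand|million|billion))?"       # optional scale word
    r"\s+(?P<unit>[A-Za-z]+)",                           # next word taken as the unit
    re.IGNORECASE,
)


def extract_quantities(text: str) -> List[Quantity]:
    """Return every (modifier, value, unit) triple matched by the naive pattern."""
    results = []
    for match in _PATTERN.finditer(text):
        value = float(match.group("number").replace(",", ""))
        scale = match.group("scale")
        if scale:
            value *= _SCALES[scale.lower()]
        modifier = match.group("modifier")
        results.append(
            Quantity(
                modifier.lower() if modifier else None,
                value,
                match.group("unit").lower(),
            )
        )
    return results


if __name__ == "__main__":
    sentence = (
        "At least 12,500 people were displaced and about 3.2 million "
        "liters of water were distributed."
    )
    for quantity in extract_quantities(sentence):
        print(quantity)
```

The sketch deliberately ignores the relevant event, the fourth piece of context named in the abstract, since linking a quantity to an event requires more than local pattern matching.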