Detecting information from Twitter on landslide hazards in Italy using deep learning models
Authors: Rachele Franceschini, Ascanio Rosi, Filippo Catani, Nicola Casagli
Journal: Geoenvironmental Disasters (JCR Q2, Environmental Sciences; impact factor 3.8)
Published: 2024-07-30
DOI: https://doi.org/10.1186/s40677-024-00279-4
Mass media are a new and important source of information for any natural disaster, mass emergency, pandemic, economic or political event, or extreme weather event affecting one or more communities in a country. Several techniques have been developed for mining social media data on many natural events, but few have been applied to the automatic extraction of landslide events. In this study, Twitter was investigated as a source of Italian-language data about landslide events. The main aim is to obtain an automatic text classification based on information about natural hazards; text classification had not previously been applied to Italian-language content to detect this type of natural hazard. Over 13,000 tweets were extracted from Twitter using five keywords referring to landslide events. The dataset was classified manually, providing a solid base for applying deep learning. A combination of BERT + CNN was chosen for text classification, and two different pre-processing approaches and BERT models were applied. BERT-multicase + CNN without preprocessing achieved the highest accuracy, equal to 96%, and an AUC of 0.96. Two advantages resulted from this study: the Italian-language classified dataset for landslide events fills the present gap in analysing natural events using Twitter, and BERT + CNN, trained to detect this information, proved to be an excellent classifier of Italian-language text on landslide events.
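The two metrics reported in the abstract, accuracy and AUC, can be illustrated with a minimal sketch. The labels and scores below are invented toy values, not the study's data, and `accuracy` and `roc_auc` are hypothetical helper names rather than anything taken from the paper. The sketch assumes a binary task (landslide-related tweet or not) and a classifier that outputs a score between 0 and 1.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of items whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = tweet reports a landslide, 0 = unrelated tweet.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.92, 0.81, 0.40, 0.35, 0.10, 0.55, 0.77, 0.05]

print(f"accuracy = {accuracy(toy_labels, toy_scores):.2f}")
print(f"AUC      = {roc_auc(toy_labels, toy_scores):.2f}")
```

Accuracy depends on the chosen decision threshold, whereas AUC summarises how well the scores rank positives above negatives across all thresholds, which is why the two are usually reported together.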
About the journal:
Geoenvironmental Disasters is an international journal with a focus on multi-disciplinary applied and fundamental research and the effects and impacts on infrastructure, society and the environment of geoenvironmental disasters triggered by various types of geo-hazards (e.g. earthquakes, volcanic activity, landslides, tsunamis, intensive erosion and hydro-meteorological events).
The integrated study of Geoenvironmental Disasters is an emerging and composite field of research interfacing with areas traditionally within civil engineering, earth sciences, atmospheric sciences and the life sciences. It centers on the interactions within and between the Earth's ground, air and water environments, all of which are affected by climate, geological, morphological and anthropological processes; and biological and ecological cycles. Disasters are dynamic forces which can change the Earth pervasively, rapidly, or abruptly, and which can generate lasting effects on the natural and built environments.
The journal publishes research papers, case studies and quick reports of recent geoenvironmental disasters, review papers and technical reports of various geoenvironmental disaster-related case studies. The focus on case studies and quick reports of recent geoenvironmental disasters helps to advance the practical understanding of geoenvironmental disasters and to inform future research priorities; they are a major component of the journal. The journal aims for the rapid publication of research papers at a high scientific level. The journal welcomes proposals for special issues reflecting the trends in geoenvironmental disaster reduction and monothematic issues. Researchers and practitioners are encouraged to submit original, unpublished contributions.