E. Ferreira, H. Rausch, S. Campos, A. Faria-Campos, Enio Pietra, Lílian Silva dos Santos
Published in: 2014 IEEE 16th International Conference on e-Health Networking, Applications and Services (Healthcom), 2014-10-01. DOI: 10.1109/HealthCom.2014.7001854 (https://doi.org/10.1109/HealthCom.2014.7001854)
Medical data mining: A case study of a Paracoccidioidomycosis patient's database
Data mining applied to medical databases is a challenging process. The scarcity of large data sources and the complexity of the data are among the difficulties encountered, especially for rare and neglected diseases. Such databases are generally small, wide and sparse, which makes them very hard to analyze. There are also ethical, legal and social issues regarding privacy and the clinical validation of findings. This work proposes a way of dealing with these challenges through a case study of data mining applied to a database of Paracoccidioidomycosis (PCM) patients. PCM is a typical Brazilian disease, caused by the yeast Paracoccidioides brasiliensis. It is an important public health issue because of its high incapacitating potential and the number of premature deaths it causes if untreated. This paper discusses methods for analyzing this complex dataset, to help improve the understanding of both the disease and this type of data. Despite the challenges of the dataset, several interesting findings were made: flaws in form-filling protocols, notably the lack of a chest X-ray in 40% of the records; a possible new relation between smoking habits and PCM evolution time, with the average evolution time for smoking patients being 2.8 times longer; and the successful classification/prediction of the cutaneous form of the disease with a 93% precision rate.
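The three kinds of figures quoted in the abstract (a missing-field rate, a ratio of group means, and a precision rate) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the field names (`smoker`, `evolution_months`, `chest_xray`, `cutaneous`, `predicted_cutaneous`) and the five toy records are assumptions invented for illustration, and they do not reproduce the paper's data or results.

```python
from statistics import mean

# Hypothetical toy records: field names and values are invented for
# illustration only and are NOT taken from the PCM database.
records = [
    {"smoker": True,  "evolution_months": 12, "chest_xray": None,
     "cutaneous": True,  "predicted_cutaneous": True},
    {"smoker": True,  "evolution_months": 8,  "chest_xray": "done",
     "cutaneous": False, "predicted_cutaneous": True},
    {"smoker": False, "evolution_months": 4,  "chest_xray": "done",
     "cutaneous": False, "predicted_cutaneous": False},
    {"smoker": False, "evolution_months": 6,  "chest_xray": "done",
     "cutaneous": True,  "predicted_cutaneous": True},
    {"smoker": False, "evolution_months": 5,  "chest_xray": "done",
     "cutaneous": True,  "predicted_cutaneous": False},
]


def missing_rate(rows, field):
    """Fraction of rows in which `field` is absent or None."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def mean_ratio(rows, group_field, value_field):
    """Mean of `value_field` where `group_field` is true, divided by
    the mean where it is false."""
    in_group = [r[value_field] for r in rows if r[group_field]]
    out_group = [r[value_field] for r in rows if not r[group_field]]
    return mean(in_group) / mean(out_group)


def precision(rows, truth_field, pred_field):
    """True positives divided by all positive predictions."""
    tp = sum(1 for r in rows if r[pred_field] and r[truth_field])
    fp = sum(1 for r in rows if r[pred_field] and not r[truth_field])
    return tp / (tp + fp)


if __name__ == "__main__":
    print("missing chest X-ray rate:", missing_rate(records, "chest_xray"))
    print("smoker / non-smoker mean evolution time:",
          mean_ratio(records, "smoker", "evolution_months"))
    print("precision for cutaneous form:",
          precision(records, "cutaneous", "predicted_cutaneous"))
```

On a small, wide and sparse dataset such as the one described, a per-field missing rate is a natural first check, since it exposes gaps in form filling before any model is trained.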