Species Distribution Modeling with Scalability: The Case Study of P-GARP, a Parallel Genetic Algorithm for Rule-Set Production
F. Santana, C. Pariente, A. Saraiva
2017 IEEE International Conference on Information Reuse and Integration (IRI), published 2017-08-01
DOI: 10.1109/IRI.2017.93 (https://doi.org/10.1109/IRI.2017.93)
Citations: 2
Abstract
Species distribution modeling (SDM) calculates a species' probabilistic distribution by combining environmental raster layers with species datasets. Such models can help answer complex questions in ecology, biology, and health, e.g., by estimating the impact of climate change on biodiversity or the potential for a disease to spread (vector modeling). Machine learning is widely applied in SDM, and the Genetic Algorithm for Rule-set Production (GARP) is one of the most reliable solutions. However, GARP's convergence needs to be sped up under certain conditions (high resolution or a large number of layers), for which this paper proposes P-GARP, a parallel, scalable implementation of GARP. P-GARP was implemented on an SGI Altix XE 1300 cluster with 2 quad-core processors per node. Preliminary results show a significant speedup of 3.2 per node. Premature convergence is not observed in P-GARP, and its accuracy is very similar to GARP's. Effective solutions to improve this speedup at even larger scale are proposed, along with a discussion of P-GARP's correctness and efficiency.
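To put the reported figures in context, the following is a minimal sketch that turns the numbers stated in the abstract (2 quad-core processors per node, a speedup of 3.2 per node) into a parallel efficiency and an Amdahl's-law estimate of the parallelizable fraction. It assumes the speedup is measured against a sequential run on a single core and that all 8 cores of a node are used; the abstract states neither, so the results are illustrative only. The function names are hypothetical and do not come from the paper.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Efficiency = speedup / number of cores (1.0 would be ideal linear scaling)."""
    return speedup / cores


def amdahl_parallel_fraction(speedup: float, cores: int) -> float:
    """Invert Amdahl's law, S = 1 / ((1 - f) + f / p), to solve for f.

    f is the fraction of the runtime that parallelizes perfectly,
    p is the number of cores, and S is the observed speedup.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# Figures taken from the abstract.
CORES_PER_NODE = 2 * 4      # 2 quad-core processors per node
SPEEDUP_PER_NODE = 3.2      # reported preliminary speedup per node

if __name__ == "__main__":
    # ASSUMPTION: baseline is a single-core sequential run of GARP.
    eff = parallel_efficiency(SPEEDUP_PER_NODE, CORES_PER_NODE)
    frac = amdahl_parallel_fraction(SPEEDUP_PER_NODE, CORES_PER_NODE)
    print(f"parallel efficiency per node: {eff:.2f}")
    print(f"Amdahl-implied parallel fraction: {frac:.3f}")
```

Under these assumptions, the efficiency is 3.2 / 8 = 0.4, which is consistent with the abstract's remark that further improvements to the speedup are proposed.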