Samia Loucif, Murad Al-Rajab, Raed Abu Zitar, Mahmoud Rezk
Journal of Big Data, published 2024-08-12. DOI: 10.1186/s40537-024-00979-6 (https://doi.org/10.1186/s40537-024-00979-6)
Toward a globally lunar calendar: a machine learning-driven approach for crescent moon visibility prediction
This paper presents a comprehensive approach to harmonizing lunar calendars across global regions, addressing the long-standing challenge of regional variation in sightings of the new crescent Moon, which marks the beginning of each lunar month. We propose a machine learning (ML)-based framework to predict the visibility of the new crescent Moon, a significant step toward a globally unified lunar calendar. Our study uses a dataset covering many countries worldwide and is the first to analyze all 12 lunar months over a span of 13 years. We applied a wide range of ML algorithms and techniques, including feature selection, hyperparameter tuning, ensemble learning, and region-based clustering, all aimed at maximizing model performance. Overall, the gradient boosting (GB) model outperformed all other models, achieving the highest F1 score (0.882469) and an area under the curve (AUC) of 0.901009. However, when trained on features selected with the ANOVA F-test and with optimized parameters, the Extra Trees model performed best, with an F1 score of 0.887872 and an AUC of 0.906242. We then explored ensemble models to see whether combining models could further improve predictive accuracy; the resulting Ensemble Model showed a slight improvement, with an F1 score of 0.888058 and an AUC of 0.907482. In addition, segmenting the dataset geographically improved predictive performance in certain regions, such as Africa and Asia. In conclusion, ML techniques can provide an efficient and reliable tool for predicting new crescent Moon visibility, supporting decisions on when new lunar months begin.
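The abstract names its building blocks without showing how they work. The following is a minimal plain-Python sketch of two of them, offered under stated assumptions rather than as the authors' implementation: ranking features by the one-way ANOVA F statistic (the quantity scikit-learn's `f_classif` computes, typically used through `SelectKBest(f_classif, k=...)`), and scoring a binary visible/not-visible classifier with F1 and AUC, assuming the reported AUC is the area under the ROC curve. The feature names (`moon_age_h`, `elongation_deg`, `humidity_pct`), the six observations, and the 0.5 decision threshold are hypothetical toy values, not data from the paper.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of one numeric feature across class labels.

    F = (between-group mean square) / (within-group mean square).
    """
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    if ss_within == 0:
        # Perfectly separated groups give an unbounded F; a constant feature gives 0.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(columns, labels, k):
    """Return the names of the k features with the largest ANOVA F statistic."""
    scores = {name: anova_f_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for binary labels (1 = crescent seen)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true, scores):
    """ROC AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy observations: 1 = crescent reported seen, 0 = not seen.
y = [0, 0, 0, 1, 1, 1]
columns = {
    "moon_age_h": [10, 14, 12, 20, 24, 22],
    "elongation_deg": [7, 9, 8, 12, 11, 13],
    "humidity_pct": [50, 70, 60, 55, 65, 60],
}
print(select_k_best(columns, y, k=2))  # ['moon_age_h', 'elongation_deg']

# Hypothetical model scores, thresholded at 0.5 to get hard predictions.
scores = [0.1, 0.6, 0.3, 0.8, 0.7, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(round(f1(y, y_pred), 4), round(roc_auc(y, scores), 4))  # 0.6667 0.8889
```

In this toy example, `humidity_pct` has identical class means, so its F statistic is 0 and it is the first feature dropped. Note also that AUC depends only on how the scores rank positives above negatives, while F1 depends on the thresholded labels, so the two metrics can move independently; reporting both, as the abstract does, gives a fuller picture.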
About the journal:
The Journal of Big Data publishes high-quality, scholarly research papers, methodologies, and case studies covering a broad spectrum of topics, from big data analytics to data-intensive computing and all applications of big data research. It addresses challenges facing big data today and in the future, including data capture and storage, search, sharing, analytics, technologies, visualization, architectures, data mining, machine learning, cloud computing, distributed systems, and scalable storage. The journal serves as a seminal source of innovative material for academic researchers and practitioners alike.