Classification with reject option: Distribution-free error guarantees via conformal prediction
Johan Hallberg Szabadváry, Tuwe Löfström, Ulf Johansson, Cecilia Sönströd, Ernst Ahlberg, Lars Carlsson
Machine Learning with Applications, vol. 20, Article 100664 (published 2025-05-09). DOI: 10.1016/j.mlwa.2025.100664

Abstract: Machine learning (ML) models always make a prediction, even when they are likely to be wrong. This causes problems in practical applications, as we do not know whether to trust a prediction. ML with reject option addresses this issue by abstaining from making a prediction when it is likely to be incorrect. In this work, we formalise the approach to ML with reject option in binary classification, deriving theoretical guarantees on the resulting error rate. This is achieved through conformal prediction (CP), which produces prediction sets with distribution-free validity guarantees. In binary classification, CP can output prediction sets containing exactly one, two, or no labels. By accepting only the singleton predictions, we turn CP into a binary classifier with reject option.

Here, CP is formally placed in the framework of prediction with reject option. We state and prove the resulting error rate and give finite-sample estimates. Numerical examples illustrate the derived error rate in several different conformal prediction settings, ranging from full conformal prediction to offline batch inductive conformal prediction. The former has a direct link to sharp validity guarantees, whereas the latter offers looser validity guarantees but can be used in practice. Error-reject curves illustrate the trade-off between error rate and reject rate, and can help a user set an acceptable error rate or reject rate in practice.
TransTab: A transformer-based approach for table detection and tabular data extraction from scanned document images
Yongzhou Wang, Wenliang Lv, Weijie Wu, Guanheng Xie, BiBo Lu, ChunYang Wang, Chao Zhan, Baishun Su
Machine Learning with Applications, vol. 20, Article 100665 (published 2025-05-08). DOI: 10.1016/j.mlwa.2025.100665

Abstract: Table detection and content extraction are crucial tasks in document analysis. Traditional convolutional neural network (CNN) methods often face limitations when dealing with complex tables, such as cross-column, cross-row, and multi-dimensional tables. Although existing methods have shown good performance in recognizing simpler tables, their effectiveness often falls short of practical application needs for complex layouts. The structural intricacy of tables requires more advanced recognition and extraction strategies, particularly for the precise localization and extraction of rows and columns. To address the shortcomings of traditional methods in handling complex table structures, this paper proposes an end-to-end Transformer-based method for document table detection and content extraction, named TransTab. TransTab overcomes the limitations of traditional CNN approaches by incorporating a Vision Transformer (ViT) into the table recognition task, enabling it to handle complex table structures across columns and rows. The self-attention mechanism of the ViT allows the model to capture long-range dependencies within the table, yielding high accuracy in detecting table boundaries, cell separations, and internal table structures. The paper also introduces separate modules for table detection and column detection, responsible for recognizing the overall table structure and accurately positioning columns, respectively. Through this modular design, the model adapts better to tables with diverse complex layouts, improving its ability to process intricate tables. Finally, EasyOCR is employed to extract text from the table. Experimental results demonstrate that TransTab outperforms state-of-the-art methods across several metrics. This research provides a novel solution for the automatic recognition and processing of document tables, paving the way for future developments in document analysis tasks.
Hyperparameter optimization of machine learning models for predicting actual evapotranspiration
Chalachew Muluken Liyew, Elvira Di Nardo, Stefano Ferraris, Rosa Meo
Machine Learning with Applications, vol. 20, Article 100661 (published 2025-05-06). DOI: 10.1016/j.mlwa.2025.100661

Abstract: Direct measurement of actual evapotranspiration (AET) using eddy covariance and lysimeters is challenging, particularly over large areas, due to high cost, technical complexity, and the need for specialized instrumentation. Consequently, AET data are limited, prompting the use of meteorological and soil features for prediction. This study develops and evaluates machine learning models for AET prediction based on two input combinations. The first group, selected through Pearson correlation, tolerance, and VIF scores to address multicollinearity, includes net CO₂, sensible heat flux, air temperature, relative humidity, and wind speed. The second group, chosen for practical applicability and easier accessibility, consists of soil surface temperature, air temperature, relative humidity, and wind speed.

Two predictive approaches are proposed: (i) deep learning models (LSTM, GRU, CNN) and (ii) classical machine learning models (SVR, RF). Hyperparameters were optimized using Bayesian optimization and compared with grid search; Bayesian optimization demonstrated higher performance and reduced computation time. Model performance was evaluated using statistical indicators (RMSE, MSE, MAE, R²). Deep learning methods outperformed classical methods, with LSTM achieving the best results (Bayesian optimization: RMSE = 0.0230, MSE = 0.0005, MAE = 0.0139, R² = 0.8861).

Performance decreased with fewer predictors. LSTM maintained its superiority, achieving R² = 0.8861 with five predictors and R² = 0.8467 with four, and slightly outperformed SVR (R² = 0.8456) with fewer predictors. Overall, deep learning methods, especially with Bayesian optimization, proved more effective than classical machine learning methods for AET prediction. These findings encourage future research using varied input combinations and advanced modeling approaches for accurate AET prediction.
Building consistency in explanations: Harmonizing CNN attributions for satellite-based land cover classification
Timo T. Stomberg, Lennart A. Reißner, Martin G. Schultz, Ribana Roscher
Machine Learning with Applications, vol. 20, Article 100653 (published 2025-05-06). DOI: 10.1016/j.mlwa.2025.100653

Abstract: Explainable machine learning has gained substantial attention for its role in enhancing transparency and trust in computer vision applications. Attribution methods like Grad-CAM and occlusion sensitivity analysis are frequently used to identify how features contribute to the predictions of neural networks. However, a key challenge is that different attribution methods often produce different outcomes, undermining trust in their results. Furthermore, the unique characteristics of remote sensing imagery pose additional challenges for attribution interpretation: it primarily comprises continuous "stuff" classes rather than objects, exhibits fine-grained spatial variability, contains mixed pixels, is often multispectral, and is spatially heterogeneous. To tackle this challenge, we present a novel methodology that harmonizes attributions, resulting in: 1. greater consistency across different attribution methods; 2. more meaningful explanations when validated against known segmentation ground truth; and 3. enhanced transparency and traceability. This is achieved by coherently linking feature representations to attributions derived from analyzing the training data, enabling direct attribution assignment to features in (unseen) images. We evaluate our methodology using two satellite-based land cover classification datasets, three convolutional neural network architectures, and nine attribution methods. Harmonizing attributions increases the Pearson correlation coefficient between different attribution methods by an average of 0.18 across all datasets, models, and methods, and improves the micro F1-score (a measure of accuracy) by 12%. We demonstrate that Grad-CAM attributions are inherently well aligned with the features, whereas other gradient-based attribution methods exhibit significant noise, which harmonization mitigates. Harmonization further enhances the resolution of occlusion-based attribution maps and adjusts misleading explanations.
{"title":"EMD-based local matching for occluded person re-identification","authors":"Hoang-Anh Nguyen , Thuy-Binh Nguyen , Hong-Quan Nguyen , Thi-Lan Le","doi":"10.1016/j.mlwa.2025.100663","DOIUrl":"10.1016/j.mlwa.2025.100663","url":null,"abstract":"<div><div>Person re-identification (Re-ID) is a vital computer vision task focused on matching images of a person of interest as they move across multiple non-overlapping cameras. Thanks to advancements in deep learning models, numerous important milestones have been achieved in the field of person Re-ID. Recent efforts have concentrated on addressing a more realistic scenario where pedestrians are partially occluded. This trend indicates a promising future for the practical implementation of person Re-ID systems. This paper builds upon our previous work, which successfully addressed single-shot person Re-ID using local matching information. For this task, Earth Mover’s Distance (EMD) is employed as a metric to measure similarity between two distributions. To handle multi-shot Re-ID, the proposed framework integrates a feature block, adapting the single-shot methodology to a multi-shot setting. Unlike conventional person Re-ID methods that employ a manually determined images of person, the proposed framework takes a query tracklet as input, which is automatically generated through human detection and tracking steps. To evaluate the proposed method, FAPR dataset (Fully Automated Person ReID) is used. This dataset is one of the few publicly available datasets built specifically for an end-to-end person Re-ID system. Various scenarios are rigorously examined to demonstrate the effectiveness of the proposed framework, especially in challenging conditions with strong occlusion. Across eight experimental scenarios, the proposed method achieves matching rates at rank-1 ranging from 76.3% to 100%. These results underscore the robustness and efficacy of our approach. Our source code is made available at: <span><span>https://github.com/anhnhust/emd-person-reid</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100663"},"PeriodicalIF":0.0,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143927578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unified modeling language code generation from diagram images using multimodal large language models
Averi Bates, Ryan Vavricka, Shane Carleton, Ruosi Shao, Chongle Pan
Machine Learning with Applications, vol. 20, Article 100660 (published 2025-04-30). DOI: 10.1016/j.mlwa.2025.100660

Abstract: The Unified Modeling Language (UML) is a standardized visual language widely used for modeling and documenting the design of software systems. Although many tools generate UML diagrams from UML code, generating executable UML code from image-based UML diagrams remains challenging. This paper proposes a new approach that automatically generates UML code using a multimodal large language model. Synthetic UML activity and sequence diagram datasets were created to train and test the model. We compared standard fine-tuning with LoRA techniques for optimizing the base models. The experiments measured code generation accuracy across different model sizes and training strategies. The results demonstrated that domain-adapted MM-LLMs are effective for automating UML code generation; the best model achieved BLEU and SSIM scores of 0.779 and 0.942 on sequence diagrams. This can enable the modernization of legacy systems and decrease the manual effort in software development workflows.
D3: A Small Language Model for Drug-Drug Interaction prediction and comparison with Large Language Models
Ahmed Ibrahim, Abdullah Hosseini, Salma Ibrahim, Aamenah Sattar, Ahmed Serag
Machine Learning with Applications, vol. 20, Article 100658 (published 2025-04-29). DOI: 10.1016/j.mlwa.2025.100658

Abstract: Large Language Models (LLMs) have significantly advanced Natural Language Processing (NLP) applications, including healthcare. However, their high computational demands pose challenges for deployment in resource-constrained settings. Small Language Models (SLMs) offer a promising alternative, balancing performance and efficiency. In this study, we introduce D3, a compact SLM with approximately 70 million parameters, designed for Drug-Drug Interaction (DDI) prediction. Trained on a curated DrugBank dataset, D3 was compared against fine-tuned state-of-the-art LLMs (Qwen 2.5, Gemma 2, Mistral v0.3, and LLaMA 3.1) ranging from 1.5 billion to 70 billion parameters. Despite being 1000 times smaller than LLaMA 3.1, D3 achieved an F1 score of 0.86, comparable to the larger models (Mistral v0.3: 0.88, LLaMA 3.1: 0.89), with no statistically significant performance difference. Expert evaluations further confirmed that D3's predictions were clinically relevant and closely aligned with those of the larger models. Our findings demonstrate that SLMs can effectively compete with LLMs in DDI prediction, achieving strong performance while significantly reducing computational requirements. Beyond DDI prediction, this work highlights the broader potential of small models in healthcare, where balancing accuracy and efficiency is critical.
Going vegan with ChatGPT: Towards designing LLMs for personalized lifestyle changes
Munachiso Okenyi, Grace Ataguba, Kosi Clinton Henry, Sussan Anukem, Rita Orji
Machine Learning with Applications, vol. 20, Article 100659 (published 2025-04-29). DOI: 10.1016/j.mlwa.2025.100659

Abstract: Large language models (LLMs), one of the recent technological revolutions, have become applicable to all areas of human endeavor, including health. In health, LLMs have contributed to disease management, diagnosis, stress management, and other major lifestyle-related changes. However, little is known about their impact on nutrition and the lifestyle changes associated with diseases such as diabetes, cardiovascular disease, and obesity. In this paper, we present two case studies of ChatGPT as an LLM intervention for making lifestyle-related decisions, such as transitioning to a vegan lifestyle: 1. normal weight (healthy) and 2. obesity. Additionally, we considered three dietary restrictions that could affect people in both case studies when transitioning to a vegan lifestyle: 1) allergy to nuts; 2) allergy to gluten; and 3) no allergies. We used ChatGPT to generate a one-week (seven-day) meal plan under each of these dietary restrictions. We analyzed all responses from ChatGPT and found that it provides a rich combination of vegan diets and is, to some extent, sensitive to these food allergies. We also found challenges relating to how an appropriate prompt can be employed to optimize ChatGPT's recommendations, and to the precision of the total calorie counts of the foods it recommends. Furthermore, we provide recommendations for overcoming these challenges in future work, including supporting users' domain-specific literacy and precision sensitivity for metrics that have an overall impact on human health.
{"title":"Application of machine learning in the determination of rock brittleness for CO2 geosequestration","authors":"Efenwengbe Nicholas Aminaho , Mamdud Hossain , Nadimul Haque Faisal , Reza Sanaee","doi":"10.1016/j.mlwa.2025.100656","DOIUrl":"10.1016/j.mlwa.2025.100656","url":null,"abstract":"<div><div>The underground storage of carbon dioxide (CO<sub>2</sub>), also called CO<sub>2</sub> geosequestration, represents one of the most promising options for reducing greenhouse gases in the atmosphere. However, fluid-rock interactions in reservoir and cap rocks before and during CO<sub>2</sub> geosequestration alter their mineralogical composition, and consequently, their brittleness index which is paramount in determining the suitability of formations for CO<sub>2</sub> geosequestration. Therefore, it is important to monitor the brittleness of reservoir and cap rocks, to ascertain their integrity for CO<sub>2</sub> storage. In this study, an algorithm was developed to generate numerical simulation datasets for a more reliable machine learning model development, and an artificial neural network (ANN) model was developed to evaluate the brittleness index of rocks using data from numerical simulations of CO<sub>2</sub> geosequestration in sandstone and carbonate reservoirs, overlain by shale caprock. The model was developed using Python programming language. The model developed in this study predicted the brittleness index of rocks with an R<sup>2</sup> value greater than 99 %, and mean absolute percentage error (MAPE) <0.6 % on the training, validation, and testing datasets. Hence, the model predicts the brittleness index of rocks with high accuracy. The findings of the study revealed that the geochemical composition of formation fluids is related to the brittleness index of rocks. In terms of feature importance in predicting the brittleness index of rocks, the concentrations of SiO<sub>2</sub> (aq), SO<sub>4</sub><sup>2</sup>, <em>K</em><sup>+</sup>, Ca<sup>2+</sup>, and O<sub>2</sub> (aq) have a stronger impact on the brittleness of rocks considered in this study.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100656"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143903825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative insights into the Winnipeg rental sector: A data-driven analytical approach using geographic and property metrics","authors":"Lahiru Wickramasinghe , Aditya Jain","doi":"10.1016/j.mlwa.2025.100657","DOIUrl":"10.1016/j.mlwa.2025.100657","url":null,"abstract":"<div><div>In the dynamic rental market of Winnipeg, accurately predicting rental property prices is essential for a wide range of stakeholders, including landlords, tenants, prospective renters, property managers, and urban planners. Traditional rental market assessments often fail to incorporate advanced analytical techniques, leading to less precise price forecasts and hindering strategic decision-making. This paper aims to bridge this gap by developing sophisticated predictive models using a dataset that contains rental property information as well as demographic and socio-economic information in Winnipeg. This paper highlights the importance of integrating advanced computational methods in rental market analysis, which can significantly benefit economic planning and personal investment decisions in urban environments. By utilizing both machine learning and statistical learning methods, this paper seeks to improve the accuracy of rental price estimations across different neighborhoods in Winnipeg.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100657"},"PeriodicalIF":0.0,"publicationDate":"2025-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143878725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}