{"title":"SOLAR: Illuminating LLM performance in API discovery and service ranking for edge AI and IoT","authors":"Eyhab Al-Masri, Ishwarya Narayana Subramanian","doi":"10.1016/j.iot.2025.101630","DOIUrl":null,"url":null,"abstract":"<div><div>The growing complexity of web service and API discovery calls for robust methods to evaluate how well Large Language Models (LLMs) retrieve, rank, and assess APIs. However, current LLMs often produce inconsistent results, highlighting the need for structured, multi-dimensional evaluation. This paper introduces SOLAR (Systematic Observability of LLM API Retrieval), a framework that assesses LLM performance across three key dimensions: functional capability, implementation feasibility, and service sustainability. We evaluate four leading LLMs—GPT-4 Turbo (OpenAI), Claude 3.5 Sonnet (Anthropic), LLaMA 3.2 (Meta), and Gemini 2.0 Flash (Google)—on their ability to identify, prioritize, and evaluate APIs across varying query complexities. Results show GPT-4 Turbo and Claude 3.5 Sonnet achieve high functional alignment (FCA ≥ 0.75 for simple queries) and strong ranking consistency (Spearman’s ρ ≈ 0.95). However, all models struggle with implementation feasibility and long-term sustainability, with feasibility scores declining as complexity increases and sustainability scores remaining low (SSI ≈ 0.40), limiting deployment potential. Despite retrieving overlapping APIs, models often rank them inconsistently, raising concerns for AI-driven service selection. SOLAR identifies strong correlations between functional accuracy and ranking stability but weaker links to real-world feasibility and longevity. These findings are particularly relevant for Edge AI environments, where real-time processing, distributed intelligence, and reliable API integration are critical. 
SOLAR offers a comprehensive lens for evaluating LLM effectiveness in service discovery, providing actionable insights to advance robust, intelligent API integration across IoT and AI-driven systems. Our work aims to inform both future model development and deployment practices in high-stakes computing environments.</div></div>","PeriodicalId":29968,"journal":{"name":"Internet of Things","volume":"32 ","pages":"Article 101630"},"PeriodicalIF":6.0000,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Internet of Things","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2542660525001441","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
The growing complexity of web service and API discovery calls for robust methods to evaluate how well Large Language Models (LLMs) retrieve, rank, and assess APIs. However, current LLMs often produce inconsistent results, highlighting the need for structured, multi-dimensional evaluation. This paper introduces SOLAR (Systematic Observability of LLM API Retrieval), a framework that assesses LLM performance across three key dimensions: functional capability, implementation feasibility, and service sustainability. We evaluate four leading LLMs—GPT-4 Turbo (OpenAI), Claude 3.5 Sonnet (Anthropic), LLaMA 3.2 (Meta), and Gemini 2.0 Flash (Google)—on their ability to identify, prioritize, and evaluate APIs across varying query complexities. Results show GPT-4 Turbo and Claude 3.5 Sonnet achieve high functional alignment (FCA ≥ 0.75 for simple queries) and strong ranking consistency (Spearman’s ρ ≈ 0.95). However, all models struggle with implementation feasibility and long-term sustainability, with feasibility scores declining as complexity increases and sustainability scores remaining low (SSI ≈ 0.40), limiting deployment potential. Despite retrieving overlapping APIs, models often rank them inconsistently, raising concerns for AI-driven service selection. SOLAR identifies strong correlations between functional accuracy and ranking stability but weaker links to real-world feasibility and longevity. These findings are particularly relevant for Edge AI environments, where real-time processing, distributed intelligence, and reliable API integration are critical. SOLAR offers a comprehensive lens for evaluating LLM effectiveness in service discovery, providing actionable insights to advance robust, intelligent API integration across IoT and AI-driven systems. Our work aims to inform both future model development and deployment practices in high-stakes computing environments.
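The abstract reports ranking consistency via Spearman's ρ ≈ 0.95 between models' API rankings. As a minimal sketch of how that metric is computed, the snippet below implements the standard tie-free Spearman formula, ρ = 1 − 6Σd²/(n(n²−1)); the model names and rank lists are hypothetical illustrations, not data from the paper.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings.

    Each argument is a list of ranks (a permutation of 1..n) assigned
    to the same set of retrieved APIs by two different models.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must cover the same set of APIs")
    n = len(rank_a)
    # Sum of squared rank differences across the shared APIs.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical example: two models rank the same five APIs,
# swapping only the top two positions.
model_x = [1, 2, 3, 4, 5]
model_y = [2, 1, 3, 4, 5]
print(round(spearman_rho(model_x, model_y), 2))  # 0.9
```

A ρ near 1 indicates the models agree on API ordering even when their raw scores differ; values well below 1 signal the inconsistent rankings the paper flags as a risk for AI-driven service selection.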
Journal description:
Internet of Things: Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross-collaboration between researchers, engineers, and practitioners in the field of IoT and Cyber Physical Human Systems. The journal offers a unique platform to exchange scientific information on the entire breadth of technology, science, and societal applications of the IoT.
The journal places a high priority on timely publication and provides a home for high-quality work.
Furthermore, the journal is interested in publishing topical Special Issues on any aspect of the IoT.