{"title":"LLMs: A game-changer for software engineers?","authors":"Md. Asraful Haque","doi":"10.1016/j.tbench.2025.100204","DOIUrl":"10.1016/j.tbench.2025.100204","url":null,"abstract":"<div><div>Large Language Models (LLMs) like GPT-3 and GPT-4 have emerged as groundbreaking innovations with capabilities that extend far beyond traditional AI applications. These sophisticated models, trained on massive datasets, can generate human-like text, respond to complex queries, and even write and interpret code. Their potential to revolutionize software development has captivated the software engineering (SE) community, sparking debates about their transformative impact. Through a critical analysis of technical strengths, limitations, real-world case studies, and future research directions, this paper argues that LLMs are not just reshaping how software is developed but are redefining the role of developers. While challenges persist, LLMs offer unprecedented opportunities for innovation and collaboration. Early adoption of LLMs in software engineering is crucial to stay competitive in this rapidly evolving landscape. This paper serves as a guide, helping developers, organizations, and researchers understand how to harness the power of LLMs to streamline workflows and acquire the necessary skills.</div></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"5 1","pages":"Article 100204"},"PeriodicalIF":0.0,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144168925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluatology’s perspective on AI evaluation in critical scenarios: From tail quality to landscape","authors":"Zhengxin Yang","doi":"10.1016/j.tbench.2025.100203","DOIUrl":"10.1016/j.tbench.2025.100203","url":null,"abstract":"<div><div>Tail Quality, as a metric for evaluating AI inference performance in critical scenarios, reveals the extreme behaviors of AI inference systems in real-world applications, offering significant practical value. However, its adoption has been limited due to the lack of systematic theoretical support. To address this issue, this paper analyzes AI inference system evaluation activities from the perspective of Evaluatology, bridging the gap between theory and practice. Specifically, we begin by constructing a rigorous, consistent, and comprehensive evaluation system for AI inference systems, with a focus on defining the evaluation subject and evaluation conditions. We then refine the Quality@Time-Threshold (Q@T) statistical evaluation framework by formalizing these components, thereby enhancing its theoretical rigor and applicability. By integrating the principles of Evaluatology, we extend Q@T to incorporate stakeholder considerations, ensuring its adaptability to varying time tolerance. Through refining the Q@T evaluation framework and embedding it within Evaluatology, we provide a robust theoretical foundation that enhances the accuracy and reliability of AI system evaluations, making the approach both scientifically rigorous and practically reliable. Experimental results further validate the effectiveness of this refined framework, confirming its scientific rigor and practical applicability. The theoretical analysis presented in this paper provides valuable guidance for researchers aiming to apply Evaluatology in practice.</div></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"5 1","pages":"Article 100203"},"PeriodicalIF":0.0,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144168926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MultiPoint: Enabling scalable pre-silicon performance evaluation for multi-task workloads","authors":"Chenji Han , Xinyu Li , Feng Xue , Weitong Wang , Yuxuan Wu , Wenxiang Wang , Fuxin Zhang","doi":"10.1016/j.tbench.2025.100189","DOIUrl":"10.1016/j.tbench.2025.100189","url":null,"abstract":"<div><div>With the core numbers integrated within single processors growing and the fast development of cloud computing, performance evaluation for multi-core systems is increasingly crucial. It is typically conducted by executing multi-task workloads, exemplified by SPEC CPU Rate, to measure metrics like system’s throughput. In response, several sampling-based methods have been developed for their pre-silicon performance evaluation. Nevertheless, these methods involve directly capturing multi-task checkpoints, which presents scalability issues of significant storage and time overheads. Therefore, enabling more scalable performance evaluation remains a critical problem.</div><div>In this work, we propose MultiPoint to enable scalable pre-silicon performance evaluation for multi-task workloads. It is noted that in the multi-task workloads of interest, each task executes independently without inter-task communication. Therefore, MultiPoint is motivated to construct the required multi-task checkpoints by recovering multiple single-task checkpoints across different cores and guarantee their smooth execution through address remapping and shuffling. We implemented MultiPoint on the Emulator Accelerator and assessed its evaluation accuracy against its post-silicon Loongson 3A6000 processor. Using SPEC CPU 2017 as the benchmark, MultiPoint achieved the estimation errors of 6.20%, 5.45%, and 6.99% for Rate 2, Rate 4, and Rate 8, respectively, achieving comparable accuracy compared to direct multi-task checkpointing but in a more scalable manner with substantially 86.0% lower storage and 93.7% less time overheads.</div></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"4 3","pages":"Article 100189"},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143445331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of mechanical properties of natural fiber based polymer composite","authors":"Tarikur Jaman Pramanik , Md. Rafiquzzaman , Anup Karmakar , Marzan Hasan Nayeem , S M Kalbin Salim Turjo , Md. Ragib Abid","doi":"10.1016/j.tbench.2024.100183","DOIUrl":"10.1016/j.tbench.2024.100183","url":null,"abstract":"<div><div>Natural fiber based polymer composites are eco-friendly alternatives to synthetic materials, with greater mechanical properties, biodegradability, availability, ease of access, and affordability. Jute fiber is widely recognized as one of the most important and beneficial natural fibers due to its strength, durability, and biodegradability. In this study, the jute composite is designed and fabricated using a 5-layer jute and epoxy resin, utilizing the manual hand lay-up technique. The combination of 52.5 % jute and 47.5 % of epoxy resin and hardener is found optimized to achieve the goals of improving the tensile strength and flexural strength, reducing the cost of epoxy resin, and promoting eco-friendliness and sustainability. Tensile testing was performed on a universal testing machine, while flexural testing was done with a three-point bending test. Experimentally, the composites reinforced with jute and epoxy resin were capable of achieving the required levels of tensile strength (42.91 MPa) and bending strength (69.30 MPa). To validate and visualize specimens, numerical analysis was performed on the ABAQUS simulation software. The numerical simulation utilized ASTM D3039 and ASTM D7264 as the specified requirements for tensile and flexural behavior. For validation, these tensile and flexural test results were then numerically analyzed and compared to the experimental data. Finally, composite design, fabrication, and optimization can improve mechanical properties, reduce composite weight, lower resin cost, and increase sustainability. The proposed design and composition can be implemented to achieve lightweight properties in various applications, such as car components, door handle sheets, bicycle seat backs, and luggage covers.</div></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"4 3","pages":"Article 100183"},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142418440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing the obstacles to the establishment of sustainable supply chain in the textile industry of Bangladesh","authors":"Md. Hasibul Hasan Hemal , Farjana Parvin , Alberuni Aziz","doi":"10.1016/j.tbench.2024.100185","DOIUrl":"10.1016/j.tbench.2024.100185","url":null,"abstract":"<div><div>Bangladesh's textile sector plays a crucial role in its economy by creating jobs and significantly contributing to export revenue. However, this industry faces challenges, including contaminated water sources and the release of airborne pollutants due to its high water usage, chemical dyes, and manufacturing processes. Therefore, establishing a sustainable supply chain is essential. This study aims to identify the critical obstacles to establishing a sustainable supply chain. Multi-Criteria Decision Making (MCDM) techniques, such as DEMATEL, help reveal the relationships between different components and determine the relative importance of each in the decision-making model. Meanwhile, Fuzzy TOPSIS proves reliable in situations of uncertainty, allowing for effective ranking of the barriers. The findings indicate that the most pressing barriers include resistance to change and the adoption of innovation, financial constraints or high costs, and a lack of support and commitment from top management. This assessment helps pinpoint crucial obstacles that must be addressed to achieve sustainability in the textile sector. By effectively identifying and eliminating these barriers, this study aims to assist those involved in the industry in their pursuit of a more sustainable future.</div></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"4 3","pages":"Article 100185"},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corrigendum regarding missing Declaration Conflict-of -Interests statements in previously published articles","authors":"","doi":"10.1016/j.tbench.2024.100186","DOIUrl":"10.1016/j.tbench.2024.100186","url":null,"abstract":"","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"4 3","pages":"Article 100186"},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Could bibliometrics reveal top science and technology achievements and researchers? The case for evaluatology-based science and technology evaluation","authors":"Guoxin Kang , Wanling Gao , Lei Wang , Chunjie Luo , Hainan Ye , Qian He , Shaopeng Dai , Jianfeng Zhan","doi":"10.1016/j.tbench.2024.100182","DOIUrl":"10.1016/j.tbench.2024.100182","url":null,"abstract":"<div><div>By utilizing statistical methods to analyze bibliographic data, bibliometrics faces inherent limitations in identifying the most significant science and technology achievements and researchers. To overcome this challenge, we present an evaluatology-based science and technology evaluation methodology. At the heart of this approach lies the concept of an extended evaluation condition (EC), encompassing nine crucial components derived from a field. We define four relationships that illustrate the connections among various achievements based on their mapped extended EC components, as well as their temporal and citation links. Within a relationship under an extended EC, evaluators can effectively compare these achievements by carefully addressing the influence of confounding variables. We establish a real-world evaluation system encompassing an entire collection of achievements, each of which is mapped to several components of an extended EC. Within a specific field like chip technology or open source, we construct a perfect evaluation model that can accurately trace the evolution and development of all achievements in terms of four relationships based on the real-world evaluation system. Building upon the foundation of the perfect evaluation model, we put forth four-round rules to eliminate non-significant achievements by utilizing four relationships. This process allows us to establish a pragmatic evaluation model that effectively captures the essential achievements, serving as a curated collection of the top N achievements within a specific field during a specific timeframe. We present a case study on the top 100 Chip achievements to demonstrate the effectiveness of our science and technology evaluatology. The case study highlights its practical application and efficacy in identifying significant achievements and researchers that otherwise cannot be identified by using bibliometrics.</div></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"4 3","pages":"Article 100182"},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142662450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Five Axioms of Things","authors":"Jianfeng Zhan","doi":"10.1016/j.tbench.2024.100184","DOIUrl":"10.1016/j.tbench.2024.100184","url":null,"abstract":"<div><div>This article explicitly defines several concepts, such as variables, models, and truth of a thing, that are fundamental to natural and social sciences. I present a generalized methodology for understanding a thing, categorically defining six foundational understanding approaches based on the nature of the thing and diverse perspectives: conjecture, observation, experiment, evaluation, measurement, and testing. I extend my previous work on the five axioms of evaluation to understanding a thing, which I call the five axioms of things. Also, I comment on five paradigms of science.</div></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"4 3","pages":"Article 100184"},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the Orca Predation Algorithm for Economic Dispatch Optimization in Power Systems","authors":"Vivi Aida Fitria , Arif Nur Afandi , Aripriharta","doi":"10.1016/j.tbench.2024.100187","DOIUrl":"10.1016/j.tbench.2024.100187","url":null,"abstract":"<div><div>The Economic Dispatch problem is essential for minimizing generation costs while satisfying power demand in electrical systems. This research looks into the Orca Predation Algorithm, an optimization method based on biology that can solve the Economic Dispatch problem for systems with 6, 13, or 15 producing units. The idea behind Orca Predation Algorithm came from the way orcas hunt for food. It solves problems that other optimization methods and bio-inspired algorithms have, like too much population diversity and too early convergence. This research shows that Orca Predation Algorithm consistently does better than other bio-inspired algorithms like Particle Swarm Optimization, Whale Optimization Algorithm, Grey Wolf Optimizer, the Bat Algorithm, Genetic Algorithm and Ant Colony Optimization in terms of minimum cost, average cost, and solution stability. The sensitivity analysis of the parameters regulating the exploration-exploitation balance in Orca Predation Algorithm demonstrated substantial performance enhancements. By changing these parameters, the best prices came in at $15,275.93 for the 6-unit system, $17,932.49 for the 13-unit system, and $32,256.97 for the 15-unit system. These prices are lower than those in the previous parameter setting. Although Orca Predation Algorithm demonstrates greater performance, it necessitates extended computing time, which future research could mitigate by exploring parallelization or hybrid methodologies. This paper shows that Orca Predation Algorithm is a reliable tool for optimizing Economic Dispatch problems. It gives useful information to power system engineers who are looking for effective and scalable optimization methods for modern power systems.</div></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"4 3","pages":"Article 100187"},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fundamental concepts and methodologies in evaluatology","authors":"Jianfeng Zhan","doi":"10.1016/j.tbench.2025.100188","DOIUrl":"10.1016/j.tbench.2025.100188","url":null,"abstract":"<div><div>While I have authored three articles introducing Evaluatology, a novel discipline that encompasses the science and engineering of evaluation across various domains, I have struggled to fully depict this challenging yet promising field.</div><div>This article delves into the fundamental concepts and methodologies within Evaluatology. I aim to provide a complete picture of evaluation problems in Evaluatology based on my proposed fundamental methodology of understanding a thing. In diverse engineering fields, testbeds, experimental platforms, or simulation environments are commonly utilized to evaluate design or implementation decisions. However, a rigorous methodology is often lacking. I propose a rigorous methodology rooted in Evaluatology for testbeds, experimental platforms, or simulation environments.</div></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"4 3","pages":"Article 100188"},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143314904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}