{"title":"Code histories: Documenting development by recording code influences and changes in code","authors":"Vo Thien Tri Pham, Caitlin Kelleher","doi":"10.1016/j.cola.2024.101313","DOIUrl":"10.1016/j.cola.2024.101313","url":null,"abstract":"<div><div>Developers frequently encounter challenges when working with large code bases found in modern software applications, from navigating through files to more complex tasks like understanding code histories, dependencies, and evolutions. While many applications use Version Control Systems (VCSs) to archive present-day programs and provide a historical perspective on code development, the level of detail they offer is often insufficient for in-depth analyses. As a result, it becomes difficult to fully explore the potential benefits of historical data in software development. We introduce an enhanced recording framework that integrates both the Visual Studio Code (VS Code) development environment and the Google Chrome web browser to capture more detailed development activities. Our framework is designed to offer additional recording options, thereby providing researchers with more opportunities to study how different historical resources can be utilized. 
Through an observational study, we demonstrate the utility of our framework in capturing the complex dynamics of code change activities, highlighting its potential value in both academic and practical contexts.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101313"},"PeriodicalIF":1.7,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143101473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive meta-analysis of efficiency and effectiveness in the detection community","authors":"Mohamed Amine Daoud , Sid Ahmed Mokhtar Mostefaoui , Abdelkader Ouared , Hadj Madani Meghazi , Bendaoud Mebarek , Abdelkader Bouguessa , Hasan Ahmed","doi":"10.1016/j.cola.2024.101314","DOIUrl":"10.1016/j.cola.2024.101314","url":null,"abstract":"<div><div>Creating an intrusion detection system (IDS) is a prominent area of research that continuously draws attention from both scholars and practitioners who tirelessly innovate new solutions. The complexity of IDS naturally escalates alongside technological advancements, whether they are manually implemented within security infrastructures or elaborated upon in academic literature. However, accessing and comparing these IDS solutions requires sifting through a multitude of hypotheses presented in research papers, which is a laborious and error-prone endeavor. Consequently, many researchers encounter difficulties in replicating results or reanalyzing published IDSs. This challenge primarily arises due to the absence of a standardized process for elucidating IDS methodologies. In response, this paper advocates for a framework aimed at enhancing the reproducibility of IDS outcomes, thereby enabling their seamless reuse across diverse cybersecurity contexts, benefiting both end-users and experts alike. The proposed framework introduces a descriptive language for the precise specification of IDS descriptions. Additionally, a model repository facilitates the sharing and reusability of IDS configurations. Lastly, through a case study, we showcase the effectiveness of our framework in addressing challenges associated with data acquisition and knowledge organization and sharing. 
Our results demonstrate satisfactory prediction accuracy for configuration reuse and precise identification of reusable components.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101314"},"PeriodicalIF":1.7,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143101472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MTable: Visual query interface for browsing and navigation in NoSQL data stores","authors":"Kanika Soni, Shelly Sachdeva","doi":"10.1016/j.cola.2024.101312","DOIUrl":"10.1016/j.cola.2024.101312","url":null,"abstract":"<div><div>Almost all human endeavors in the era of the digital revolution, from commercial and industrial processes to scientific and medical research, depend on the use of ever-increasing amounts of data. However, the sheer volume and complexity of this data make data exploration and querying challenging even for experts. This has made the demand for easy access to data, even for naive users, all the more evident. Considering this, the database community has tilted toward NoSQL data stores. While there has been much study on query formulation assistance for NoSQL data stores, many users still want help when specifying complex queries (such as aggregation pipeline queries), which require an in-depth understanding of the data storage architecture of a specific NoSQL data store. To help users perform interactive browsing and navigation in NoSQL data stores (MongoDB), this paper proposes a novel, simple, and user-friendly interface, MTable, that provides users with a presentation-level interactive view. This view compactly presents the query results from multiple embedded documents within a single tabular format, in contrast to MongoDB's find operation, which always returns the main document. Certain cells of MTable contain clickable hyperlinks for users to interact directly with the data persisted in the document stores. This helps users incrementally construct complex queries and navigate the document stores without worrying about the tedious task of writing complex queries. In a user study, participants performed various querying tasks faster with MTable than with the traditional querying mechanism. 
MTable has received positive subjective feedback as well.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101312"},"PeriodicalIF":1.7,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143101471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mental stress analysis by measuring heart rate variability during learning programming: Comparison of visual- and text-based languages","authors":"Katsuyuki Umezawa , Takumi Koshikawa , Makoto Nakazawa , Shigeichi Hirasawa","doi":"10.1016/j.cola.2024.101311","DOIUrl":"10.1016/j.cola.2024.101311","url":null,"abstract":"<div><div>Visual-based programming languages that facilitate block-based coding have gained popularity as introductory methods for learning programming. Conversely, programming experts typically use text-based programming languages like C and Java. Nevertheless, a seamless method for transitioning from a visual- to text-based language has yet to be developed. Therefore, our research project aims to develop a methodology that facilitates this transition by bridging the gap between the two languages and verifying the variations in the biometric information of learners of both languages. In this study, we measured the participants’ heart rate variability (HRV) and evaluated variations in mental stress experienced while learning visual- and text-based languages. The experimental results confirmed that participants proficient in text-based languages experienced lower HRV (indicating higher stress levels) when learning visual-based languages. Conversely, those poorly proficient in text-based languages exhibited higher HRVs (indicating more favorable stress levels) while learning text-based languages. This study successfully observed differences in stress levels while learning both language types using experimental methods. These findings serve as a preliminary step toward clarifying the impact of stress experienced during learning outcomes and identifying the factors that constitute beneficial stress. 
This study establishes a foundation for an intermediate language that can enhance transitions between the two types of languages.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101311"},"PeriodicalIF":1.7,"publicationDate":"2024-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143101470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining type inference techniques for semi-automatic UML generation from Pharo code","authors":"Jan Blizničenko, Robert Pergl","doi":"10.1016/j.cola.2024.101300","DOIUrl":"10.1016/j.cola.2024.101300","url":null,"abstract":"<div><div>This paper explores how to reconstruct UML diagrams from dynamically typed languages such as Smalltalk, which do not use explicit type information. This lack of information makes traditional methods for extracting associations difficult. It addresses the need for automated techniques, particularly in legacy software systems, to facilitate their transformation into modern technologies, focusing on Smalltalk as a case study due to its extensive industrial legacy and modern adaptations like Pharo. We propose a way to create UML diagrams from Smalltalk code, focusing on using type inference to determine UML associations. For optimal outcomes for large-scale software systems, we recommend combining different type inference methods in an automatic or semi-automatic way.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"82 ","pages":"Article 101300"},"PeriodicalIF":1.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142699583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient instance selection algorithm for fast training of support vector machine for cross-project software defect prediction pairs","authors":"Manpreet Singh, Jitender Kumar Chhabra","doi":"10.1016/j.cola.2024.101301","DOIUrl":"10.1016/j.cola.2024.101301","url":null,"abstract":"<div><div>SVM is limited in its use for cross-project software defect prediction because of its very slow training process. So, this research article proposes a new instance selection (IS) algorithm called boundary detection among classes (BDAC) to reduce the training dataset size for faster training of SVM without degrading the prediction performance. The proposed algorithm is evaluated against six existing IS algorithms based on accuracy, running time, data reduction rate, etc., using 23 general datasets, 18 software defect prediction datasets, and two shape-based datasets, and the results show that BDAC is better than the selected algorithms based on collective comparison.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101301"},"PeriodicalIF":1.7,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection and treatment of string events in the limit","authors":"Alex Holmquist , Vitor Emanuel , Fernando C. Alves , Fernando Magno Quintão Pereira","doi":"10.1016/j.cola.2024.101299","DOIUrl":"10.1016/j.cola.2024.101299","url":null,"abstract":"<div><div>A string event is a pattern that occurs in a stream of characters. The need to detect and handle string events in infinite texts emerges in many scenarios, including online treatment of logs, web crawling, and syntax highlighting. This paper describes a technique to specify and treat string events. Users determine patterns of interest via a markup language. From such examples, tokens are generalized via a semi-lattice of regular expressions. Such tokens are combined into a context-free language that recognizes patterns in the text stream. These techniques are implemented in a text processing system called <span>Lushu</span>, which runs on the Java Virtual Machine (JVM). <span>Lushu</span> intercepts strings emitted by the JVM. Once patterns are detected, it invokes a user-specified action handler. As a proof of concept, this paper shows that <span>Lushu</span> outperforms state-of-the-art parsers and parser generators, such as <span>Comby</span>, <span>BeautifulSoup4</span> and <span>ZheFuscator</span>, in terms of memory consumption and running time.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101299"},"PeriodicalIF":1.7,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ClangOz: Parallel constant evaluation of C++ map and reduce operations","authors":"Paul Keir , Andrew Gozillon","doi":"10.1016/j.cola.2024.101298","DOIUrl":"10.1016/j.cola.2024.101298","url":null,"abstract":"<div><div>Interest in metaprogramming, reflection, and compile-time evaluation continues to inspire and foster innovation among the users and designers of the C++ programming language. Regrettably, the impact on compile-times of such features can be significant; and outside of build systems, multi-core parallelism is unable to bring down compilation times of individual translation units. We present ClangOz, a novel Clang-based research compiler that addresses this issue by evaluating annotated constant expressions in parallel, thereby reducing compilation times. Prior benchmarks analyzed parallel map operations, but were unable to consider reduction operations. Thus we also introduce parallel reduction functionality, alongside two additional benchmark programs.</div></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101298"},"PeriodicalIF":1.7,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142440881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MoTion: A new declarative object matching approach in Pharo","authors":"Aless Hosry , Vincent Aranega , Nicolas Anquetil","doi":"10.1016/j.cola.2024.101290","DOIUrl":"10.1016/j.cola.2024.101290","url":null,"abstract":"<div><p>Pattern matching is an expressive way of matching data and extracting pieces of information from it. The recent inclusion of pattern matching in the Java and Python languages highlights that such a facility is increasingly adopted by developers for everyday development. Other mainstream programming languages also offer pattern matching capabilities as part of the language (Rust, Scala, Haskell, and OCaml), with different degrees of expressivity in what can be matched. In the meantime, in graphs, pattern matching takes a slightly different turn; it enhances the expressivity of the patterns that can be defined. Smalltalk currently offers little pattern matching capability to find specific objects inside a large graph of objects using a declarative pattern. In Pharo, the closest existing library to classical pattern matching is the <span>RBParseTreeSearcher</span>, which allows expressing specialized patterns over a Pharo Abstract Syntax Tree to find some inner node. The question arises of what features a flexible pattern matching language should have. In this paper, we review the features found in different existing pattern matching languages, both in General Purpose Languages (like Java) and in declarative graph pattern matching languages. We then describe MoTion, a new pattern matching engine for Pharo Smalltalk, combining all these features. 
We discuss some aspects of MoTion’s implementation and illustrate its use with real case examples.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101290"},"PeriodicalIF":1.7,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142129955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An empirical study on divergence of differently-sourced LLVM IRs","authors":"Zhenzhou Tian , Yuchen Gong , Chenhao Chang , Jiaze Sun , Yanping Chen , Lingwei Chen","doi":"10.1016/j.cola.2024.101289","DOIUrl":"10.1016/j.cola.2024.101289","url":null,"abstract":"<div><p>In solving binary code similarity detection, many approaches choose to operate on certain unified intermediate representations (IRs), such as Low Level Virtual Machine (LLVM) IR, to overcome the cross-architecture analysis challenge induced by the significant morphological and syntactic gaps across the diverse instruction set architectures (ISAs). However, the LLVM IRs of the same program can be affected by diverse factors, such as the acquisition source, i.e., compiled from source code or disassembled and lifted from binary code. While the impact of compilation settings on binary code has been explored, the specific differences between LLVM IRs from varied sources remain underexamined. To this end, we pioneer an in-depth empirical study to assess the discrepancies in LLVM IRs derived from different sources. Correspondingly, an extensive dataset containing nearly 98 million LLVM IR instructions distributed in 808,431 functions is curated with respect to these potential IR-influential factors. On this basis, three types of code metrics detailing the syntactic, structural, and semantic aspects of the IR samples are devised and leveraged to assess the divergence of the IRs across different origins. 
The findings offer insights into how and to what extent the various factors affect the IRs, providing valuable guidance for assembling a training corpus aimed at developing robust LLVM IR-oriented pre-training models, as well as facilitating relevant program analysis studies that operate on the LLVM IRs.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"81 ","pages":"Article 101289"},"PeriodicalIF":1.7,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141937759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}