{"title":"Identifying and Extracting Hierarchical Information from Business PDF Documents","authors":"Rohit Shere, Pavan Kumar Chittimalli, Ravindra Naik","doi":"10.1145/3511430.3511440","DOIUrl":"https://doi.org/10.1145/3511430.3511440","url":null,"abstract":"Portable Document Format (PDF) is a popular choice for secure communication and persistence of business information, and is a universally accepted format for businesses choosing to become digital. PDF provides multiple ways to make information visually appealing and readable, along with device-independent rendering. To achieve this, PDF stores metadata with individual text characters, graphic components, and other layout elements. Such atomic, component-wise metadata makes machine processing of information in PDF very challenging; the challenge is compounded by the difficulty of stitching the original semantics back together from the componentized information. We propose a generic approach for extracting the hierarchy of the document structure while separating the content from headers and footers, and for extracting metadata associated with checkboxes, in order to annotate the business information contained in a PDF for tasks such as mining specifications and rules from the document. Our prototype is able to process large, real-life PDF documents of roughly 400 pages each, with nearly 95% of the extraction requiring no human intervention.","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126777243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting Readability by Comprehending the Hierarchical Abstraction of a Software Project","authors":"Avijit Bhattacharjee, B. Roy, Kevin A. Schneider","doi":"10.1145/3511430.3511441","DOIUrl":"https://doi.org/10.1145/3511430.3511441","url":null,"abstract":"Exploring the source code of a software system is a common task frequently performed by its contributors. Practitioners often use call graphs to aid in understanding the source code of an inadequately documented software system. Call graphs, when visualized, show caller and callee relationships between functions. A static call graph provides the overall structure of a software system, and dynamic call graphs generated from execution logs can be used to trace program behaviour for a particular scenario. Unfortunately, a call graph of an entire system can be very complicated and hard to understand. Hierarchically abstracting a call graph can summarize an entire system’s structure and make function calls easier to comprehend. In this work, we mine concepts from source code entities (functions) to generate a concept cluster tree with improved naming of cluster nodes, to complement existing studies and facilitate more effective program comprehension for developers. We apply three different information retrieval techniques (TFIDF, LDA, and LSI) on function names and function name variants to label the nodes of a concept cluster tree generated by clustering execution paths. In our experiment comparing automatic labelling with manual labelling by participants for 12 use cases, we found that TFIDF performed best among the techniques, with 64% matching on average, while LDA and LSI had 37% and 23% matching respectively. In addition, using the words in function name variants performed at least 5% better in participant ratings for all three techniques, on average across the use cases.","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128115492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Configurable Limitless Paths in Virtual Reality Environments","authors":"Raghav Mittal, Sai Anirudh Karre, Y. P. Gururaj, Y. R. Reddy","doi":"10.1145/3511430.3511452","DOIUrl":"https://doi.org/10.1145/3511430.3511452","url":null,"abstract":"Locomotion in a virtual environment within a limited physical space is a complex activity. There are established techniques to support limitless natural walking in virtual environments, including redirected walking, dynamic path generation, and walk-in-place techniques. PragPal is one such limitless path generation technique that supports natural walking in virtual environments. It is a novel software-based, non-haptic locomotion technique. In this paper, we detail enhancements to the existing PragPal path generation technique that address underlying issues such as (1) path collision at angular turns, (2) effective usage of the physical play area, and (3) the ability to set path width during path turns.","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122380433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative Quality Score for Software","authors":"R. Nandakumar","doi":"10.1145/3511430.3511457","DOIUrl":"https://doi.org/10.1145/3511430.3511457","url":null,"abstract":"In this paper, the need for third-party software quality evaluation and certification is put forth. Software meant for public use is required to have a formal quality certification based on evaluation by an unbiased third party other than the software development or acquiring agencies. Formal software evaluation and certification are proposed to be carried out by an independent organisation established exclusively for this purpose, using crowd-sourcing. Qualified senior citizens and user-group representatives contribute to the testing, evaluation, and certification based on available software documents and templates for evaluation and certification. A quantitative quality score is computed and assigned to each software product under evaluation prior to its release for operational use. Each major revision requires such certification. Detailed modalities of assigning a quantitative quality score are presented, with particular emphasis on weighting. The quality score is computed from the software quality requirements specification when one exists, and otherwise from subjective evaluation by individuals testing the software using a standard method.","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122898282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Specific Text Preprocessing for Open Information Extraction","authors":"Chandan Prakash, Pavan Kumar Chittimalli, Ravindra Naik","doi":"10.1145/3511430.3511456","DOIUrl":"https://doi.org/10.1145/3511430.3511456","url":null,"abstract":"Preprocessing is an integral part of Natural Language Processing (NLP) based applications. Standard preprocessing steps remove irrelevant or unwanted characters or parts of the text based on several observed patterns, while preserving the original intent of the text. We introduce domain-specific preprocessing to filter out domain-irrelevant parts of the text while preserving the intended, semantically relevant meaning and the syntactic correctness of the text. For this, we define multiple patterns over the dependency tree that represents the natural language text according to its dependency grammar. We applied this technique and these patterns to United States retirement domain documents for the open information extraction task, as a precursor to mining business product information and rules, and were able to reduce the document data (i.e., the information for analysis and mining) by at least 13%, which improved the F1-score of relation extraction by at least 16%.","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116035123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Students Are Using GitHub? A Survey","authors":"S. Tiwari, S. Rathore, Sheikh Umar Farooq, Prutha Patani","doi":"10.1145/3511430.3511454","DOIUrl":"https://doi.org/10.1145/3511430.3511454","url":null,"abstract":"Recently, computer science educators have started adopting GitHub in teaching software engineering (SE) and other programming-related courses to impart teamwork and collaboration skills to students working on team projects. The educators aim to leverage the technical and social features of GitHub to deliver course material effectively, promote student collaboration, and monitor student activity on team projects. A few works have highlighted the benefits of using GitHub in student projects. However, the students’ perspective on adopting GitHub and using it for SE and related courses is largely missing. To shed more light on this, the presented work investigates students’ viewpoints on using GitHub and its adoption in classroom courses, and reports its benefits and drawbacks. We surveyed a total of 315 undergraduate, postgraduate, and PhD students in computer science. The research method includes a survey and a qualitative analysis of students’ behaviour in the course. The analysis and findings reported in this paper provide several valuable insights into how students perceive and utilize GitHub. GitHub is not inherently an educational tool. However, the findings reported in this work can be used to improve software engineering and computer science education, and can help instructors use GitHub more effectively in their courses.","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122530563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated SC-MCC Test Case Generation","authors":"Monika Rani Golla","doi":"10.1145/3511430.3511460","DOIUrl":"https://doi.org/10.1145/3511430.3511460","url":null,"abstract":"In safety-critical applications, software testing plays an important role, as the failure of such applications can result in huge financial losses or even loss of human life. Hence, a systematic certification process, DO-178B, which ensures safe software systems, has been adopted in the aviation industry. One of its key objectives is to achieve satisfactory Modified Condition/Decision Coverage (MC/DC). Among code coverage criteria, MC/DC is preferred because the number of test cases it requires grows linearly, from N+1 to 2N, where N is the total number of atomic conditions in a Boolean expression. This is far fewer than the exponential number of test cases, 2^N, required for Multiple Condition Coverage (MCC). However, since most safety-critical applications are developed in high-level languages with short-circuit evaluation, there is no need to test an application as though this property were absent. Hence, MCC with short-circuit evaluation (SC-MCC) is recommended. In this research work, we aim to demonstrate the effectiveness of SC-MCC using well-known automated test case generation techniques such as Bounded Model Checking, Coverage-Guided Fuzzing, Dynamic Symbolic Execution (DSE), and DSE with Interpolation.","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128407970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Commit-Checker: A human-centric approach for adopting bug inducing commit detection using machine learning models","authors":"Naz Zarreen Oishie, B. Roy","doi":"10.1145/3511430.3511463","DOIUrl":"https://doi.org/10.1145/3511430.3511463","url":null,"abstract":"Software bug prediction is one of the promising research areas in software engineering. Software developers must allocate a reasonable amount of time and resources to test and debug the developed software extensively to improve its quality. However, with limited time and resources, it is not always possible to test software thoroughly enough to produce high-quality software. Sometimes software companies release products in a hurry to make a profit in a competitive environment. As a result, the released software might contain defects that can damage the reputation of those companies. Ideally, any software application already on the market should not contain bugs; if it does, depending on their severity, they can be very costly. Although a significant amount of work has been done to automate different parts of testing to detect bugs, fixing a bug after it is discovered is still a costly task for developers. Sometimes these bug-fixing changes introduce new bugs into the system. Researchers have estimated that 80% of the total cost of a software system is spent on fixing bugs [8], and that software faults and failures cost the US economy $59.5 billion a year [9].","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134623983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Transformation for Improved Software Bug Detection Models","authors":"Shamse Tasnim Cynthia, B. Roy, Debajyoti Mondal","doi":"10.1145/3511430.3511444","DOIUrl":"https://doi.org/10.1145/3511430.3511444","url":null,"abstract":"Testing is considered one of the most crucial phases in the software development life cycle. Software bug fixing requires a significant amount of time and effort. A rich body of recent research has explored ways to predict bugs in software artifacts using machine learning based techniques. For a reliable and trustworthy prediction, it is crucial to also consider the explainability of such machine learning models. In this paper, we show how feature transformation techniques can significantly improve prediction accuracy and build confidence in bug prediction models. We propose a novel approach for improved bug prediction that first extracts the features, then uses a genetic algorithm to find a weighted transformation of these features that best separates bugs from non-bugs when plotted in a low-dimensional space, and finally trains the machine learning model on the transformed dataset. In our experiment with real-life bug datasets, the random forest and k-nearest neighbor classifier models that leveraged feature transformation showed a 4.25% improvement in recall, averaged over 8 software systems, compared to models built on the original data.","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116622208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RMVRVM – A Paradigm for Creating Energy Efficient User Applications Connected to Cloud through REST API","authors":"Lavneet Singh","doi":"10.1145/3511430.3511434","DOIUrl":"https://doi.org/10.1145/3511430.3511434","url":null,"abstract":"Applications that run on resource-constrained devices, especially battery-powered ones, pose a challenge. The activities such applications perform while running on these devices consume energy and drain the device's battery. Many of these applications use a REST API to communicate with their backend services running outside the devices, primarily on the cloud. Paradigms such as Model-View-ViewModel (MVVM) used on the application side require data transformations that cause applications to consume more battery. There is a need for an improved approach and a paradigm that can be used to develop green software with reduced battery consumption. This paper proposes a novel Remote-Model View Remote-View-Model (RMVRVM) paradigm. Using the RMVRVM paradigm lowers battery consumption on the devices where the application runs, and hence contributes to writing green software. In addition, RMVRVM makes an application more responsive and thus a delight to use. This paradigm has been implemented in industrial case studies, and significant gains were observed in terms of reduced data transfer, reduced battery consumption, and faster response time. Experiments were also conducted to further validate the paradigm, with encouraging results. Practitioners can apply RMVRVM to design applications for battery-constrained devices with smaller energy footprints and better response times.","PeriodicalId":138760,"journal":{"name":"15th Innovations in Software Engineering Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128887906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}