GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference
SESSION: Search-based software engineering
Search-based unit test generation applies evolutionary search to maximize code coverage. Although the performance of this approach is often good, sometimes it is not, and how the fitness landscape affects this performance is poorly understood. This paper presents a thorough analysis of 331 Java classes by (i) characterizing their fitness landscape using six established fitness landscape measures, (ii) analyzing the impact of these fitness landscape measures on the search, and (iii) investigating the underlying properties of the source code influencing these measures. Our results reveal that classical indicators for rugged fitness landscapes suggest well searchable problems in the case of unit test generation, but the fitness landscape for most problem instances is dominated by detrimental plateaus. A closer look at the underlying source code suggests that these plateaus are frequently caused by code in private methods, methods throwing exceptions, and boolean flags. This suggests that inter-procedural distance metrics and testability transformations could improve search-based test generation.
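The plateau effect of boolean flags described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the target value 42, the function names, and the neutrality ratio itself, which is one simple way to quantify a plateau and not necessarily one of the six measures used in the paper.

```python
import random

TARGET = 42  # hypothetical value the branch under test compares against


def flag_fitness(x: int) -> float:
    # The comparison result is stored in a boolean flag first, so every
    # miss scores the same and the search gets no gradient.
    flag = (x == TARGET)
    return 0.0 if flag else 1.0


def distance_fitness(x: int) -> float:
    # Branch-distance style fitness: smaller values mean closer to the target.
    return float(abs(x - TARGET))


def neutrality_ratio(fitness, start: int, steps: int, seed: int = 0) -> float:
    # Fraction of random-walk steps that leave the fitness unchanged.
    rng = random.Random(seed)
    x = start
    neutral = 0
    for _ in range(steps):
        neighbour = x + rng.choice((-1, 1))
        if fitness(neighbour) == fitness(x):
            neutral += 1
        x = neighbour
    return neutral / steps


if __name__ == "__main__":
    print(neutrality_ratio(flag_fitness, start=1000, steps=100))
    print(neutrality_ratio(distance_fitness, start=1000, steps=100))
```

In this sketch, a walk that starts far from the target never sees a fitness change under the flag-based function, whereas the distance-based function changes on every step. This is the kind of guidance a testability transformation aims to restore.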
Heuristic-based search techniques have been increasingly used to automate different aspects of software testing. Several studies suggest that variable interdependencies may exist in the branching conditions of real-life programs, and that covering such branches requires highly precise data values (for example, of the form i=j=k). This requirement makes it very difficult for Genetic Algorithm (GA)-based approaches to find the required test data in the vast search spaces of real-life programs.
Ariadne is the only Grammatical Evolution (GE)-based test data generation system, proposed to date, that uses grammars to exploit variable interdependencies to improve code coverage. Ariadne has been compared favourably to other well-known test data generation techniques in the literature; however, its scalability has not yet been tested for increasingly complex programs.
This paper presents the results of a rigorous analysis performed to examine Ariadne's scalability. We also designed and employed a large set of 18 highly scalable benchmark programs for our experiments. Our results suggest that Ariadne is highly scalable: it achieved 100% coverage across all programs of increasing complexity, with significantly smaller search costs than GA-based approaches, which failed even with huge search budgets.
Seeding strategies for multi-objective test case selection: an application on simulation-based testing
Testing software systems usually takes a long time, often because the entire test suite must be executed. To reduce this cost, regression test selection approaches have improved the cost-effectiveness of verification and validation activities in the software industry. In this area, multi-objective algorithms have played a key role in selecting the appropriate subset of test cases from the entire test suite. In this paper, we propose a set of seeding strategies for the test case selection problem that generate the initial population of multi-objective algorithms. We integrated these seeding strategies with an NSGA-II algorithm for solving the test case selection problem in the context of simulation-based testing. We evaluated the strategies with six case studies and a total of 21 fitness combinations for each case study (i.e., a total of 126 problems). Our evaluation suggests that these strategies are indeed helpful for solving the multi-objective test case selection problem. In fact, two of the proposed seeding strategies outperformed the NSGA-II algorithm without a seeded population, with statistical significance, for 92.8% and 96% of the problems.
The optimisation of software energy consumption is of growing importance across all scales of modern computing, i.e., from embedded systems to data-centres. Practitioners in the field of Search-Based Software Engineering and Genetic Improvement of Software acknowledge that optimising software energy consumption is difficult due to noisy and expensive fitness evaluations. However, it is apparent from results to date that more progress needs to be made in rigorously validating optimisation results. This problem is pressing because modern computing platforms have highly complex and variable behaviour with respect to energy consumption. To compare solutions fairly we propose in this paper a new validation approach called R3-validation which exercises software variants in a rotated-round-robin order. Using a case study, we present an in-depth analysis of the impacts of changing system states on software energy usage, and we show how R3-validation mitigates these. We compare it with current validation approaches across multiple devices and operating systems, and we show that it aligns best with actual platform behaviour.
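A rotated round-robin order can be sketched as follows. This is only one plausible reading of the phrase in the abstract, not the authors' published procedure: each round runs every variant once, and the starting variant shifts by one position per round so that no variant always runs first or last. The function name and the variant labels are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rotated_round_robin(variants: Sequence[T], rounds: int) -> List[List[T]]:
    # Build one execution order per round; round r starts at variant r mod n.
    n = len(variants)
    if n == 0:
        return []
    schedule = []
    for r in range(rounds):
        shift = r % n
        schedule.append(list(variants[shift:]) + list(variants[:shift]))
    return schedule


if __name__ == "__main__":
    for order in rotated_round_robin(["A", "B", "C"], rounds=3):
        print(order)
```

Under this reading, over n rounds every variant occupies every position exactly once, so slow drifts in system state (for example, temperature or battery level) would be spread across all variants rather than biasing a single one.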
It is imperative for testing to determine whether the components within large-scale software systems function correctly. Interaction testing involves designing a suite of tests that is guaranteed to detect a fault if one exists among a small number of components interacting together. The cost of this testing is typically modeled by the number of tests, and thus much effort has gone into reducing this number. Here, we incorporate redundancy into the model, which allows for testing in non-deterministic environments. Existing algorithms for constructing these test suites usually involve one "fast" algorithm for generating most of the tests, and another "slower" algorithm to "complete" the test suite. We employ a genetic algorithm that generalizes these approaches and also incorporates redundancy by increasing the number of algorithms chosen, which we call "stages." By increasing the number of stages, we show that not only can the number of tests be reduced compared to existing techniques, but the computational time needed to generate them is also greatly reduced.
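The coverage property behind interaction testing with redundancy can be checked with a short sketch. It assumes pairwise (strength-2) interactions and models redundancy as the minimum number of tests covering each value pair; the abstract fixes neither choice, and the function name is hypothetical. The sketch only verifies a given suite and does not implement the staged genetic algorithm.

```python
from collections import Counter
from itertools import combinations, product
from typing import Optional, Sequence


def min_pair_coverage(tests: Sequence[Sequence[int]],
                      levels: Sequence[int]) -> Optional[int]:
    # Count how often each (component i, component j, value a, value b)
    # combination appears in the suite.
    k = len(levels)
    counts = Counter()
    for t in tests:
        for i, j in combinations(range(k), 2):
            counts[(i, j, t[i], t[j])] += 1
    # Return the smallest count over all possible value pairs
    # (None if there are fewer than two components).
    minimum = None
    for i, j in combinations(range(k), 2):
        for a, b in product(range(levels[i]), range(levels[j])):
            c = counts[(i, j, a, b)]
            minimum = c if minimum is None else min(minimum, c)
    return minimum


if __name__ == "__main__":
    suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    print(min_pair_coverage(suite, levels=[2, 2, 2]))
    print(min_pair_coverage(suite + suite, levels=[2, 2, 2]))
```

A suite whose minimum is at least 1 covers every pairwise interaction; a minimum of at least 2 means each interaction is exercised by two or more tests, which is the sort of redundancy that helps when a single run may miss a fault non-deterministically.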
The Product Line Architecture (PLA) is one of the most important artifacts of a Software Product Line. PLA design has been formulated as a multi-objective optimization problem and successfully solved by a state-of-the-art search-based approach. However, the majority of empirical studies optimize PLA designs without applying one of the fundamental genetic operators: crossover. An operator for PLA design, named Feature-driven Crossover, was proposed in a previous study. In spite of its promising results, this operator occasionally generated incomplete solutions. To overcome this limitation, this paper aims to enhance search-based PLA design optimization by improving the Feature-driven Crossover and introducing a novel crossover operator specific to PLA design. The proposed operators were evaluated on two well-studied PLA designs, using three experimental configurations of NSGA-II in comparison with a baseline that uses only mutation operators. Empirical results show the usefulness and efficiency of the presented operators in reaching consistent solutions. We also observed that the two operators complement each other, leading to PLA design solutions with better feature modularization than the baseline experiment.