# GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference

## SESSION: General evolutionary computation and hybrids

### Using implicit multi-objectives properties to mitigate against forgetfulness in coevolutionary algorithms

It has been noticed that, while coevolutionary computational systems have only a single objective when evaluating, there is a subtle multi-objective aspect to evaluation since different pairings can be thought of as different objectives (all in support of the single original objective). Previously, researchers used this to identify pairings of individuals during evaluation within a single generation. However, because of the problems of forgetfulness and the Red-Queen effect, this does not allow for the proper control that the technique promises. In this research, this implicit multi-objective approach is extended to function between generations as well as within them. This makes it possible to implement a more powerful form of elitism as well as mitigate some of the pathologies of coevolutionary systems that forgetfulness and the Red-Queen effect engender, thus providing more robust solutions.

### Initial design strategies and their effects on sequential model-based optimization: an exploratory case study based on BBOB

Sequential model-based optimization (SMBO) approaches are algorithms for solving problems that require computationally or otherwise expensive function evaluations. The key design principle of SMBO is a substitution of the true objective function by a surrogate, which is used to propose the point(s) to be evaluated next.

SMBO algorithms are intrinsically modular, leaving the user with many important design choices. Significant research efforts go into understanding which settings perform best for which type of problems. Most works, however, focus on the choice of the model, the acquisition function, and the strategy used to optimize the latter. The choice of the initial sampling strategy, however, receives much less attention. Not surprisingly, quite diverging recommendations can be found in the literature.

We analyze in this work how the size and the distribution of the initial sample influence the overall quality of the efficient global optimization (EGO) algorithm, a well-known SMBO approach. While, overall, small initial budgets using Halton sampling seem preferable, we also observe that the performance landscape is rather unstructured. We furthermore identify several situations in which EGO performs unfavorably against random sampling. Both observations indicate that an adaptive SMBO design could be beneficial, making SMBO an interesting test-bed for automated algorithm design.
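As an illustration of the Halton initial design mentioned above, the following is a minimal pure-Python sketch of the standard Halton construction (radical inverse in successive prime bases). The function names, the sample size of 4, and the box `[-5, 5]` are assumptions chosen for this example; the abstract does not specify how the study generated or scaled its samples.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n_points, dim):
    """First n_points of the Halton sequence in [0, 1)^dim (indices start at 1)."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    if dim > len(primes):
        raise ValueError("this sketch supports at most 10 dimensions")
    return [[radical_inverse(i, primes[d]) for d in range(dim)]
            for i in range(1, n_points + 1)]


def scale(points, lower, upper):
    """Map unit-cube points to the box [lower, upper]^dim."""
    return [[lower + (upper - lower) * u for u in p] for p in points]


# Hypothetical initial design: 4 points in a 2-dimensional box [-5, 5]^2.
initial_design = scale(halton(4, 2), -5.0, 5.0)
```

The initial design size `n_points` is exactly the quantity whose influence the abstract studies; a surrogate would be fitted to these points before the first acquisition step.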

### ϵ-shotgun: ϵ-greedy batch Bayesian optimisation

Bayesian optimisation is a popular surrogate model-based approach for optimising expensive black-box functions. Given a surrogate model, the next location to expensively evaluate is chosen via maximisation of a cheap-to-query acquisition function. We present an *ϵ*-greedy procedure for Bayesian optimisation in batch settings in which the black-box function can be evaluated multiple times in parallel. Our *ϵ*-shotgun algorithm leverages the model's prediction, uncertainty, and the approximated rate of change of the landscape to determine the spread of batch solutions to be distributed around a putative location. The initial target location is selected either in an exploitative fashion on the mean prediction, or - with probability *ϵ* - from elsewhere in the design space. This results in locations that are more densely sampled in regions where the function is changing rapidly and in locations predicted to be good (i.e. close to predicted optima), with more scattered samples in regions where the function is flatter and/or of poorer quality. We empirically evaluate the *ϵ*-shotgun methods on a range of synthetic functions and two real-world problems, finding that they perform at least as well as state-of-the-art batch methods and in many cases exceed their performance.
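The *ϵ*-greedy choice of a target location can be sketched roughly as follows. This is not the authors' implementation: the candidate set, the `predicted_mean` callable, the minimisation convention, and the fixed `radius` are all assumptions of this example. In particular, the abstract says the spread is derived from the model's prediction, uncertainty, and rate of change, which this sketch replaces with a hand-set constant.

```python
import random


def choose_target(candidates, predicted_mean, epsilon, rng):
    """Epsilon-greedy target: exploit the best predicted mean (minimisation),
    or, with probability epsilon, pick a random candidate instead."""
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return min(candidates, key=predicted_mean)


def scatter_batch(target, batch_size, radius, rng):
    """The target itself plus batch_size - 1 Gaussian perturbations around it.
    `radius` is a hypothetical stand-in for the model-derived spread."""
    batch = [list(target)]
    for _ in range(batch_size - 1):
        batch.append([x + rng.gauss(0.0, radius) for x in target])
    return batch


rng = random.Random(0)
candidates = [[0.0], [1.0], [2.0]]
# Hypothetical surrogate mean with its minimum at x = 1.
target = choose_target(candidates, lambda x: (x[0] - 1.0) ** 2, 0.1, rng)
batch = scatter_batch(target, batch_size=4, radius=0.1, rng=rng)
```

A smaller `radius` concentrates the batch around the target, which corresponds to the denser sampling the abstract describes for promising or rapidly changing regions.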

### Bivariate estimation-of-distribution algorithms can find an exponential number of optima

Finding a large set of optima in a multimodal optimization landscape is a challenging task. Classical population-based evolutionary algorithms (EAs) typically converge only to a single solution. While this can be counteracted by applying niching strategies, the number of optima is nonetheless trivially bounded by the population size.

Estimation-of-distribution algorithms (EDAs) are an alternative, maintaining a probabilistic model of the solution space instead of an explicit population. Such a model is able to implicitly represent a solution set that is far larger than any realistic population size.

To support the study of how optimization algorithms handle large sets of optima, we propose the test function EqualBlocksOneMax (EBOM). Its fitness landscape is easy to optimize, yet it has an exponential number of optima. We show that the bivariate EDA *mutual-information-maximizing input clustering* (MIMIC), without any problem-specific modification, quickly generates a model that behaves very similarly to a theoretically ideal model for that function, which samples each of the exponentially many optima with the same maximal probability.
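The abstract does not define the test function. The sketch below assumes one natural reading of the name: the bit string is split into consecutive blocks of two bits, and the fitness counts the blocks whose two bits are equal. Under that assumption a string of length *n* has 2^(*n*/2) optima, which the brute-force check illustrates for *n* = 6. The function name and block size are assumptions of this example, so consult the paper for the authoritative definition.

```python
from itertools import product


def equal_blocks_one_max(bits):
    """Assumed EBOM: number of consecutive 2-bit blocks whose bits are equal."""
    if len(bits) % 2 != 0:
        raise ValueError("length must be even")
    return sum(1 for i in range(0, len(bits), 2) if bits[i] == bits[i + 1])


def count_optima(n):
    """Brute-force count of strings attaining the maximum value n // 2."""
    return sum(1 for x in product((0, 1), repeat=n)
               if equal_blocks_one_max(x) == n // 2)


n = 6
num_optima = count_optima(n)  # 2 choices (00 or 11) per block
```

Each block can be optimised independently in two ways, which is why an explicit population of realistic size cannot hold all optima while a probabilistic model can represent them implicitly.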

### From understanding genetic drift to a smart-restart parameter-less compact genetic algorithm

One of the key difficulties in using estimation-of-distribution algorithms is choosing the population sizes appropriately: Too small values lead to genetic drift, which can cause enormous difficulties. In the regime with no genetic drift, however, often the runtime is roughly proportional to the population size, which renders large population sizes inefficient.

Based on a recent quantitative analysis of which population sizes lead to genetic drift, we propose a parameter-less version of the compact genetic algorithm that automatically finds a suitable population size without spending too much time in situations unfavorable due to genetic drift.

We prove an easy mathematical runtime guarantee for this algorithm and conduct an extensive experimental analysis on four classic benchmark problems. The former shows that under a natural assumption, our algorithm has a performance similar to the one obtainable from the best population size. The latter confirms that missing the right population size can be highly detrimental and shows that our algorithm as well as a previously proposed parameter-less one based on parallel runs avoids such pitfalls. Comparing the two approaches, ours profits from its ability to abort runs which are likely to be stuck in a genetic drift situation.
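The idea of aborting runs that are likely stuck and retrying with a larger population size can be sketched as below. The compact GA update itself is the standard one (frequencies move by 1/K towards the better of two samples, clamped to [1/n, 1 - 1/n]). The restart schedule is an assumption of this sketch: the starting size `K = 2`, the quadratic budget `budget_factor * K * K`, and the doubling are illustrative choices, not values taken from the abstract.

```python
import random


def one_max(x):
    return sum(x)


def cga_run(n, K, budget, rng):
    """One compact GA run on OneMax with hypothetical population size K.
    Returns an optimal bit string, or None if the iteration budget runs out."""
    lo, hi = 1.0 / n, 1.0 - 1.0 / n
    freq = [0.5] * n
    for _ in range(budget):
        x = [int(rng.random() < p) for p in freq]
        y = [int(rng.random() < p) for p in freq]
        if one_max(x) < one_max(y):
            x, y = y, x  # x is now the better (or equal) sample
        if one_max(x) == n:  # optimum test, valid for OneMax only
            return x
        # Shift each frequency by 1/K towards x where the samples differ.
        freq = [min(hi, max(lo, p + (a - b) / K))
                for p, a, b in zip(freq, x, y)]
    return None


def smart_restart_cga(n, rng, budget_factor=16, growth=2):
    """Abort a run after budget_factor * K^2 iterations and retry with larger K."""
    K = 2
    while True:
        result = cga_run(n, K, budget_factor * K * K, rng)
        if result is not None:
            return result, K
        K *= growth


solution, final_K = smart_restart_cga(10, random.Random(1))
```

The user never sets a population size: small values are tried first and abandoned cheaply, so the total cost stays close to that of a run with a well-chosen size.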

### Effective reinforcement learning through evolutionary surrogate-assisted prescription

There is now significant historical data available on decision making in organizations, consisting of the decision problem, what decisions were made, and how desirable the outcomes were. Using this data, it is possible to learn a surrogate model, and with that model, evolve a decision strategy that optimizes the outcomes. This paper introduces a general approach of this kind, called Evolutionary Surrogate-Assisted Prescription, or ESP. The surrogate is, for example, a random forest or a neural network trained with gradient descent, and the strategy is a neural network that is evolved to maximize the predictions of the surrogate model. ESP is further extended in this paper to sequential decision-making tasks, which makes it possible to evaluate the framework in reinforcement learning (RL) benchmarks. Because the majority of evaluations are done on the surrogate, ESP is more sample efficient, has lower variance, and lower regret than standard RL approaches. Surprisingly, its solutions are also better because both the surrogate and the strategy network regularize the decision making behavior. ESP thus forms a promising foundation for decision optimization in real-world problems.

### Analysis of the performance of algorithm configurators for search heuristics with global mutation operators

Recently it has been proved that a simple algorithm configurator called ParamRLS can efficiently identify the optimal neighbourhood size to be used by stochastic local search to optimise two standard benchmark problem classes. In this paper we analyse the performance of algorithm configurators for tuning the more sophisticated global mutation operator used in standard evolutionary algorithms, which flips each of the *n* bits independently with probability *χ/n* and the best value for *χ* has to be identified. We compare the performance of configurators when the best-found fitness values within the cutoff time *k* are used to compare configurations against the actual optimisation time for two standard benchmark problem classes, Ridge and LeadingOnes. We rigorously prove that all algorithm configurators that use optimisation time as performance metric require cutoff times that are at least as large as the expected optimisation time to identify the optimal configuration. Matters are considerably different if the fitness metric is used. To show this we prove that the simple ParamRLS-F configurator can identify the optimal mutation rates even when using cutoff times that are considerably smaller than the expected optimisation time of the best parameter value for both problem classes.
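The global mutation operator and the fitness-based comparison within a cutoff can be illustrated with a small sketch. The operator flips each bit independently with probability *χ/n*, as stated above, and LeadingOnes counts the leading 1-bits. The (1+1) EA wrapper, its function names, and the parameter values in the usage lines are assumptions of this example and do not reproduce ParamRLS-F.

```python
import random


def leading_ones(x):
    """Number of consecutive 1-bits at the start of x."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count


def global_mutation(x, chi, rng):
    """Flip each bit independently with probability chi / n."""
    p = chi / len(x)
    return [1 - b if rng.random() < p else b for b in x]


def best_fitness_within_cutoff(n, chi, cutoff, rng):
    """Run a (1+1) EA on LeadingOnes for `cutoff` iterations and
    return the best fitness found (the fitness-based metric)."""
    x = [rng.randint(0, 1) for _ in range(n)]
    best = leading_ones(x)
    for _ in range(cutoff):
        y = global_mutation(x, chi, rng)
        fy = leading_ones(y)
        if fy >= best:
            x, best = y, fy
    return best


rng = random.Random(0)
# Hypothetical comparison of two configurations under the same small cutoff.
score_low = best_fitness_within_cutoff(n=30, chi=1.0, cutoff=200, rng=rng)
score_high = best_fitness_within_cutoff(n=30, chi=5.0, cutoff=200, rng=rng)
```

A configurator using the fitness metric can rank the two values of *χ* from such scores even when neither run has reached the optimum, which is why short cutoff times can suffice.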

### On the choice of the parameter control mechanism in the (1+(λ, λ)) genetic algorithm

The self-adjusting (1 + (*λ, λ*)) GA is the best known genetic algorithm for problems with a good fitness-distance correlation as in OneMax. It uses a parameter control mechanism for the parameter *λ* that governs the mutation strength and the number of offspring. However, on multimodal problems, the parameter control mechanism tends to increase *λ* uncontrollably.

We study this problem and possible solutions to it using rigorous runtime analysis for the standard Jumpₖ benchmark problem class. The original algorithm behaves like a (1+*n*) EA whenever the maximum value *λ* = *n* is reached. This is ineffective for problems where large jumps are required. Capping *λ* at smaller values is beneficial for such problems. Finally, resetting *λ* to 1 allows the parameter to cycle through the parameter space. We show that this strategy is effective for all Jumpₖ problems: the (1 + (*λ, λ*)) GA performs as well as the (1 + 1) EA with the optimal mutation rate and fast evolutionary algorithms, apart from a small polynomial overhead.
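For readers unfamiliar with the benchmark, the sketch below gives the usual definition of the Jump function with gap size *k*, together with a one-fifth-rule style update for *λ* that either caps at *n* or resets to 1. The update factor `F = 1.5` and the exact reset condition (reset when the increased value would exceed *n*) are assumptions of this example; the abstract only states that *λ* is reset to 1.

```python
def jump(x, k):
    """Jump_k: OneMax shifted by k, with a fitness valley of width k
    just below the all-ones optimum."""
    n, ones = len(x), sum(x)
    if ones == n or ones <= n - k:
        return k + ones
    return n - ones


def update_lambda(lam, success, n, F=1.5, reset=True):
    """Self-adjusting lambda: shrink by F on success, grow by F^(1/4)
    on failure. When the grown value would exceed n, either reset to 1
    (reset=True) or cap at n (reset=False)."""
    if success:
        return max(lam / F, 1.0)
    grown = lam * F ** 0.25
    if grown > n:
        return 1.0 if reset else float(n)
    return grown


# With reset, repeated failures make lambda cycle through [1, n].
lam, trace = 1.0, []
for _ in range(30):
    lam = update_lambda(lam, success=False, n=10)
    trace.append(lam)
```

In the valley, all strings with more than *n* - *k* ones (other than the optimum) are worse than their neighbours with fewer ones, so leaving a local optimum requires flipping *k* bits at once.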

Along the way, we present new general methods for bounding the runtime of the (1 + (*λ, λ*)) GA that allow existing runtime bounds to be translated from the (1 + 1) EA to the self-adjusting (1 + (*λ, λ*)) GA. Our methods are easy to use and give upper bounds for novel classes of functions.

### Landscape-aware fixed-budget performance regression and algorithm selection for modular CMA-ES variants

Automated algorithm selection promises to support the user in the decisive task of selecting a most suitable algorithm for a given problem. A common component of these machine-trained techniques is a regression model that predicts the performance of a given algorithm on a previously unseen problem instance. In the context of numerical black-box optimization, such regression models typically build on exploratory landscape analysis (ELA), which quantifies several characteristics of the problem. These measures can be used to train a supervised performance regression model.

First steps towards ELA-based performance regression have been made in the context of a fixed-target setting. In many applications, however, the user needs to select an algorithm that performs best within a given budget of function evaluations. Adopting this fixed-budget setting, we demonstrate that it is possible to achieve high-quality performance predictions with off-the-shelf supervised learning approaches, by suitably combining two differently trained regression models. We test this approach on a very challenging problem: algorithm selection on a portfolio of very similar algorithms, which we choose from the family of modular CMA-ES algorithms.

### Algorithm selection of anytime algorithms

Anytime algorithms for optimization problems are of particular interest since they allow execution time to be traded off against result quality. However, the selection of the *best* anytime algorithm for a given problem instance has been focused on a particular budget for execution time or particular target result quality. Moreover, it is often assumed that these anytime preferences are known when developing or training the algorithm selection methodology. In this work, we study the algorithm selection problem in a context where the decision maker's anytime preferences are defined by a general utility function, and only known at the time of selection. To this end, we first examine how to measure the performance of an anytime algorithm with respect to this utility function. Then, we discuss approaches for the development of selection methodologies that receive a utility function as an argument at the time of selection. Finally, to illustrate one of the discussed approaches, we present a preliminary study on the selection between an exact and a heuristic algorithm for a bi-objective knapsack problem. The results show that the proposed methodology has an accuracy greater than 96% in the selected scenarios, but we identify room for improvement.

### CMA-ES for one-class constraint synthesis

We propose CMA-ES for One-Class Constraint Synthesis (CMAESOCCS), a method that synthesizes a Mixed-Integer Linear Programming (MILP) model from exemplary feasible solutions to this model using Covariance Matrix Adaptation Evolution Strategy (CMA-ES). Given a one-class training set, CMAESOCCS adaptively detects partitions in this set, synthesizes independent Linear Programming models for all partitions and merges these models into a single MILP model. CMAESOCCS is evaluated experimentally using synthetic problems. A practical use case of CMAESOCCS is demonstrated based on a problem of synthesis of a model for a rice farm. The obtained results are competitive when compared to a state-of-the-art method.

### Expected improvement versus predicted value in surrogate-based optimization

Surrogate-based optimization relies on so-called infill criteria (acquisition functions) to decide which point to evaluate next. When Kriging is used as the surrogate model of choice (also called Bayesian optimization), one of the most frequently chosen criteria is expected improvement. We argue that the popularity of expected improvement largely relies on its theoretical properties rather than empirically validated performance. A few results from the literature show evidence that, under certain conditions, expected improvement may perform worse than something as simple as the predicted value of the surrogate model. We benchmark both infill criteria in an extensive empirical study on the 'BBOB' function set. This investigation includes a detailed study of the impact of problem dimensionality on algorithm performance. The results support the hypothesis that exploration loses importance with increasing problem dimensionality. A statistical analysis reveals that the purely exploitative search with the predicted value criterion performs better on most problems of five or higher dimensions. Possible reasons for these results are discussed. In addition, we give an in-depth guide for choosing the infill criteria based on prior knowledge about the problem at hand, its dimensionality, and the available budget.
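The two infill criteria compared above can be contrasted in a few lines. The sketch uses the standard closed form of expected improvement for minimisation under a Gaussian prediction, EI = (f_best - μ)Φ(z) + σφ(z) with z = (f_best - μ)/σ, while the predicted-value criterion simply prefers the lowest predicted mean. The two candidate points and their (mean, standard deviation) values are invented for illustration.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """Expected improvement over `best` (minimisation) for a Gaussian
    prediction with the given mean and standard deviation."""
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    return (best - mean) * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical surrogate predictions as (mean, std); best observed value 1.2.
candidates = {"A": (1.0, 0.1), "B": (1.5, 2.0)}
best_so_far = 1.2

ei_choice = max(candidates,
                key=lambda c: expected_improvement(*candidates[c], best_so_far))
pv_choice = min(candidates, key=lambda c: candidates[c][0])
```

Here expected improvement selects the uncertain point B while the predicted value selects the confidently good point A, which is the exploration versus exploitation difference the study examines across dimensionalities.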

### Model-based optimization with concept drifts

Model-based Optimization (MBO) is a method to optimize expensive black-box functions that uses a surrogate to guide the search. We propose two practical approaches that allow MBO to optimize black-box functions where the relation between input and output changes over time, which are known as dynamic optimization problems (DOPs). The window approach trains the surrogate only on the most recent observations, and the time-as-covariate approach includes the time as an additional input variable in the surrogate, giving it the ability to learn the effect of the time on the outcomes. We focus on problems where the change happens systematically and label this systematic change concept drift. To benchmark our methods we define a set of benchmark functions built from established synthetic static functions that are extended with controlled drifts. We evaluate how the proposed approaches handle scenarios of no drift, sudden drift and incremental drift. The results show that both new methods improve the performance if a drift is present. For higher-dimensional multimodal problems the window approach works best and on lower-dimensional problems, where it is easier for the surrogate to capture the influence of the time, the time-as-covariate approach works better.
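The difference between the two approaches lies only in how the surrogate's training data is assembled, which the following sketch makes concrete. The history format `(t, x, y)` and the function names are assumptions of this example; any regression model could then be fitted to the returned inputs and outputs.

```python
def window_training_set(history, window):
    """Window approach: keep only the `window` most recent observations.
    `history` is a time-ordered list of (t, x, y) tuples."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = history[-window:]
    inputs = [list(x) for _, x, _ in recent]
    outputs = [y for _, _, y in recent]
    return inputs, outputs


def time_covariate_training_set(history):
    """Time-as-covariate approach: keep all observations and append
    the time stamp t to each input vector."""
    inputs = [list(x) + [t] for t, x, _ in history]
    outputs = [y for _, _, y in history]
    return inputs, outputs


# Hypothetical history of a 2-dimensional problem observed at t = 0..3.
history = [(0, (0.1, 0.2), 5.0), (1, (0.3, 0.1), 4.0),
           (2, (0.5, 0.5), 6.5), (3, (0.2, 0.9), 3.0)]
X_win, y_win = window_training_set(history, window=2)
X_time, y_time = time_covariate_training_set(history)
```

The window approach discards old data and keeps the input dimension unchanged, whereas the time-as-covariate approach keeps all data at the cost of one extra input dimension, consistent with the reported preference for the former on higher-dimensional problems.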

### An evolutionary optimization algorithm for gradually saturating objective functions

Evolutionary algorithms have been actively studied for dynamic optimization problems in the last two decades; however, the research is mainly focused on problems with large, periodical or abrupt changes during the optimization. In contrast, this paper concentrates on gradually changing environments with an additional imposition of a saturating objective function. This work is motivated by an evolutionary neural architecture search methodology where a population of Convolutional Neural Networks (CNNs) is evaluated and iteratively modified using genetic operators during the training process. The objective of the search, namely the prediction accuracy of a CNN, is a continuous and slow moving target, increasing with each training epoch and eventually saturating when the training is nearly complete. Population diversity is an important consideration in dynamic environments wherein a large diversity restricts the algorithm from converging to a small area of the search space while the environment is still transforming. Our proposed algorithm adaptively influences the population diversity, depending on the rate of change of the objective function, using disruptive crossovers and non-elitist population replacements. We compare the results of our algorithm with a traditional evolutionary algorithm and demonstrate that the proposed modifications improve the algorithm performance in gradually saturating dynamic environments.

### Sensitivity analysis in constrained evolutionary optimization

Sensitivity analysis deals with the question of how changes in input parameters of a model affect its outputs. For constrained optimization problems, one question may be how variations in budget or capacity constraints influence the optimal solution value. Although well established in the domain of linear programming, it is hardly addressed in evolutionary computation. In this paper, a general approach is proposed that makes it possible to identify how the outcome of an evolutionary algorithm is affected when model parameters, such as constraints, are changed. Using evolutionary bilevel optimization in combination with data mining and visualization techniques, the recently suggested concept of bilevel innovization makes it possible to find trade-offs among constraints and objective value. Additionally, it enables decision-makers to gain insights into the overall model behavior under changing framework conditions. The concept of bilevel innovization as a tool for sensitivity analysis is illustrated, without loss of generality, by the example of the multidimensional knapsack problem. The experimental results show that by applying bilevel innovization it is possible to determine how the solution values are influenced by changes of different constraints. Furthermore, rules were obtained that provide information on how parameters can be modified to achieve efficient trade-offs between constraints and objective value.

### Integrated vs. sequential approaches for selecting and tuning CMA-ES variants

When faced with a specific optimization problem, deciding which algorithm to apply is always a difficult task. Not only is there a vast variety of algorithms to select from, but these algorithms are often controlled by many hyperparameters, which need to be suitably tuned in order to achieve peak performance. Usually, the problem of selecting and configuring the optimization algorithm is addressed sequentially, by first selecting a suitable algorithm and then tuning it for the application at hand. Integrated approaches, commonly known as Combined Algorithm Selection and Hyperparameter (CASH) solvers, have shown promise in several applications.

In this work we compare sequential and integrated approaches for selecting and tuning the best out of the 4,608 variants of the modular Covariance Matrix Adaptation Evolution Strategy (CMA-ES). We show that the ranking of these variants depends to a large extent on the quality of the hyperparameters. Sequential approaches are therefore likely to recommend sub-optimal choices. Integrated approaches, in contrast, manage to provide competitive results at much smaller computational cost. We also highlight important differences in the search behavior of two CASH approaches, which build on racing (irace) and on model-based optimization (MIP-EGO), respectively.