GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation ConferenceFull Citation in the ACM Digital Library
SESSION: Complex systems (artificial life/artificial immune systems/generative and developmental systems/evolutionary robotics/evolvable hardware)
The MAP-Elites quality-diversity algorithm has been successful in robotics because it can create a behaviorally diverse set of solutions that later can be used for adaptation, for instance to unanticipated damages. In MAP-Elites, the choice of the behaviour space is essential for adaptation, the recovery of performance in unseen environments, since it defines the diversity of the solutions. Current practice is to hand-code a set of behavioural features, however, given the large space of possible behaviour-performance maps, the designer does not know a priori which behavioural features maximise a map's adaptation potential. We introduce a new meta-evolution algorithm that discovers those behavioural features that maximise future adaptations. The proposed method applies Covariance Matrix Adaptation Evolution Strategy to evolve a population of behaviour-performance maps to maximise a meta-fitness function that rewards adaptation. The method stores solutions found by MAP-Elites in a database which allows to rapidly construct new behaviour-performance maps on-the-fly. To evaluate this system, we study the gait of the RHex robot as it adapts to a range of damages sustained on its legs. When compared to MAP-Elites with user-defined behaviour spaces, we demonstrate that the meta-evolution system learns high-performing gaits with or without damages injected to the robot.
Minimal Criterion Coevolution (MCC) is a recently-introduced algorithm that demonstrates how interactions between two populations, each subject to a simple reproductive constraint, can produce an open-ended search process. Unlike conventional quality diversity (QD) algorithms, which also promote divergence, MCC does not require an explicit characterization of behavior or a comparison of performance, thereby addressing bottlenecks introduced by an intrinsically-finite behavior descriptor and by an assessment of comparative quality. Genetic speciation, a common method of diversity preservation, maintains population diversity in MCC; however, it requires an unnatural explicit comparison of genetic similarity. In nature, organisms are implicitly segregated into niches that each have a carrying capacity dictated by the amount of available resources. To show that MCC can be simpler and more natural while still working effectively, this paper introduces a method of diversity preservation through resource limitation, thereby alleviating the need to formalize and compare genetic distance. Experimental results in a maze navigation domain demonstrate that resource limitation not only maintains higher population diversity in both the maze and agent populations, but also accelerates evolution by forcing individuals to explore new niches, thereby suggesting that resource limitation is an effective, simpler, and more natural alternative for diversity preservation in MCC.
Quality-Diversity (QD) algorithms, and MAP-Elites (ME) in particular, have proven very useful for a broad range of applications including enabling real robots to recover quickly from joint damage, solving strongly deceptive maze tasks or evolving robot morphologies to discover new gaits. However, present implementations of ME and other QD algorithms seem to be limited to low-dimensional controllers with far fewer parameters than modern deep neural network models. In this paper, we propose to leverage the efficiency of Evolution Strategies (ES) to scale MAP-Elites to high-dimensional controllers parameterized by large neural networks. We design and evaluate a new hybrid algorithm called MAP-Elites with Evolution Strategies (ME-ES) for post-damage recovery in a difficult high-dimensional control task where traditional ME fails. Additionally, we show that ME-ES performs efficient exploration, on par with state-of-the-art exploration algorithms in high-dimensional control tasks with strongly deceptive rewards.
Securities markets are quintessential complex adaptive systems in which heterogeneous agents compete in an attempt to maximize returns. Species of trading agents are also subject to evolutionary pressure as entire classes of strategies become obsolete and new classes emerge. Using an agent-based model of interacting heterogeneous agents as a flexible environment that can endogenously model many diverse market conditions, we subject deep neural networks to evolutionary pressure to create dominant trading agents. After analyzing the performance of these agents and noting the emergence of anomalous superdiffusion through the evolutionary process, we construct a method to turn high-fitness agents into trading algorithms. We backtest these trading algorithms on real high-frequency foreign exchange data, demonstrating that elite trading algorithms are consistently profitable in a variety of market conditions---even though these algorithms had never before been exposed to real financial data. These results provide evidence to suggest that developing ab initio trading strategies by repeated simulation and evolution in a mechanistic market model may be a practical alternative to explicitly training models with past observed market data.
Evolvability is an important feature that impacts the ability of evolutionary processes to find interesting novel solutions and to deal with changing conditions of the problem to solve. The estimation of evolvability is not straight-forward and is generally too expensive to be directly used as selective pressure in the evolutionary process. Indirectly promoting evolvability as a side effect of other easier and faster to compute selection pressures would thus be advantageous. In an unbounded behavior space, it has already been shown that evolvable individuals naturally appear and tend to be selected as they are more likely to invade empty behavior niches. Evolvability is thus a natural byproduct of the search in this context. However, practical agents and environments often impose limits on the reachable behavior space. How do these boundaries impact evolvability? In this context, can evolvability still be promoted without explicitly rewarding it? We show that Novelty Search implicitly creates a pressure for high evolvability even in bounded behavior spaces, and explore the reasons for such a behavior. More precisely we show that, throughout the search, the dynamic evaluation of novelty rewards individuals which are very mobile in the behavior space, which in turn promotes evolvability.
We focus on the challenge of finding a diverse collection of quality solutions on complex continuous domains. While quality diversity (QD) algorithms like Novelty Search with Local Competition (NSLC) and MAP-Elites are designed to generate a diverse range of solutions, these algorithms require a large number of evaluations for exploration of continuous spaces. Meanwhile, variants of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) are among the best-performing derivative-free optimizers in single-objective continuous domains. This paper proposes a new QD algorithm called Covariance Matrix Adaptation MAP-Elites (CMA-ME). Our new algorithm combines the self-adaptation techniques of CMA-ES with archiving and mapping techniques for maintaining diversity in QD. Results from experiments based on standard continuous optimization benchmarks show that CMA-ME finds better-quality solutions than MAP-Elites; similarly, results on the strategic game Hearthstone show that CMA-ME finds both a higher overall quality and broader diversity of strategies than both CMA-ES and MAP-Elites. Overall, CMA-ME more than doubles the performance of MAP-Elites using standard QD performance metrics. These results suggest that QD algorithms augmented by operators from state-of-the-art optimization algorithms can yield high-performing methods for simultaneously exploring and optimizing continuous search spaces, with significant applications to design, testing, and reinforcement learning among other domains.
The encoding of solutions in black-box optimization is a delicate, handcrafted balance between expressiveness and domain knowledge --- between exploring a wide variety of solutions, and ensuring that those solutions are useful. Our main insight is that this process can be automated by generating a dataset of high-performing solutions with a quality diversity algorithm (here, MAP-Elites), then learning a representation with a generative model (here, a Variational Autoencoder) from that dataset. Our second insight is that this representation can be used to scale quality diversity optimization to higher dimensions --- but only if we carefully mix solutions generated with the learned representation and those generated with traditional variation operators. We demonstrate these capabilities by learning an low-dimensional encoding for the inverse kinematics of a thousand joint planar arm. The results show that learned representations make it possible to solve high-dimensional problems with orders of magnitude fewer evaluations than the standard MAP-Elites, and that, once solved, the produced encoding can be used for rapid optimization of novel, but similar, tasks. The presented techniques not only scale up quality diversity algorithms to high dimensions, but show that black-box optimization encodings can be automatically learned, rather than hand designed.
Voxel-based soft robots (VSRs) are aggregations of elastic, cubic blocks that have sparkled the interest of Robotics and Artificial Life researchers. VSRs can move by varying the volume of individual blocks, according to control signals dictated by a controller, possibly based on inputs coming from sensors embedded in the blocks. Neural networks (NNs) have been used as centralized processing units for those sensing controllers, with weights optimized using evolutionary computation. This structuring breaks the intrinsic modularity of VSRs: decomposing a VSR into modules to be assembled in a different way is very hard.
In this work we propose an alternative approach that enables full modularity and is based on a distributed neural controller. Each block contains a small NN that outputs signals to adjacent blocks and controls the local volume, based on signals from adjacent blocks and on local sensor readings. We show experimentally for the locomotion task that our controller is as effective as the centralized one. Our experiments also suggest that the proposed framework indeed allows exploiting modularity: VSRs composed of pre-trained parts (body and controller) can be evolved more efficiently than starting from scratch.
Quality Diversity (QD) algorithms are a recent family of optimization algorithms that search for a large set of diverse but high-performing solutions. In some specific situations, they can solve multiple tasks at once. For instance, they can find the joint positions required for a robotic arm to reach a set of points, which can also be solved by running a classic optimizer for each target point. However, they cannot solve multiple tasks when the fitness needs to be evaluated independently for each task (e.g., optimizing policies to grasp many different objects). In this paper, we propose an extension of the MAP-Elites algorithm, called Multi-task MAP-Elites, that solves multiple tasks when the fitness function depends on the task. We evaluate it on a simulated parametrized planar arm (10-dimensional search space; 5000 tasks) and on a simulated 6-legged robot with legs of different lengths (36-dimensional search space; 2000 tasks). The results show that in both cases our algorithm outperforms the optimization of each task separately with the CMA-ES algorithm.
A critical issue in evolutionary robotics is the transfer of controllers learned in simulation to reality. This is especially the case for small Unmanned Aerial Vehicles (UAVs), as the platforms are highly dynamic and susceptible to breakage. Previous approaches often require simulation models with a high level of accuracy, otherwise significant errors may arise when the well-designed controller is being deployed onto the targeted platform. Here we try to overcome the transfer problem from a different perspective, by designing a spiking neurocontroller which uses synaptic plasticity to cross the reality gap via online adaptation. Through a set of experiments we show that the evolved plastic spiking controller can maintain its functionality by self-adapting to model changes that take place after evolutionary training, and consequently exhibit better performance than its non-plastic counterpart.