GECCO '19 - Proceedings of the Genetic and Evolutionary Computation Conference
SESSION: Complex systems (artificial life/artificial immune systems/generative and developmental systems/evolutionary robotics/evolvable hardware)
It has long been known from theoretical work on evolution strategies that recombination improves convergence towards better solutions and improves robustness against selection errors in noisy environments. We propose to investigate the effect of recombination in online embodied evolutionary robotics, where evolution is decentralized over a swarm of agents. We hypothesize that these properties can also be observed in these algorithms and could thus improve their performance. We introduce the (μ/μ, 1)-On-line EEA, which uses a recombination operator inspired by evolution strategies, and apply it to learn three different collective robotics tasks: locomotion, item collection, and item foraging. Different recombination operators are investigated and compared against a purely mutative version of the algorithm. The experiments show that, when correctly designed, recombination significantly improves the adaptation of the swarm in all scenarios.
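A minimal sketch of the intermediate recombination this abstract builds on. This is a classical, centralized (μ/μ, λ)-ES on a toy sphere function, not the decentralized (μ/μ, 1)-On-line EEA itself; the fitness function, population size, and fixed step size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Toy fitness (minimization): squared distance to the origin."""
    return float(np.sum(x ** 2))

def mu_mu_lambda_step(parents, sigma, lam, fitness):
    """One (mu/mu, lambda)-ES generation with intermediate recombination:
    the recombinant is the centroid of the mu parents, and each of the
    lam offspring is a mutated copy of that centroid."""
    centroid = parents.mean(axis=0)                       # intermediate recombination
    offspring = centroid + sigma * rng.standard_normal((lam, parents.shape[1]))
    scores = np.array([fitness(o) for o in offspring])
    return offspring[np.argsort(scores)[: len(parents)]]  # comma selection

parents = rng.standard_normal((4, 10)) * 5.0              # mu = 4, dimension 10
for _ in range(200):
    parents = mu_mu_lambda_step(parents, sigma=0.3, lam=20, fitness=sphere)
print(sphere(parents.mean(axis=0)))                       # converges toward 0
```

Averaging parents before mutation is what gives recombination its noise-damping effect: mutation components orthogonal to the useful direction tend to cancel in the centroid.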
Minimal criterion coevolution (MCC) was recently introduced to show that a very simple criterion can lead to an open-ended expansion of two coevolving populations. Inspired by the simplicity of striving to survive and reproduce in nature, in MCC there are few of the usual mechanisms of quality diversity algorithms: no explicit novelty, no fitness function, and no local competition. While the idea that a simple minimal criterion could produce quality diversity on its own is provocative, its initial demonstration on mazes and maze solvers was limited because the size of the potential mazes was static, effectively capping the potential for complexity to increase. This paper overcomes this limitation to make two significant contributions to the field: (1) By introducing a completely novel maze encoding with higher-quality mazes that allow indefinite expansion in size and complexity, it offers for the first time a viable, computationally cheap domain for benchmarking open-ended algorithms, and (2) it leverages this new domain to show for the first time a succession of mazes that increase in size indefinitely while solutions continue to appear. With this initial result, a baseline is now established that can help researchers to begin to mark progress in the field systematically.
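The core MCC loop can be sketched in a few lines. This is a deliberately abstract toy (scalar "mazes" and "solvers", a threshold standing in for "this solver solves this maze", and no population caps, which real MCC enforces with queues); it only illustrates that reproduction is gated by a minimal criterion against the other population, with no fitness or novelty:

```python
import random

random.seed(0)

def solves(maze, solver):
    """Hypothetical viability test standing in for 'this agent solves
    this maze'; here just a threshold on a toy compatibility score."""
    return abs(maze - solver) < 0.5

def mcc_generation(mazes, solvers, n_children=20):
    """Minimal criterion coevolution: a child joins its population iff
    it meets the minimal criterion against the OTHER population (a maze
    must be solvable by some solver; a solver must solve some maze)."""
    for _ in range(n_children):
        child_maze = random.choice(mazes) + random.gauss(0, 0.2)
        if any(solves(child_maze, s) for s in solvers):
            mazes.append(child_maze)
        child_solver = random.choice(solvers) + random.gauss(0, 0.2)
        if any(solves(m, child_solver) for m in mazes):
            solvers.append(child_solver)
    return mazes, solvers

mazes, solvers = [0.0], [0.1]
for _ in range(30):
    mazes, solvers = mcc_generation(mazes, solvers)
```

Because each population can only expand where the other provides a satisfiable criterion, the two drift outward together, which is the mechanism behind the paper's indefinitely growing mazes.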
Quality-Diversity optimization is a new family of optimization algorithms that, instead of searching for a single optimal solution to a task, search for a large collection of solutions that all solve the task in a different way. This approach is particularly promising for learning behavioral repertoires in robotics, as such a diversity of behaviors enables robots to be more versatile and resilient. However, these algorithms require the user to manually define a behavioral descriptor, which is used to determine whether two solutions are different or similar. The choice of a behavioral descriptor is crucial, as it completely changes the types of solutions the algorithm derives. In this paper, we introduce a new method to automatically define this descriptor by combining Quality-Diversity algorithms with unsupervised dimensionality reduction algorithms. This approach enables robots to autonomously discover the range of their capabilities while interacting with their environment. The results from two experimental scenarios demonstrate that robots can autonomously discover a large range of possible behaviors, without any prior knowledge about their morphology and environment. Furthermore, these behaviors are deemed to be similar to hand-crafted solutions that use domain knowledge, and significantly more diverse than those obtained with existing unsupervised methods.
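The combination of a QD archive with a learned descriptor can be sketched as follows. Everything here is a toy stand-in: the "sensory trace" is a made-up nonlinear signal rather than real robot data, the dimensionality reduction is plain PCA via SVD rather than the paper's method, and the fitness is arbitrary; the point is only that the behavior space is learned from the data instead of hand-crafted:

```python
import numpy as np

rng = np.random.default_rng(1)

def sensory_trace(genome):
    """Stand-in for raw sensor data recorded while a controller runs;
    here a hypothetical nonlinear trace parameterized by the genome."""
    t = np.linspace(0.0, 1.0, 50)
    return np.concatenate([np.sin(genome[0] * t), np.cos(genome[1] * t)])

def learn_descriptors(traces, dims=2):
    """Unsupervised descriptor: project traces onto their first
    principal components, replacing a hand-crafted behavior space."""
    centered = traces - traces.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dims].T

# Fill a MAP-Elites-style grid keyed by the learned 2-D descriptor.
genomes = rng.uniform(-3.0, 3.0, size=(500, 2))
traces = np.array([sensory_trace(g) for g in genomes])
desc = learn_descriptors(traces)
lo, hi = desc.min(axis=0), desc.max(axis=0)
bins = np.floor(10 * (desc - lo) / (hi - lo + 1e-9)).astype(int)
archive = {}
for g, cell in zip(genomes, map(tuple, bins)):
    fit = -float(g @ g)                  # toy fitness: prefer small genomes
    if cell not in archive or fit > archive[cell][0]:
        archive[cell] = (fit, g)
print(len(archive))                      # number of distinct behavior niches
```

In the full method this would alternate: the archive generates new solutions, and the descriptor is periodically re-learned on the growing set of traces.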
Financial asset markets are sociotechnical systems whose constituent agents are subject to evolutionary pressure as unprofitable agents exit the marketplace and more profitable agents continue to trade assets. Using a population of evolving zero-intelligence agents and a frequent batch auction price-discovery mechanism as substrate, we analyze the role played by evolutionary selection mechanisms in determining macro-observable market statistics. Specifically, we show that selection mechanisms incorporating a local fitness-proportionate component are associated with high correlation between a micro-level risk-aversion parameter and a commonly used macro-level volatility statistic, while a purely quantile-based selection mechanism shows significantly less correlation and is associated with higher absolute levels of fitness (profit) than other selection mechanisms. These results point the way to a possible restructuring of market incentives toward reduction in market-wide worst performance, leading profit-driven agents to behave in ways that are associated with beneficial macro-level outcomes.
Novelty Search is an exploration algorithm driven by the novelty of a behavior. The same individual evaluated at different generations has different fitness values. The corresponding fitness landscape is thus constantly changing, and if, at the scale of a single generation, the metaphor of a fitness landscape with peaks and valleys still holds, this is no longer the case at the scale of the whole evolutionary process. How does this kind of algorithm behave? Is it possible to define a model that would help understand how it works? This understanding is critical to analyze existing Novelty Search variants and to design new and potentially more efficient ones. We assert that Novelty Search asymptotically behaves like a uniform random search process in the behavior space. This is an interesting feature, as it is not possible to sample directly in this space: the algorithm has direct access to the genotype space only, whose relationship to the behavior space is complex. We describe the model and check its consistency on a classical Novelty Search experiment. We also show that it sheds new light on results from the literature and suggests directions for future research.
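The non-stationarity the abstract describes follows directly from the standard novelty score, which is commonly defined as the mean distance to the k nearest neighbors in an archive of past behaviors (the 2-D behaviors and k value below are illustrative):

```python
import numpy as np

def novelty(behavior, archive, k=5):
    """Novelty score: mean distance from a behavior to its k nearest
    neighbors among previously observed behaviors. Because the archive
    grows over generations, the same individual scores differently
    each time it is evaluated."""
    dists = np.linalg.norm(np.asarray(archive) - behavior, axis=1)
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())

archive = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
b = np.array([1.0, 1.0])
early = novelty(b, archive)
archive += [np.array([1.1, 1.0]), np.array([0.9, 1.0])]
late = novelty(b, archive)
# The same behavior is less novel once similar behaviors fill the archive.
```

Since already-visited regions are continually devalued, pressure keeps shifting toward unvisited regions, which is consistent with the paper's model of asymptotically uniform coverage of the behavior space.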
Designing evolutionary algorithms capable of uncovering highly evolvable representations is an open challenge in evolutionary computation; such evolvability is important in practice, because it accelerates evolution and enables fast adaptation to changing circumstances. This paper introduces evolvability ES, an evolutionary algorithm designed to explicitly and efficiently optimize for evolvability, i.e. the ability to further adapt. The insight is that it is possible to derive a novel objective in the spirit of natural evolution strategies that maximizes the diversity of behaviors exhibited when an individual is subject to random mutations, and that efficiently scales with computation. Experiments in 2-D and 3-D locomotion tasks highlight the potential of evolvability ES to generate solutions with tens of thousands of parameters that can quickly be adapted to solve different tasks and that can productively seed further evolution. We further highlight a connection between evolvability in EC and a recent and popular gradient-based meta-learning algorithm called MAML; results show that evolvability ES can perform competitively with MAML and that it discovers solutions with distinct properties. The conclusion is that evolvability ES opens up novel research directions for studying and exploiting the potential of evolvable representations for deep neural networks.
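A rough sketch of the variance flavor of the objective described here: score each sampled mutation by how much it spreads offspring behavior, then ascend the resulting score-function gradient estimate, NES-style. The scalar behavior function, the fixed readout vector `w`, and all hyperparameters are assumptions for illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)
w = np.array([1.0, -0.5, 0.25])          # hypothetical fixed readout

def behavior(theta):
    """Toy scalar behavior characterization of a parameter vector."""
    return float(np.tanh(theta @ w))

def evolvability(theta, sigma=0.1, n=500):
    """The quantity being maximized (variance flavor): the spread of
    offspring behaviors under Gaussian parameter mutation."""
    eps = rng.standard_normal((n, theta.size))
    b = np.array([behavior(theta + sigma * e) for e in eps])
    return float(b.var()), eps, b

def es_step(theta, alpha=0.05, sigma=0.1, n=500):
    """NES-style update: weight each perturbation by its (centered,
    normalized) contribution to behavioral spread, then take a step
    along the estimated gradient of the evolvability objective."""
    _, eps, b = evolvability(theta, sigma, n)
    score = (b - b.mean()) ** 2
    score = (score - score.mean()) / (score.std() + 1e-9)
    grad = (score[:, None] * eps).mean(axis=0) / sigma
    return theta + alpha * grad
```

Only perturbed evaluations are needed per update, which is why the approach scales to networks with many thousands of parameters.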
The initial phase in real-world engineering optimization and design is a process of discovery in which not all requirements can be specified in advance, or are hard to formalize. Quality diversity algorithms, which produce a variety of high-performing solutions, provide a unique opportunity to support engineers and designers in the search for what is possible and high performing. In this work we begin to answer the question of how a user can interact with quality diversity and turn it into an interactive innovation aid. By modeling a user's selection, it can be determined whether the optimization is drifting away from the user's preferences. The optimization is then constrained by adding a penalty to the objective function. We present an interactive quality diversity algorithm that can take the user's selection into account. The approach is evaluated in a new multimodal optimization benchmark that allows various optimization tasks to be performed. The user selection drift of the approach is compared to a state-of-the-art alternative on both a planning and a neuroevolution control task, thereby showing its limits and possibilities.
This paper studies the effects of different environments on morphological and behavioral properties of evolving populations of modular robots. To assess these properties, a set of morphological and behavioral descriptors was defined and the evolving populations were mapped into this multi-dimensional space. Surprisingly, the results show that seemingly distinct environments can lead to the same regions of this space, i.e., evolution can produce the same kinds of morphologies/behaviors under conditions that humans perceive as quite different. These experiments indicate that demonstrating a 'ground truth' of evolution, that is, a firm impact of the environment on evolved morphologies, is harder in evolutionary robotics than usually assumed.
Overcoming robotics challenges in the real world requires resilient control systems capable of handling a multitude of environments and unforeseen events. Evolutionary optimization using simulations is a promising way to automatically design such control systems; however, if the disparity between simulation and the real world becomes too large, the optimization process may result in dysfunctional real-world behaviors. In this paper, we address this challenge by considering embodied phase coordination in the evolutionary optimization of a quadruped robot controller based on central pattern generators. With this method, leg phases, and indirectly also inter-leg coordination, are influenced by sensor feedback. By comparing two very similar control systems we gain insight into how the sensory feedback approach affects the evolved parameters of the control system, and how the performances differ in simulation, in the transfer to the real world, and across different real-world environments. We show that evolution enables the design of a control system with embodied phase coordination which is more complex than previously seen approaches, and that this system is capable of controlling a real-world multi-jointed quadruped robot. The approach reduces the performance discrepancy between simulation and the real world, and displays robustness towards new environments.
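The general idea of feeding sensor signals into the phase dynamics can be sketched with coupled phase oscillators. This is a generic Kuramoto-style toy, not the paper's controller: the coupling matrix, frequency, and the way load slows a leg's phase are all illustrative assumptions:

```python
import numpy as np

def cpg_step(phases, load, coupling, dt=0.01, freq=1.0, gain=2.0):
    """One Euler step of coupled phase oscillators (one per leg).
    Embodied phase coordination: a leg's phase advance is slowed in
    proportion to its load-sensor reading, so inter-leg coordination
    is shaped by ground contact rather than by fixed offsets alone."""
    n = len(phases)
    dphi = np.full(n, 2.0 * np.pi * freq)
    for i in range(n):
        dphi[i] += np.sum(coupling[i] * np.sin(phases - phases[i]))
        dphi[i] -= gain * load[i]        # sensory feedback term
    return (phases + dt * dphi) % (2.0 * np.pi)

# A loaded leg advances its phase more slowly than an unloaded one.
phases = np.zeros(4)
nxt = cpg_step(phases, load=np.array([1.0, 0.0, 0.0, 0.0]),
               coupling=np.zeros((4, 4)))
```

Because the feedback term depends on real contact events, the same controller parameters can yield different coordination patterns in simulation and on hardware, which is the mechanism the paper exploits to narrow the reality gap.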
How can progress in machine learning and reinforcement learning be automated to generate its own never-ending curriculum of challenges without human intervention? The recent emergence of quality diversity (QD) algorithms offers a glimpse of the potential for such continual open-ended invention. For example, novelty search showcases the benefits of explicit novelty pressure, MAP-Elites and Innovation Engines highlight the advantage of explicit elitism within niches in an otherwise divergent process, and minimal criterion coevolution (MCC) reveals that problems and solutions can coevolve divergently. The Paired Open-Ended Trailblazer (POET) algorithm introduced in this paper combines these principles to produce a practical approach to generating an endless progression of diverse and increasingly challenging environments while at the same time explicitly optimizing their solutions. An intriguing implication is the opportunity to transfer solutions among environments, reflecting the view that innovation is a circuitous and unpredictable process. POET is tested in a 2-D obstacle-course domain, where it generates diverse and sophisticated behaviors that create and solve a wide range of environmental challenges, many of which cannot be solved by direct optimization, or by a direct-path curriculum-building control algorithm. We hope that POET will inspire a new push towards open-ended discovery across many domains.
The plasticity property of biological neural networks allows them to perform learning and optimize their behavior by changing their configuration. Inspired by biology, plasticity can be modeled in artificial neural networks by using Hebbian learning rules, i.e., rules that update synapses based on neuron activations and reinforcement signals. However, the distal reward problem arises when reinforcement signals are not available immediately after each network output, making it difficult to associate the reinforcement signal with the neuron activations that contributed to receiving it. In this work, we extend Hebbian plasticity rules to allow learning in distal reward cases. We propose the use of neuron activation traces (NATs), additional data storage in each synapse that keeps track of neuron activations. Delayed reinforcement signals are provided after each episode based on the network's performance during the previous episode. We employ genetic algorithms to evolve delayed synaptic plasticity (DSP) rules that perform synaptic updates based on NATs and delayed reinforcement signals. We compare DSP with an analogous hill climbing (HC) algorithm that does not incorporate the domain knowledge introduced with the NATs, and show that the synaptic updates performed by the DSP rules yield more effective training than the HC algorithm.
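The trace-plus-delayed-update mechanics can be sketched as follows. Note that the paper evolves the form of the plasticity rule with a genetic algorithm; the simple multiplicative rule, decay rate, and learning rate below are illustrative assumptions, showing only how traces let an episode-end reward be credited to earlier co-activations:

```python
import numpy as np

def update_traces(traces, pre, post, decay=0.9):
    """Neuron activation traces (NATs): each synapse keeps a decaying
    record of pre/post co-activation, so credit can be assigned after
    the fact."""
    return decay * traces + np.outer(post, pre)

def delayed_update(weights, traces, reward, eta=0.01):
    """Delayed synaptic plasticity: the Hebbian-style change is applied
    only when the episode-level reinforcement arrives, scaled by the
    traces recording which synapses were active."""
    return weights + eta * reward * traces

# One toy episode: accumulate traces step by step, learn at the end.
w = np.zeros((2, 3))
traces = np.zeros_like(w)
for pre, post in [(np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0])),
                  (np.array([0.0, 1.0, 0.0]), np.array([0.0, 1.0]))]:
    traces = update_traces(traces, pre, post)
w = delayed_update(w, traces, reward=1.0)
```

Synapses whose endpoints never co-fired keep a zero trace and are untouched by the reward, which is exactly the credit-assignment information a trace-free hill climber lacks.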