GECCO '21: Proceedings of the Genetic and Evolutionary Computation ConferenceFull Citation in the ACM Digital Library
SESSION: Complex systems (artificial life, artificial immune systems, generative and developmental systems, evolutionary robotics, evolvable hardware)
Generative Adversarial Networks (GANs) can generate levels for a variety of games. This paper focuses on combining GAN-generated segments in a snaking pattern to create levels for Mega Man. Adjacent segments in such levels can be orthogonally adjacent in any direction, meaning that an otherwise fine segment might impose a barrier between its neighbor depending on what sorts of segments in the training set are being most closely emulated: horizontal, vertical, or corner segments. To pick appropriate segments, multiple GANs were trained on different types of segments to ensure better flow between segments. Flow was further improved by evolving the latent vectors for the segments being joined in the level to maximize the length of the level's solution path. Using multiple GANs to represent different types of segments results in significantly longer solution paths than using one GAN for all segment types, and a human subject study verifies that these levels are more fun and have more human-like design than levels produced by one GAN.
Quality-Diversity algorithms search for large collections of diverse and high-performing solutions, rather than just for a single solution like typical optimisation methods. They are specially adapted for multi-modal problems that can be solved in many different ways, such as complex reinforcement learning or robotics tasks. However, these approaches are highly dependent on the choice of feature descriptors (FDs) quantifying the similarity in behaviour of the solutions. While FDs usually needs to be hand-designed, recent studies have proposed ways to define them automatically by using feature extraction techniques, such as PCA or Auto-Encoders, to learn a representation of the problem from previously explored solutions. Here, we extend these approaches to more complex problems which cannot be efficiently explored by relying only on a single representation but require instead a set of diverse and complementary representations. We describe MC-AURORA, a Quality-Diversity approach that optimises simultaneously several collections of solutions, each with a different set of FDs, which are, in turn, defined automatically by an ensemble of modular auto-encoders. We show that this approach produces solutions that are more diverse than those produced by single-representation approaches.
Multi-emitter MAP-elites: improving quality, diversity and data efficiency with heterogeneous sets of emitters
Quality-Diversity (QD) optimisation is a new family of learning algorithms that aims at generating collections of diverse and high-performing solutions. Among those algorithms, the recently introduced Covariance Matrix Adaptation MAP-Elites (CMA-ME) algorithm proposes the concept of emitters, which uses a predefined heuristic to drive the algorithm's exploration. This algorithm was shown to outperform MAP-Elites, a popular QD algorithm that has demonstrated promising results in numerous applications. In this paper, we introduce Multi-Emitter MAP-Elites (ME-MAP-Elites), an algorithm that directly extends CMA-ME and improves its quality, diversity and data efficiency. It leverages the diversity of a heterogeneous set of emitters, in which each emitter type improves the optimisation process in different ways. A bandit algorithm dynamically finds the best selection of emitters depending on the current situation. We evaluate the performance of ME-MAP-Elites on six tasks, ranging from standard optimisation problems (in 100 dimensions) to complex locomotion tasks in robotics. Our comparisons against CMA-ME and MAP-Elites show that ME-MAP-Elites is faster at providing collections of solutions that are significantly more diverse and higher performing. Moreover, in cases where no fruitful synergy can be found between the different emitters, ME-MAP-Elites is equivalent to the best of the compared algorithms.
There is evidence of a relationship between the dynamics of resource availability and the evolution of cooperative behaviour in complex networks. Previous studies have used mathematical models, agent-based models, and studies of hunter-gatherer societies to investigate the causal mechanisms behind this relationship. Here, we present a novel, agent-based software system, built using Unity 3D, which we employ to investigate the adaptation of food sharing networks to fluctuating resource availability. As a benefit of using Unity, our system possesses an easily customisable, visually compelling interface where evolution can be observed in real-time. Across four types of populations, under three environmental conditions, we performed a quantitative analysis of the evolving structure of social interactions. A biologically-inspired gene-sequencing function translates an arbitrarily extendable genome into phenotypic behaviour. Our results contribute to the understanding of how resource availability affects the evolutionary path taken by artificial societies. It emerges that environmental conditions have a greater impact on social evolution compared to the initial genetic configurations of each society. In particular, we find that scenarios of periodically fluctuating resources lead to the evolution of stable, tightly organised societies, which form small, local, mutualistic food-sharing networks.
Jamming Grippers are a novel class of soft robotic actuators that can robustly grasp and manipulate objects of arbitrary shape. They are formed by placing a granular material within a flexible skin connected to a vacuum pump and function by pressing the un-jammed gripper against a target object and evacuating the air to transition the material to a jammed (solid) state, gripping the target object. However, due to the complex interactions between grain morphology and target object shape, much uncertainty still remains regarding optimal constituent grain shapes for specific gripping applications. We address this challenge by utilising a modern Evolutionary Algorithm, NSGA-III, combined with a Discrete Element Method soft robot model to perform a multi-objective optimisation of grain morphology for use in jamming grippers for a range of target object sizes. Our approach optimises the microscopic properties of the system to elicit bespoke functional granular material performance driven by the complex relationship between the individual particle morphologies and the related emergent behaviour of the bulk state. Results establish the important contribution of grain morphology to gripper performance and the critical role of local surface curvature and the length scale governed by the relative sizes of the grains and target object.
On the impact of tangled program graph marking schemes under the atari reinforcement learning benchmark
Tangled program graphs (TPG) support emergent modularity by first identifying subsets of programs that can usefully coexist (a team/ graph node) and then identifying the circumstance under which to reference other teams (arc adaptation). Variation operators manipulate the content of teams and arcs. This introduces cycles into the TPG structures. Previously, this effect was eradicated at run time by marking nodes while evaluating TPG individuals. In this work, a new marking heuristic is introduced, that of arc (learner) marking. This means that nodes can be revisited, but not the same arcs. We investigate the impact of this through 18 titles from the Arcade Learning Environment. The performance and complexity of the policies appear to be similar, but with specific tasks (game titles) resulting in preferences for one scheme over another.
The evolution of symbolic communication is a longstanding open research question in biology. While some theories suggest that it originated from sub-symbolic communication (i.e., iconic or indexical), little experimental evidence exists on how organisms can actually evolve to define a shared set of symbols with unique interpretable meaning, thus being capable of encoding and decoding discrete information. Here, we use a simple synthetic model composed of sender and receiver agents controlled by Continuous-Time Recurrent Neural Networks, which are optimized by means of neuro-evolution. We characterize signal decoding as either regression or classification, with limited and unlimited signal amplitude. First, we show how this choice affects the complexity of the evolutionary search, and leads to different levels of generalization. We then assess the effect of noise, and test the evolved signaling system in a referential game. In various settings, we observe agents evolving to share a dictionary of symbols, with each symbol spontaneously associated to a 1-D unique signal. Finally, we analyze the constellation of signals associated to the evolved signaling systems and note that in most cases these resemble a Pulse Amplitude Modulation system.
In many natural environments, there are different forms of living creatures that successfully accomplish the same task while being diverse in shape and behavior. This biodiversity is what made life capable of adapting to disrupting changes. Being able to reproduce biodiversity in non-biological agents, while still optimizing them for a particular task, might increase their applicability to scenarios where human response to unexpected changes is not possible.
In this work, we focus on Voxel-based Soft Robots (VSRs), a form of robots that grants great freedom in the design of both body and controller and is hence promising in terms of biodiversity. We use evolutionary computation for optimizing, at the same time, body and controller of VSRs for the task of locomotion. We investigate experimentally whether two key factors---evolutionary algorithm (EA) and representation---impact the emergence of biodiversity and if this occurs at the expense of effectiveness. We devise a way for measuring biodiversity, systematically characterizing the robots shape and behavior, and apply it to the VSRs evolved with three EAs and two representations.
The experimental results suggest that the representation matters more than the EA and that there is not a clear trade-off between diversity and effectiveness.
An open question for both natural and artificial evolutionary systems is how, and under what environmental and evolutionary conditions complexity evolves. This study investigates the impact of increasingly complex task environments on the evolution of robot complexity. Specifically, the impact of evolving body-brain couplings on locomotive task performance, where robot evolution was directed by either body-brain exploration (novelty search) or objective-based (fitness function) evolutionary search. Results indicated that novelty search enabled the evolution of increased robot body-brain complexity and efficacy given specific environment conditions. The key contribution is thus the demonstration that body-brain exploration is suitable for evolving robot complexity that enables high fitness robots in specific environments.
Autonomous robots are increasingly used in remote and hazardous environments, where damage to sensory-actuator systems cannot be easily repaired. Such robots must therefore have controllers that continue to function effectively given unexpected malfunctions and damage to robot morphology. This study applies the Intelligent Trial and Error (IT&E) algorithm to adapt hexapod robot control to various leg failures and demonstrates the IT&E map-size parameter as a critical parameter in influencing IT&E adaptive task performance. We evaluate robot adaptation for multiple leg failures on two different map-sizes in simulation and validate evolved controllers on a physical hexapod robot. Results demonstrate a trade-off between adapted gait speed and adaptation duration, dependent on adaptation task complexity (leg damage incurred), where map-size is crucial for generating behavioural diversity required for adaptation.
Reward-based optimization algorithms require both exploration, to find rewards, and exploitation, to maximize performance. The need for efficient exploration is even more significant in sparse reward settings, in which performance feedback is given sparingly, thus rendering it unsuitable for guiding the search process. In this work, we introduce the SparsE Reward Exploration via Novelty and Emitters (SERENE) algorithm, capable of efficiently exploring a search space, as well as optimizing rewards found in potentially disparate areas. Contrary to existing emitters-based approaches, SERENE separates the search space exploration and reward exploitation into two alternating processes. The first process performs exploration through Novelty Search, a divergent search algorithm. The second one exploits discovered reward areas through emitters, i.e. local instances of population-based optimization algorithms. A meta-scheduler allocates a global computational budget by alternating between the two processes, ensuring the discovery and efficient exploitation of disjoint reward areas. SERENE returns both a collection of diverse solutions covering the search space and a collection of high-performing solutions for each distinct reward area. We evaluate SERENE on various sparse reward environments and show it compares favorably to existing baselines.
Evolving effective coordination strategies in tightly coupled multi-agent settings with sparse team fitness evaluations is challenging. It relies on multiple agents simultaneously stumbling upon the goal state to generate a learnable feedback signal. In such settings, estimating an agent's contribution to the overall team performance is extremely difficult, leading to a well-known structural credit assignment problem. This problem is further exacerbated when agents must complete sub-tasks with added spatial and temporal constraints, and different sub-tasks may require different local skills. We introduce MAEDyS, Multiagent Evolution via Dynamic Skill Selection, a hybrid bi-level optimization framework that augments evolutionary methods with policy gradient methods to generate effective coordination policies. MAEDyS learns to dynamically switch between multiple local skills towards optimizing the team fitness. It adopts fast policy gradients to learn several local skills using dense local rewards. It utilizes an evolutionary process to optimize the delayed team fitness by recruiting the most optimal skill at any given time. The ability to switch between various local skills during an episode eliminates the need for designing heuristic mixing functions. We evaluate MAEDyS in complex multiagent coordination environments with spatial and temporal constraints and show that it outperforms prior methods.
As open-ended learning based on divergent search algorithms such as Novelty Search (NS) draws more and more attention from the research community, it is natural to expect that its application to increasingly complex real-world problems will require the exploration to operate in higher dimensional Behavior Spaces (BSs) which will not necessarily be Euclidean. Novelty Search traditionally relies on k-nearest neighbours search and an archive of previously visited behavior descriptors which are assumed to live in a Euclidean space. This is problematic because of a number of issues. On one hand, Euclidean distance and Nearest-neighbour search are known to behave differently and become less meaningful in high dimensional spaces. On the other hand, the archive has to be bounded since, memory considerations aside, the computational complexity of finding nearest neighbours in that archive grows linearithmically with its size. A sub-optimal bound can result in "cycling" in the behavior space, which inhibits the progress of the exploration. Furthermore, the performance of NS depends on a number of algorithmic choices and hyperparameters, such as the strategies to add or remove elements to the archive and the number of neighbours to use in k-nn search. In this paper, we discuss an alternative approach to novelty estimation, dubbed Behavior Recognition based Novelty Search (BR-NS), which does not require an archive, makes no assumption on the metrics that can be defined in the behavior space and does not rely on nearest neighbours search. We conduct experiments to gain insight into its feasibility and dynamics as well as potential advantages over archive-based NS in terms of time complexity.
A core challenge of evolutionary search is the need to balance between exploration of the search space and exploitation of highly fit regions. Quality-diversity search has explicitly walked this tightrope between a population's diversity and its quality. This paper extends a popular quality-diversity search algorithm, MAP-Elites, by treating the selection of parents as a multi-armed bandit problem. Using variations of the upper-confidence bound to select parents from under-explored but potentially rewarding areas of the search space can accelerate the discovery of new regions as well as improve its archive's total quality. The paper tests an indirect measure of quality for parent selection: the survival rate of a parent's offspring. Results show that maintaining a balance between exploration and exploitation leads to the most diverse and high-quality set of solutions in three different testbeds.
Seeking quality diversity in evolutionary co-design of morphology and control of soft tensegrity modular robots
Designing optimal soft modular robots is difficult, due to non-trivial interactions between morphology and controller. Evolutionary algorithms (EAs), combined with physical simulators, represent a valid tool to overcome this issue. In this work, we investigate algorithmic solutions to improve the Quality Diversity of co-evolved designs of Tensegrity Soft Modular Robots (TSMRs) for two robotic tasks, namely goal reaching and squeezing trough a narrow passage. To this aim, we use three different EAs, i.e., MAP-Elites and two custom algorithms: one based on Viability Evolution (ViE) and NEAT (ViE-NEAT), the other named Double Map MAP-Elites (DM-ME) and devised to seek diversity while co-evolving robot morphologies and neural network (NN)-based controllers. In detail, DM-ME extends MAP-Elites in that it uses two distinct feature maps, referring to morphologies and controllers respectively, and integrates a mechanism to automatically define the NN-related feature descriptor. Considering the fitness, in the goal-reaching task ViE-NEAT outperforms MAP-Elites and results equivalent to DM-ME. Instead, when considering diversity in terms of "illumination" of the feature space, DM-ME outperforms the other two algorithms on both tasks, providing a richer pool of possible robotic designs, whereas ViE-NEAT shows comparable performance to MAP-Elites on goal reaching, although it does not exploit any map.