GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
SESSION: Keynote talk
Biological evolution can be really fast. 5000 years ago the Sahara was still green. Now, after the formation of the desert, an ingenious locomotion technique has been invented in an exclave of the Sahara: like a cyclist, the spider Cebrennus rechenbergi moves over the obstacle-free surface of the isolated Moroccan desert Erg Chebbi. Turboevolution, as known from Darwin's finches on the Galapagos Islands, raises the question: how fast can evolution be? A short introduction to the theory of the evolution strategy gives the answer. Compared with simple random search, the speed of progress increases enormously with the accuracy of the imitation of the rules of biological evolution. We are bionicists: we have transferred the ingenious leg movements of the cyclist spider to a robot. The result is a machine, perhaps a future Mars rover, that can run and roll in many different ways. Videos from the Moroccan Erg Chebbi desert demonstrate the extraordinary performance of the bionics rover.
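The claimed speed-up over blind random search can be made concrete with a toy experiment. The sketch below is a minimal (1+1)-evolution strategy with Rechenberg's 1/5 success rule; the objective, budget, and step-size factors are illustrative assumptions, not details from the talk:

```python
import random

def sphere(x):
    """Sphere objective: squared distance from the optimum at the origin."""
    return sum(xi * xi for xi in x)

def random_search(dim=10, budget=2000, seed=1):
    """Blind random search: sample uniformly at random, keep the best."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(budget):
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        best = min(best, sphere(x))
    return best

def one_plus_one_es(dim=10, budget=2000, seed=1):
    """(1+1)-evolution strategy with the 1/5 success rule for step-size control."""
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(dim)]
    fx = sphere(x)
    sigma = 1.0
    for _ in range(budget):
        child = [xi + rng.gauss(0, sigma) for xi in x]
        fc = sphere(child)
        if fc <= fx:                 # success: accept the child, widen the step
            x, fx = child, fc
            sigma *= 1.5
        else:                        # failure: shrink the step; the factor
            sigma *= 1.5 ** -0.25    # 1.5**(-1/4) balances at ~1/5 successes
    return fx

print(random_search(), one_plus_one_es())
```

With the same evaluation budget, the strategy's step-size adaptation lets it home in on the optimum, while random search barely improves on its early samples.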
Reinforcement Learning (RL) algorithms can be used to optimally solve dynamic decision-making and control problems. With continuous-valued state and input variables, RL algorithms must rely on function approximators to represent the value function and policy mappings. Commonly used numerical approximators, such as neural networks or basis function expansions, have two main drawbacks: they are black-box models offering no insight into the mappings learnt, and they require significant trial-and-error tuning of their meta-parameters. In addition, results obtained with deep neural networks suffer from a lack of reproducibility. In this talk, we discuss a family of new approaches to constructing smooth approximators for RL by means of genetic programming, and more specifically by symbolic regression. We show how to construct process models and value functions represented by parsimonious analytic expressions using state-of-the-art algorithms, such as Single Node Genetic Programming and Multi-Gene Genetic Programming. We include examples of nonlinear control problems that can be successfully solved by reinforcement learning with symbolic regression and illustrate some of the challenges this exciting field of research is currently facing.
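As a heavily simplified illustration of symbolic regression, the sketch below evolves expression trees with a mutation-only genetic programming loop to recover a parsimonious analytic expression from data. It is a toy stand-in, not the Single Node GP or Multi-Gene GP algorithms named above, and the target function, primitive set, and parameters are all assumptions made for the example:

```python
import random

# Primitive set for the expression trees (an illustrative assumption).
OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}
TERMINALS = ["x", 1.0, 2.0]

def random_tree(rng, depth=3):
    """Grow a random expression tree of bounded depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree at input value x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, xs, ys):
    """Sum of squared errors against the sampled data (lower is better)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))

def mutate(tree, rng):
    """Replace a randomly chosen subtree with a fresh random subtree."""
    if rng.random() < 0.3 or not isinstance(tree, tuple):
        return random_tree(rng, depth=2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def evolve(xs, ys, pop_size=60, gens=60, seed=0):
    """Truncation selection plus mutation; elitism keeps the best half."""
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda t: fitness(t, xs, ys))
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(rng.choice(survivors), rng) for _ in survivors]
    return min(pop, key=lambda t: fitness(t, xs, ys))

# Recover an assumed toy "value function" V(x) = x**2 + 2*x from samples.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x + 2 * x for x in xs]
best = evolve(xs, ys)
print(best, fitness(best, xs, ys))
```

The returned tree is a plain, inspectable expression, which is exactly the interpretability advantage the talk contrasts with black-box numerical approximators.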
Deep reinforcement learning has rapidly grown as a research field with far-reaching potential for artificial intelligence. Games and simple physical simulations have been used as the main benchmark domains for many fundamental developments. As the field matures, it is important to develop more sophisticated learning systems with the aim of solving more complex real-world tasks, but problems like catastrophic forgetting remain critical, and important capabilities such as skill composition through curriculum learning remain unsolved. Continual learning is an important challenge for reinforcement learning, because RL agents are trained sequentially, in interactive environments, and are especially vulnerable to the phenomena of catastrophic forgetting and catastrophic interference. Successful methods for continual learning have broad potential: they could enable agents to learn multiple skills and, ultimately, complex behaviors.
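Catastrophic forgetting can be reproduced even with a one-parameter model. In the sketch below (task slopes, step counts, and function names are assumptions made for the illustration), a scalar linear model is trained sequentially on two regression tasks; after learning task B, its error on task A has grown sharply:

```python
import random

def sgd_on_task(w, slope, steps=200, lr=0.1, rng=None):
    """Plain SGD on the regression task y = slope * x (squared error)."""
    for _ in range(steps):
        x = rng.uniform(-1, 1)
        w -= lr * 2 * (w * x - slope * x) * x   # gradient of (w*x - y)**2
    return w

def task_error(w, slope, xs):
    """Mean squared error of the model w*x on the task y = slope * x."""
    return sum((w * x - slope * x) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
xs = [i / 10 for i in range(-10, 11)]

w = sgd_on_task(0.0, 1.0, rng=rng)       # learn task A (slope +1)
err_a_before = task_error(w, 1.0, xs)    # near zero right after training on A

w = sgd_on_task(w, -1.0, rng=rng)        # then learn task B (slope -1)
err_a_after = task_error(w, 1.0, xs)     # task A has been overwritten

print(err_a_before, err_a_after)
```

The single parameter must serve both tasks, so training on B overwrites what was learnt on A, which is the essence of the interference problem the paragraph describes; continual-learning methods aim to break exactly this trade-off.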
In particular, while deep learning has shown excellent progress towards training systems to perform with human or superhuman ability on various tasks (domains like vision, speech, and language, as well as games such as Starcraft and Go), the resulting systems are still slow to respond to new information, or to non-stationarities in the environment, compared to humans. Learning algorithms do exist that can quickly adapt to new data, but these are often at odds with large-scale deep learning systems. Meta-learning is one example of a learning paradigm that may not face this dilemma and thus holds promise as a framework for supporting fast and slow learning in a single learner. In this framework, one could view the learning process as having two levels of optimisation: an outer loop, which might adapt slowly towards a "species" level of optimisation, tailored for an environment, a morphology, and a family of skills or tasks; and an inner loop, which allows an individual agent to adapt and diversify more quickly in response to a lifetime of experiences. I would argue that model-free deep reinforcement learning is an effective algorithm for optimising the outer loop of this process, but it may not be as successful as an algorithm for effective lifelong learning, the inner loop of the process.
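The two-level view can be sketched in a few lines. The toy below uses a Reptile-style meta-update (a simple choice made for illustration; the task family, learning rates, and function names are all assumptions): the inner loop adapts a scalar model to one task with a few SGD steps, while the outer loop slowly moves a shared initialisation toward parameters that adapt well across the whole task family:

```python
import random

def inner_loop(w, slope, steps=5, lr=0.1, rng=None):
    """Fast 'lifetime' adaptation: a few SGD steps on one task y = slope * x."""
    for _ in range(steps):
        x = rng.uniform(-1, 1)
        w -= lr * 2 * (w * x - slope * x) * x   # gradient of (w*x - y)**2
    return w

def outer_loop(meta_iters=2000, meta_lr=0.05, seed=0):
    """Slow 'species-level' adaptation of the shared initialisation w0."""
    rng = random.Random(seed)
    w0 = 0.0
    for _ in range(meta_iters):
        slope = rng.uniform(1.0, 3.0)           # sample a task from the family
        w_adapted = inner_loop(w0, slope, rng=rng)
        w0 += meta_lr * (w_adapted - w0)        # Reptile-style outer update
    return w0

w0 = outer_loop()
print(w0)  # drifts toward the centre of the task distribution
```

After meta-training, the initialisation sits near the mean of the task family, so any individual task can be fit from it with only a handful of inner-loop steps, which is the fast/slow split the paragraph describes.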