GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation ConferenceFull Citation in the ACM Digital Library
SESSION: Evolutionary machine learning
Rapid online adaptation to changing tasks is an important problem in machine learning and, recently, a focus of meta-reinforcement learning. However, reinforcement learning (RL) algorithms struggle in POMDP environments because the state of the system, essential in a RL framework, is not always visible. Additionally, hand-designed meta-RL architectures may not include suitable computational structures for specific learning problems. The evolution of online learning mechanisms, on the contrary, has the ability to incorporate learning strategies into an agent that can (i) evolve memory when required and (ii) optimize adaptation speed to specific online learning problems. In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. The analysis of the evolved networks reveals the ability of the proposed algorithm to acquire inborn knowledge in a variety of aspects such as the detection of cues that reveal implicit rewards, and the ability to evolve location neurons that help with navigation. The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance in comparison to some non-evolutionary meta-reinforcement learning algorithms. The algorithm proved also to succeed in the 3D gaming environment Malmo Minecraft.
The choice of activation function can have a large effect on the performance of a neural network. While there have been some attempts to hand-engineer novel activation functions, the Rectified Linear Unit (ReLU) remains the most commonly-used in practice. This paper shows that evolutionary algorithms can discover novel activation functions that outperform ReLU. A tree-based search space of candidate activation functions is defined and explored with mutation, crossover, and exhaustive search. Experiments on training wide residual networks on the CIFAR-10 and CIFAR-100 image datasets show that this approach is effective. Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy. Optimal performance is achieved when evolution is allowed to customize activation functions to a particular task; however, these novel activation functions are shown to generalize, achieving high performance across tasks. Evolutionary optimization of activation functions is therefore a promising new dimension of metalearning in neural networks.
Generative adversarial networks (GANs) achieved relevant advances in the field of generative algorithms, presenting high-quality results mainly in the context of images. However, GANs are hard to train, and several aspects of the model should be previously designed by hand to ensure training success. In this context, evolutionary algorithms such as COEGAN were proposed to solve the challenges in GAN training. Nevertheless, the lack of diversity and premature optimization can be found in some of these solutions. We propose in this paper the application of a quality-diversity algorithm in the evolution of GANs. The solution is based on the Novelty Search with Local Competition (NSLC) algorithm, adapting the concepts used in COEGAN to this new proposal. We compare our proposal with the original COEGAN model and with an alternative version using a global competition approach. The experimental results evidenced that our proposal increases the diversity of the discovered solutions and leverage the performance of the models found by the algorithm. Furthermore, the global competition approach was able to consistently find better models for GANs.
Symbolic regression is a common application of genetic programming where model structure and corresponding parameters are evolved in unison. In the majority of work exploring symbolic regression, features are used directly without acknowledgement of their relative scale or unit. This paper extends recent work on the importance of standardisation of features when conducting symbolic regression. Specifically, z-score standardisation of input features is applied to both inputs and response to ensure that evolution explores a model space with zero mean and unit variance. This paper demonstrates that standardisation allows a simpler function set to be used without increasing bias. Additionally, it is demonstrated that standardisation can significantly improve the performance of coefficient optimisation through gradient descent to produce accurate models. Through analysis of several benchmark data sets, we demonstrate that feature standardisation enables simple but effective approaches that are comparable in performance to the state-of-the-art in symbolic regression.
Improving neuroevolutionary transfer learning of deep recurrent neural networks through network-aware adaptation
Transfer learning entails taking an artificial neural network (ANN) that is trained on a source dataset and adapting it to a new target dataset. While this has been shown to be quite powerful, its use has generally been restricted by architectural constraints. Previously, in order to reuse and adapt an ANN's internal weights and structure, the underlying topology of the ANN being transferred across tasks must remain mostly the same while a new output layer is attached, discarding the old output layer's weights. This work introduces network-aware adaptive structure transfer learning (N-ASTL), an advancement over prior efforts to remove this restriction. N-ASTL utilizes statistical information related to the source network's topology and weight distribution in order to inform how new input and output neurons are to be integrated into the existing structure. Results show improvements over prior state-of-the-art, including the ability to transfer in challenging real-world datasets not previously possible and improved generalization over RNNs without transfer.
A common problem machine learning developers are faced with is overfitting, that is, fitting a pipeline too closely to the training data that the performance degrades for unseen data. Automated machine learning aims to free (or at least ease) the developer from the burden of pipeline creation, but this overfitting problem can persist. In fact, this can become more of a problem as we look to iteratively optimise the performance of an internal cross-validation (most often k-fold). While this internal cross-validation hopes to reduce this overfitting, we show we can still risk overfitting to the particular folds used. In this work, we aim to remedy this problem by introducing dynamic fitness evaluations which approximate repeated k-fold cross-validation, at little extra cost over single k-fold, and far lower cost than typical repeated k-fold. The results show that when time equated, the proposed fitness function results in significant improvement over the current state-of-the-art baseline method which uses an internal single k-fold. Furthermore, the proposed extension is very simple to implement on top of existing evolutionary computation methods, and can provide essentially a free boost in generalisation/testing performance.
Improving constrained clustering via decomposition-based multiobjective optimization with memetic elitism
Clustering has always been a topic of interest in knowledge discovery, it is able to provide us with valuable information within the unsupervised machine learning framework. It received renewed attention when it was shown to produce better results in environments where partial information about how to solve the problem is available, thus leading to a new machine learning paradigm: semi-supervised machine learning. This new type of information can be given in the form of constraints, which guide the clustering process towards quality solutions. In particular, this study considers the pairwise instance-level must-link and cannot-link constraints. Given the ill-posed nature of the constrained clustering problem, we approach it from the multiobjective optimization point of view. Our proposal consists in a memetic elitist evolutionary strategy that favors exploitation by applying a local search procedure to the elite of the population and transferring its results only to the external population, which will also be used to generate new individuals. We show the capability of this method to produce quality results for the constrained clustering problem when considering incremental levels of constraint-based information. For the comparison with state-of-the-art methods, we include previous multiobjective approaches, single-objective genetic algorithms and classic constrained clustering methods.
This paper proposes a self-adaptation technique of parameter settings used in the XCS learning scheme. Since we adaptively set those settings to their optimum values derived by the recent XCS learning theory, our proposal does not require any trial and error process to find their proper values. Thus, our proposal can always satisfy the optimality of XCS learning scheme, i.e. to distinguish accurate rules from inaccurate rules with the minimum update number of rules. Experimental results on artificial classification problems including overlapping problems show that XCS with our self-adaptation technique significantly outperforms the standard XCS.
In the contemporary big data realm, Deep Neural Networks (DNNs) are evolving towards more complex architectures to achieve higher inference accuracy. Model compression techniques can be leveraged to efficiently deploy these compute-intensive architectures on resource-limited mobile devices. Such methods comprise various hyperparameters that require per-layer customization to ensure high accuracy. Choosing the hyperparameters is cumbersome as the pertinent search space grows exponentially with model layers. This paper introduces GeneCAI, a novel optimization method that automatically learns how to tune per-layer compression hyperparameters. We devise a bijective translation scheme that encodes compressed DNNs to the genotype space. Each genotype's optimality is measured using a multi-objective score based on the accuracy and number of floating-point operations. We develop customized genetic operations to iteratively evolve the non-dominated solutions towards the optimal Pareto front, thus, capturing the optimal trade-off between model accuracy and complexity. GeneCAI optimization method is highly scalable and can achieve a near-linear performance boost on distributed multi-GPU platforms. Our extensive evaluations demonstrate that GeneCAI outperforms existing rule-based and reinforcement learning methods in DNN compression by finding models that lie on a better accuracy/complexity Pareto curve.
In optimization and machine learning, the divide between discrete and continuous problems and methods is deep and persistent. We attempt to remove this distinction by training neural network autoencoders that embed discrete candidate solutions in continuous latent spaces. This allows us to take advantage of state-of-the-art continuous optimization methods for solving discrete optimization problems, and mitigates certain challenges in discrete optimization, such as design of bias-free search operators. In the experimental part, we consider program synthesis as the special case of combinatorial optimization. We train an autoencoder network on a large sample of programs in a problem-agnostic, unsupervised manner, and then use it with an evolutionary continuous optimization algorithm (CMA-ES) to map the points from the latent space to programs. We propose also a variant in which semantically similar programs are more likely to have similar embeddings. Assessment on a range of benchmarks in two domains indicates the viability of this approach and the usefulness of involving program semantics.
Learning Classifier Systems (LCSs) are a group of rule-based evolutionary computation techniques, which have been frequently applied to data-mining tasks. Evidence shows that LCSs can produce models containing human-discernible patterns. But, traditional LCSs cannot efficiently discover consistent, general rules - especially in domains that have unbalanced class distribution. The reason is that traditional search methods, e.g. crossover, mutation, and roulette wheel deletion, rely on stochasticity to find and keep optimum rules. Recently, absumption has been introduced to deterministically remove over-general rules, which is complementary to subsumption that deterministically removes over-specific rules. It is hypothesized that utilizing just assumption & subsumption transforms the search process from stochastic to deterministic, which benefits LCSs in evolving interpretable models and removing the need to tune search parameters to the problem. Interrogatable artificial Boolean domains with varying numbers of attributes are considered as benchmarks. The new LCS, termed Absumption Subsumption Classifier System (ASCS), successfully produces interpretable models for all the complex domains tested, whereas the non-optimal rules in existing techniques obscure the patterns. ACSC's ability to handle complex search spaces is observed, e.g. for the 14-bits Majority-On problem the required 6435 different cooperating rules were discovered enabling correct pattern visualization.
Multitask Learning is a learning paradigm that deals with multiple different tasks in parallel and transfers knowledge among them. XOF, a Learning Classifier System using tree-based programs to encode building blocks (meta-features), constructs and collects features with rich discriminative information for classification tasks in an Observed List. This paper seeks to facilitate the automation of feature transferring in between tasks by utilising the Observed List. We hypothesise that the best discriminative features of a classification task carry its characteristics. Therefore, the relatedness between any two tasks can be estimated by comparing their most appropriate patterns. We propose a multiple-XOF system, called mXOF, that can dynamically adapt feature transfer among XOFs. This system utilises the Observed List to estimate the task relatedness. This method enables the automation of transferring features. In terms of knowledge discovery, the resemblance estimation provides insightful relations among multiple data. We experimented mXOF on various scenarios, e.g. representative Hierarchical Boolean problems, classification of distinct classes in the UCI Zoo dataset, and unrelated tasks, to validate its abilities of automatic knowledgetransfer and estimating task relatedness. Results show that mXOF can estimate the relatedness reasonably between multiple tasks to aid the learning performance with the dynamic feature transferring.
Neural Architecture Search (NAS) algorithms have discovered highly novel state-of-the-art Convolutional Neural Networks (CNNs) for image classification, and are beginning to improve our understanding of CNN architectures. However, within NAS research, there are limited studies focussing on the role of skip-connections, and how the configurations of connections between layers can be optimised to improve CNN performance. Our work focusses on developing a new evolutionary NAS algorithm based on adjacency matrices to optimise skip-connection structures, creating more specialised and powerful skip-connection structures within a DenseNet-BC network than previously seen in the literature. Our work further demonstrates how simple adjacency matrices can be interpreted in a way which allows for a more dynamic variant of DenseNet-BC. The final algorithm, using this novel interpretation of adjacency matrices for architecture design and evolved on the CIFAR100 dataset, finds networks with improved performance relative to a baseline DenseNet-BC network on both the CIFAR10 and CIFAR100 datasets, being the first, to our knowledge, NAS algorithm for skip-connection optimisation to do so. Finally, skip-connection structures discovered by our algorithm are analysed, and some important skip-connection patterns are highlighted.
Deep learning is an important field of machine learning. It is playing a critical role in a variety of applications ranging from self-driving cars to security and surveillance. However, deep networks have deep flaws. For example, they are highly vulnerable to adversarial attacks. One reason may be the homogeneous nature of their knowledge representation, which allows a single disruptive pattern to cause miss-classification. Biological intelligence has lateral asymmetry, which allows heterogeneous, modular learning at different levels of abstraction, enabling different representations of the same object.
This work aims to incorporate lateralization and modular learning at different levels of abstraction in an evolutionary machine learning system. The results of image classification tasks show that the lateralized system efficiently learns hierarchical distributions of knowledge, demonstrating performance that is similar to (or better than) other state-of-the-art deep systems as it reasons using multiple representations. Crucially, the novel system outperformed all the state-of-the-art deep models for the classification of normal and adversarial images by 0.43% -- 2.56% and 2.15% -- 25.84%, respectively. Lateralisation enabled the system to exhibit robustness beyond previous work, which advocates for the creation of data sets that enable components of objects and the objects themselves to be learned specifically or in an end-to-end manner.
XCS constitutes the most deeply investigated classifier system today. It offers strong potentials and comes with inherent capabilities for mastering a variety of different learning tasks. Besides outstanding successes in various classification and regression tasks, XCS also proved very effective in certain multi-step environments from the domain of reinforcement learning. Especially in the latter domain, recent advances have been mainly driven by algorithms which model their policies based on deep neural networks, among which the Deep-Q-Network (DQN) being a prominent representative. Experience Replay (ER) constitutes one of the crucial factors for the DQN's successes, since it facilitates stabilized training of the neural network-based Q-function approximators. Surprisingly, XCS barely takes advantage of similar mechanisms that leverage remembered raw experiences. To bridge this gap, this paper investigates the benefits of extending XCS with ER. We demonstrate that for single-step tasks ER yields strong improvements in terms of sample efficiency. On the downside, however, we reveal that ER might further aggravate well-studied issues not yet solved for XCS when applied to sequential decision problems demanding for long-action-chains.
Inattentional blindness is the psychological phenomenon that causes one to miss things in plain sight. It is a consequence of the selective attention in perception that lets us remain focused on important parts of our world without distraction from irrelevant details. Motivated by selective attention, we study the properties of artificial agents that perceive the world through the lens of a self-attention bottleneck. By constraining access to only a small fraction of the visual input, we show that their policies are directly interpretable in pixel space. We find neuroevolution ideal for training self-attention architectures for vision-based reinforcement learning (RL) tasks, allowing us to incorporate modules that can include discrete, non-differentiable operations which are useful for our agent. We argue that self-attention has similar properties as indirect encoding, in the sense that large implicit weight matrices are generated from a small number of key-query parameters, thus enabling our agent to solve challenging vision based tasks with at least 1000x fewer parameters than existing methods. Since our agent attends to only task critical visual hints, they are able to generalize to environments where task irrelevant elements are modified while conventional methods fail.1
Generative Adversarial Networks (GANs) are popular tools for generative modeling. The dynamics of their adversarial learning give rise to convergence pathologies during training such as mode and discriminator collapse. In machine learning, ensembles of predictors demonstrate better results than a single predictor for many tasks. In this study, we apply two evolutionary algorithms (EAs) to create ensembles to re-purpose generative models, i.e., given a set of heterogeneous generators that were optimized for one objective (e.g., minimize Fréchet Inception Distance), create ensembles of them for optimizing a different objective (e.g., maximize the diversity of the generated samples). The first method is restricted by the exact size of the ensemble and the second method only restricts the upper bound of the ensemble size. Experimental analysis on the MNIST image benchmark demonstrates that both EA ensembles creation methods can re-purpose the models, without reducing their original functionality. The EA-based demonstrate significantly better performance compared to other heuristic-based methods. When comparing both evolutionary, the one with only an upper size bound on the ensemble size is the best.
One of the main and largely unexplored challenges in evolving the weights of neural networks using genetic algorithms is to find a sensible crossover operation between parent networks. Indeed, naive crossover leads to functionally damaged offspring that do not retain information from the parents. This is because neural networks are invariant to permutations of neurons, giving rise to multiple ways of representing the same solution. This is often referred to as the competing conventions problem. In this paper, we propose a two-step safe crossover (SC) operator. First, the neurons of the parents are functionally aligned by computing how well they correlate, and only then are the parents recombined. We compare two ways of measuring relationships between neurons: Pairwise Correlation (PwC) and Canonical Correlation Analysis (CCA). We test our safe crossover operators (SC-PwC and SC-CCA) on MNIST and CIFAR-10 by performing arithmetic crossover on the weights of feed-forward neural network pairs. We show that it effectively transmits information from parents to offspring and significantly improves upon naive crossover. Our method is computationally fast, can serve as a way to explore the fitness landscape more efficiently and makes safe crossover a potentially promising operator in future neuroevolution research and applications.
Segmented initialization and offspring modification in evolutionary algorithms for bi-objective feature selection
In classification, feature selection mainly aims at reducing the dataset dimensionality and increasing the classification accuracy, which also results in higher computational efficiency than using the original full set of features. Population-based meta-heuristic, evolutionary algorithms have been widely used to solve the bi-objective feature selection problem, which minimizes the number of selected features and the error of classification model. However, most of them are not specifically designed for feature selection, and disregard many of its complex characteristics. In this paper, we propose a generic approach that focuses on improving the initialization effectiveness and offspring quality, in order to boost the performance of existing evolutionary algorithms for bi-objective feature selection. To be more specific, a segmented initialization mechanism is used to enhance the exploration width, while an offspring modification mechanism is proposed to ensure the exploitation depth. Combining them together will make a good trade-off between the diversity and convergence. In the experiments, we plug the proposed approach into three different types of multi-objective evolutionary algorithms, and test them on 18 classification datasets with two widely-used performance metrics. The empirical results prove the significant contribution of the proposed approach on the optimization and classification performance.
Evolutionary learning algorithms have been successfully applied to multiagent problems where the desired system behavior can be captured by a single fitness signal. However, the complexity of many real world applications cannot be reduced to a single number, particularly when the fitness (i) arrives after a lengthy sequence of actions, and (ii) depends on the joint-action of multiple teammates. In this paper, we introduce the multi-fitness learning paradigm to enable multiagent teams to identify which fitness matters when in domains that require long-term, complex coordination. We demonstrate that multi-fitness learning efficiently solves a cooperative exploration task where teams of rovers must coordinate to observe various points of interest in a specific but unknown order.