GECCO '19- Proceedings of the Genetic and Evolutionary Computation Conference
Full Citation in the ACM Digital LibrarySESSION: Evolutionary machine learning
Lexicase selection in learning classifier systems
The lexicase parent selection method selects parents by considering performance on individual data points in random order instead of using a fitness function based on an aggregated data accuracy. While the method has demonstrated promise in genetic programming and more recently in genetic algorithms, its applications in other forms of evolutionary machine learning have not been explored. In this paper, we investigate the use of lexicase parent selection in Learning Classifier Systems (LCS) and study its effect on classification problems in a supervised setting. We further introduce a new variant of lexicase selection, called batch-lexicase selection, which allows for the tuning of selection pressure. We compare the two lexicase selection methods with tournament and fitness proportionate selection methods on binary classification problems. We show that batch-lexicase selection results in the creation of more generic rules which is favorable for generalization on future data. We further show that batch-lexicase selection results in better generalization in situations of partial or missing data.
An automated ensemble learning framework using genetic programming for image classification
An ensemble consists of multiple learners and can achieve a better generalisation performance than a single learner. Genetic programming (GP) has been applied to construct ensembles using different strategies such as bagging and boosting. However, no GP-based ensemble methods focus on dealing with image classification, which is a challenging task in computer vision and machine learning. This paper proposes an automated ensemble learning framework using GP (EGP) for image classification. The new method integrates feature learning, classification function selection, classifier training, and combination into a single program tree. To achieve this, a novel program structure, a new function set and a new terminal set are developed in EGP. The performance of EGP is examined on nine different image classification data sets of varying difficulty and compared with a large number of commonly used methods including recently published methods. The results demonstrate that EGP achieves better performance than most competitive methods. Further analysis reveals that EGP evolves good ensembles simultaneously balancing diversity and accuracy. To the best of our knowledge, this study is the first work using GP to automatically generate ensembles for image classification.
COEGAN: evaluating the coevolution effect in generative adversarial networks
Generative adversarial networks (GAN) present state-of-the-art results in the generation of samples following the distribution of the input dataset. However, GANs are difficult to train, and several aspects of the model should be previously designed by hand. Neuroevolution is a well-known technique used to provide the automatic design of network architectures which was recently expanded to deep neural networks.
COEGAN is a model that uses neuroevolution and coevolution in the GAN training algorithm to provide a more stable training method and the automatic design of neural network architectures. COEGAN makes use of the adversarial aspect of the GAN components to implement coevolutionary strategies in the training algorithm. Our proposal was evaluated in the Fashion-MNIST and MNIST dataset. We compare our results with a baseline based on DCGAN and also with results from a random search algorithm. We show that our method is able to discover efficient architectures in the Fashion-MNIST and MNIST datasets. The results also suggest that COEGAN can be used as a training algorithm for GANs to avoid common issues, such as the mode collapse problem.
Efficient personalized community detection via genetic evolution
Personalized community detection aims to generate communities associated with user need on graphs, which benefits many downstream tasks such as node recommendation and link prediction for users, etc. It is of great importance but lack of enough attention in previous studies which are on topics of user-independent, semi-supervised, or top-K user-centric community detection. Meanwhile, most of their models are time consuming due to the complex graph structure. Different from these topics, personalized community detection requires to provide higher-resolution partition on nodes that are more relevant to user need while coarser manner partition on the remaining less relevant nodes. In this paper, to solve this task in an efficient way, we propose a genetic model including an offline and an online step. In the offline step, the user-independent community structure is encoded as a binary tree. And subsequently an online genetic pruning step is applied to partition the tree into communities. To accelerate the speed, we also deploy a distributed version of our model to run under parallel environment. Extensive experiments on multiple datasets show that our model outperforms the state-of-arts with significantly reduced running time.
Auto-CVE: a coevolutionary approach to evolve ensembles in automated machine learning
Automated Machine Learning (Auto-ML) is a growing field receiving a lot of attention. Several techniques are being developed to address the question of how to automate the process of defining machine learning pipelines, using diverse types of approaches and with relative success, but still this problem is far from being solved.
Ensembles are frequently employed in machine learning given their better performance, when compared to the use of a single model, and higher robustness. However, until now, not much attention has been given to them in the Auto-ML field.
In this sense, this work presents Auto-CVE (Automated Coevolutionary Voting Ensemble) a new approach to Auto-ML. Based on a coevolutionary model, it uses two populations (one of ensembles and another for components) to actively search for voting ensembles. When compared to the popular algorithm TPOT, Auto-CVE shows competitive results in both accuracy and computing time.
Evolutionary neural AutoML for deep learning
Deep neural networks (DNNs) have produced state-of-the-art results in many benchmarks and problem domains. However, the success of DNNs depends on the proper configuration of its architecture and hyperparameters. Such a configuration is difficult and as a result, DNNs are often not used to their full potential. In addition, DNNs in commercial applications often need to satisfy real-world design constraints such as size or number of parameters. To make configuration easier, automatic machine learning (AutoML) systems for deep learning have been developed, focusing mostly on optimization of hyperparameters.
This paper takes AutoML a step further. It introduces an evolutionary AutoML framework called LEAF that not only optimizes hyperparameters but also network architectures and the size of the network. LEAF makes use of both state-of-the-art evolutionary algorithms (EAs) and distributed computing frameworks. Experimental results on medical image classification and natural language analysis show that the framework can be used to achieve state-of-the-art performance. In particular, LEAF demonstrates that architecture optimization provides a significant boost over hyperparameter optimization, and that networks can be minimized at the same time with little drop in performance. LEAF therefore forms a foundation for democratizing and improving AI, as well as making AI practical in future applications.
Absumption to complement subsumption in learning classifier systems
Learning Classifier Systems (LCSs), a 40-year-old technique, evolve interrogatable production rules. XCSs are the most popular reinforcement learning based LCSs. It is well established that the subsumption method in XCSs removes overly detailed rules. However, the technique still suffers from overly general rules that reduce accuracy and clarity in the discovered patterns. This adverse impact is especially true for domains that are containing accurate solutions that overlap, i.e. one data instance is covered by two plausible, but competing rules. A novel method, termed absumption, is introduced to counter over-general rules. Complex Boolean problems that contain epistasis, heterogeneity and overlap are used to test the absumption method. Results show that absumption successfully improves the training performance of XCSs by counteracting over-general rules. Moreover, absumption enables the rule-set to be compacted, such that underlying patterns can be precisely visualized successfully. Additionally, the equations for the optimal size of solutions for a problem domain can now be determined.
NSGA-Net: neural architecture search using multi-objective genetic algorithm
This paper introduces NSGA-Net --- an evolutionary approach for neural architecture search (NAS). NSGA-Net is designed with three goals in mind: (1) a procedure considering multiple and conflicting objectives, (2) an efficient procedure balancing exploration and exploitation of the space of potential neural network architectures, and (3) a procedure finding a diverse set of trade-off network architectures achieved in a single run. NSGA-Net is a population-based search algorithm that explores a space of potential neural network architectures in three steps, namely, a population initialization step that is based on prior-knowledge from hand-crafted architectures, an exploration step comprising crossover and mutation of architectures, and finally an exploitation step that utilizes the hidden useful knowledge stored in the entire history of evaluated neural architectures in the form of a Bayesian Network. Experimental results suggest that combining the dual objectives of minimizing an error metric and computational complexity, as measured by FLOPs, allows NSGA-Net to find competitive neural architectures. Moreover, NSGA-Net achieves error rate on the CIFAR-10 dataset on par with other state-of-the-art NAS methods while using orders of magnitude less computational resources. These results are encouraging and shows the promise to further use of EC methods in various deep-learning paradigms.
Improvement of code fragment fitness to guide feature construction in XCS
In complex classification problems, constructed features with rich discriminative information can simplify decision boundaries. Code Fragments (CFs) produce GP-tree-like constructed features that can represent decision boundaries effectively in Learning Classifier Systems (LCSs). But the search space for useful CFs is vast due to this richness in boundary creation, which is impractical. Online Feature-generation (OF) improves the search of useful CFs by growing promising CFs from a dynamic list of preferable CFs based on the ability to produce accurate and generalised, i.e. high-fitness, classifiers. However, the previous preference for high-numerosity CFs did not encapsulate information about the applicability of CFs directly. Consequently, learning performances of OF with an accuracy-based LCS (termed XOF) struggled to progress in the final learning phase. The hypothesis is that estimating the CF-fitness of CFs based on classifier fitness will aid the search for useful constructed features. This is anticipated to drive the search of decision boundaries efficiently, and thereby improve learning performances. Experiments on large-scale and hierarchical Boolean problems show that the proposed systems learn faster than traditional LCSs regarding the number of generations and time consumption. Tests on real-world datasets demonstrate its capability to find readable and useful features to solve practical problems.
Population-based ensemble classifier induction for domain adaptation
In classification, the task of domain adaptation is to learn a classifier to classify target data using unlabeled data from the target domain and labeled data from a related, but not identical, source domain. Transfer classifier induction is a common domain adaptation approach that learns an adaptive classifier directly rather than first adapting the source data. However, most existing transfer classifier induction algorithms are gradient-based, so they can easily get stuck at local optima. Moreover, they usually generate only a single classifier which might fit the source data too well, which results in poor target accuracy. In this paper, we propose a population-based algorithm that can address the above two limitations. The proposed algorithm can re-initialize a population member to a promising region when the member is trapped at local optima. The population-based mechanism allows the proposed algorithm to output a set of classifiers which is more reliable than a single classifier. The experimental results show that the proposed algorithm achieves significantly better target accuracy than four state-of-the-art and well-known domain adaptation algorithms on three real-world domain adaptation problems.
Investigating recurrent neural network memory structures using neuro-evolution
This paper presents a new algorithm, Evolutionary eXploration of Augmenting Memory Models (EXAMM), which is capable of evolving recurrent neural networks (RNNs) using a wide variety of memory structures, such as Δ-RNN, GRU, LSTM, MGU and UGRNN cells. EXAMM evolved RNNs to perform prediction of large-scale, real world time series data from the aviation and power industries. These data sets consist of very long time series (thousands of readings), each with a large number of potentially correlated and dependent parameters. Four different parameters were selected for prediction and EXAMM runs were performed using each memory cell type alone, each cell type and simple neurons, and with all possible memory cell types and simple neurons. Evolved RNN performance was measured using repeated k-fold cross validation, resulting in 2420 EXAMM runs which evolved 4, 840, 000 RNNs in ~24,200 CPU hours on a high performance computing cluster. Generalization of the evolved RNNs was examined statistically, providing findings that can help refine the design of RNN memory cells as well as inform future neuro-evolution algorithms.
Deep neuroevolution of recurrent and discrete world models
Neural architectures inspired by our own human cognitive system, such as the recently introduced world models, have been shown to outperform traditional deep reinforcement learning (RL) methods in a variety of different domains. Instead of the relatively simple architectures employed in most RL experiments, world models rely on multiple different neural components that are responsible for visual information processing, memory, and decision-making. However, so far the components of these models have to be trained separately and through a variety of specialized training methods. This paper demonstrates the surprising finding that models with the same precise parts can be instead efficiently trained end-to-end through a genetic algorithm (GA), reaching a comparable performance to the original world model by solving a challenging car racing task. An analysis of the evolved visual and memory system indicates that they include a similar effective representation to the system trained through gradient descent. Additionally, in contrast to gradient descent methods that struggle with discrete variables, GAs also work directly with such representations, opening up opportunities for classical planning in latent space. This paper adds additional evidence on the effectiveness of deep neuroevolution for tasks that require the intricate orchestration of multiple components in complex heterogeneous architectures.
Evolving controllably difficult datasets for clustering
Synthetic datasets play an important role in evaluating clustering algorithms, as they can help shed light on consistent biases, strengths, and weaknesses of particular techniques, thereby supporting sound conclusions. Despite this, there is a surprisingly small set of established clustering benchmark data, and many of these are currently handcrafted. Even then, their difficulty is typically not quantified or considered, limiting the ability to interpret algorithmic performance on these datasets. Here, we introduce HAWKS, a new data generator that uses an evolutionary algorithm to evolve cluster structure of a synthetic data set. We demonstrate how such an approach can be used to produce datasets of a pre-specified difficulty, to trade off different aspects of problem difficulty, and how these interventions directly translate into changes in the clustering performance of established algorithms.
Spatial evolutionary generative adversarial networks
Generative adversary networks (GANs) suffer from training pathologies such as instability and mode collapse. These pathologies mainly arise from a lack of diversity in their adversarial interactions. Evolutionary generative adversarial networks apply the principles of evolutionary computation to mitigate these problems. We hybridize two of these approaches that promote training diversity. One, E-GAN, at each batch, injects mutation diversity by training the (replicated) generator with three independent objective functions then selecting the resulting best performing generator for the next batch. The other, Lipizzaner, injects population diversity by training a two-dimensional grid of GANs with a distributed evolutionary algorithm that includes neighbor exchanges of additional training adversaries, performance based selection and population-based hyper-parameter tuning. We propose to combine mutation and population approaches to diversity improvement. We contribute a superior evolutionary GANs training method, Mustangs, that eliminates the single loss function used across Lipizzaner's grid. Instead, each training round, a loss function is selected with equal probability, from among the three E-GAN uses. Experimental analyses on standard benchmarks, MNIST and CelebA, demonstrate that Mustangs provides a statistically faster training method resulting in more accurate networks.
Adaptive multi-subswarm optimisation for feature selection on high-dimensional classification
Feature space is an important factor influencing the performance of any machine learning algorithm including classification methods. Feature selection aims to remove irrelevant and redundant features that may negatively affect the learning process especially on high-dimensional data, which usually suffers from the curse of dimensionality. Feature ranking is one of the most scalable feature selection approaches to high-dimensional problems, but most of them fail to automatically determine the number of selected features as well as detect redundancy between features. Particle swarm optimisation (PSO) is a population-based algorithm which has shown to be effective in addressing these limitations. However, its performance on high-dimensional data is still limited due to the large search space and high computation cost. This study proposes the first adaptive multi-swarm optimisation (AMSO) method for feature selection that can automatically select a feature subset of high-dimensional data more effectively and efficiently than the compared methods. The subswarms are automatically and dynamically changed based on their performance during the evolutionary process. Experiments on ten high-dimensional datasets of varying difficulties have shown that AMSO is more effective and more efficient than the compared PSO-based and traditional feature selection methods in most cases.
Evolving deep neural networks by multi-objective particle swarm optimization for image classification
In recent years, convolutional neural networks (CNNs) have become deeper in order to achieve better classification accuracy in image classification. However, it is difficult to deploy the state-of-the-art deep CNNs for industrial use due to the difficulty of manually fine-tuning the hyperparameters and the trade-off between classification accuracy and computational cost. This paper proposes a novel multi-objective optimization method for evolving state-of-the-art deep CNNs in real-life applications, which automatically evolves the non-dominant solutions at the Pareto front. Three major contributions are made: Firstly, a new encoding strategy is designed to encode one of the best state-of-the-art CNNs; With the classification accuracy and the number of floating point operations as the two objectives, a multi-objective particle swarm optimization method is developed to evolve the non-dominant solutions; Last but not least, a new infrastructure is designed to boost the experiments by concurrently running the experiments on multiple GPUs across multiple machines, and a Python library is developed and released to manage the infrastructure. The experimental results demonstrate that the non-dominant solutions found by the proposed method form a clear Pareto front, and the proposed infrastructure is able to almost linearly reduce the running time.