GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
SESSION: Genetic algorithms
Big Data promises new scientific discovery and economic value. Genetic algorithms (GAs) have proven their flexibility in many application areas and substantial research effort has been dedicated to improving their performance through parallelisation. In contrast with most previous efforts we reject approaches that are based on the centralisation of data in the main memory of a single node or that require remote access to shared/distributed memory. We focus instead on scenarios where data is partitioned across machines.
In this partitioned scenario, we explore two parallelisation models: PDMS, inspired by the traditional master-slave model, and PDMD, based on island models; we compare their performance on large-scale classification problems. We implement two distributed versions of BioHEL, a popular large-scale single-node GA classifier, using the Spark distributed data processing platform. In contrast to existing GAs based on MapReduce, Spark allows a more efficient implementation of parallel GAs thanks to its simple, efficient iterative processing of partitioned datasets.
We study the accuracy, efficiency and scalability of the proposed models. Our results show that PDMS provides the same accuracy as traditional BioHEL and exhibits good scalability up to 64 cores, while PDMD substantially reduces execution time at a minor loss of accuracy.
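The island-based (PDMD-style) model described above can be illustrated with a minimal single-process sketch; the bit-string encoding, OneMax fitness, and all parameter values below are illustrative stand-ins, not BioHEL or the paper's Spark implementation:

```python
import random

def island_ga(fitness, n_bits=32, islands=4, pop_per_island=20,
              generations=60, migrate_every=10, seed=0):
    """Minimal island-model GA: each island evolves its own subpopulation
    and periodically sends its best individual to a ring neighbour."""
    rng = random.Random(seed)
    pops = [[[rng.randint(0, 1) for _ in range(n_bits)]
             for _ in range(pop_per_island)] for _ in range(islands)]
    for gen in range(generations):
        for pop in pops:
            pop.sort(key=fitness, reverse=True)
            parents = pop[:pop_per_island // 2]          # truncation selection
            children = []
            while len(parents) + len(children) < pop_per_island:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, n_bits)
                child = a[:cut] + b[cut:]                # one-point crossover
                child[rng.randrange(n_bits)] ^= 1        # single-bit mutation
                children.append(child)
            pop[:] = parents + children
        if (gen + 1) % migrate_every == 0:               # ring migration
            bests = [max(pop, key=fitness) for pop in pops]
            for i, pop in enumerate(pops):
                pop[pop.index(min(pop, key=fitness))] = bests[i - 1][:]
    return max((ind for pop in pops for ind in pop), key=fitness)

best = island_ga(sum)   # OneMax as a stand-in fitness
```

In the distributed setting of the paper, each island would live on one data partition and migration would be the only cross-node communication.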
Genetic algorithms are simple to accelerate by evaluating the fitness of individuals in parallel. However, when fitness evaluation is expensive and time to evaluate each individual is highly variable, workers can be left idle while waiting for long-running tasks to complete. This paper proposes a local search hybridisation scheme which guarantees 100% utilisation of parallel workers. Separate work queues are maintained for individuals produced by genetic crossover and local search neighbourhood operators, and a priority rule determines which queue should be used to distribute work at each step. Hill-climbing local search and a conventional genetic algorithm can both be derived as special cases of the algorithm.
The general case balances allocation of computing time between progressing the genetic algorithm and improving individuals through local search. Through evaluation on a simulated expensive optimisation problem we show that the search performance of the algorithm in terms of number of function evaluations does not vary significantly with the number of workers used, while still achieving linear speedup through parallelisation. Comparisons with an asynchronous genetic algorithm show that this method of using idle worker capacity for local search can improve performance when the computation budget is limited.
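The dual-queue idea can be sketched as a toy sequential simulation, assuming bit-string individuals and OneMax fitness; all names are hypothetical, and a real implementation would hand queue items to parallel workers instead of evaluating them inline:

```python
import random

def hybrid_search(pop, crossover, local_step, fitness, steps,
                  ga_fraction=0.5, seed=0):
    """Keep one queue of GA offspring and one of local-search neighbours;
    whenever a worker is free, a priority rule decides which queue feeds it,
    so no worker ever idles waiting for work."""
    rng = random.Random(seed)
    ga_queue, ls_queue, evaluated = [], [], []
    for _ in range(steps):
        # lazily refill both queues so an idle worker always finds work
        if not ga_queue:
            a, b = rng.sample(pop, 2)
            ga_queue.append(crossover(a, b, rng))
        if not ls_queue:
            ls_queue.append(local_step(max(evaluated or pop, key=fitness), rng))
        # priority rule: spend roughly ga_fraction of capacity on GA offspring
        ind = ga_queue.pop() if rng.random() < ga_fraction else ls_queue.pop()
        evaluated.append(ind)  # a real system would evaluate this in parallel
    return max(pop + evaluated, key=fitness)

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_one(x, rng):
    y = x[:]
    y[rng.randrange(len(y))] ^= 1
    return y

rng0 = random.Random(1)
pop = [[rng0.randint(0, 1) for _ in range(20)] for _ in range(10)]
best = hybrid_search(pop, one_point, flip_one, sum, steps=300)
```

Setting `ga_fraction` to 1 recovers a plain GA and setting it to 0 recovers hill climbing, mirroring the special cases mentioned in the abstract.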
We introduce a novel surrogate-assisted Genetic Algorithm (GA) for expensive optimization of problems with discrete categorical variables. Specifically, we leverage the strengths of the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA), a state-of-the-art GA, and, for the first time, propose to use a convolutional neural network (CNN) as a surrogate model. We propose to train the model on pairwise fitness differences to decrease the number of evaluated solutions required to achieve adequate surrogate model training. As a proof of principle, we consider relatively standard CNNs, and demonstrate that their capacity is already sufficient to accurately learn fitness landscapes of various well-known benchmark functions. The proposed CS-GOMEA is compared with GOMEA and the widely-used Bayesian-optimization-based expensive optimization frameworks SMAC and Hyperopt, in terms of the number of evaluations required to achieve the optimum. In our experiments on binary problems with dimensionalities of up to 400 variables, CS-GOMEA always found the optimum, whereas SMAC and Hyperopt failed for problem sizes over 16 variables. Moreover, the number of evaluated solutions required by CS-GOMEA to find the optimum was found to scale much better than that of GOMEA.
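The pairwise-training idea can be sketched with a linear perceptron standing in for the paper's CNN surrogate; everything here (names, the OneMax stand-in fitness) is illustrative:

```python
import random

def train_pairwise_surrogate(solutions, fitness, epochs=200, lr=0.1, seed=0):
    """Train on *differences* between pairs of evaluated solutions to predict
    which of the two is fitter, so k evaluations yield on the order of
    k^2 training pairs."""
    rng = random.Random(seed)
    evaluated = [(x, fitness(x)) for x in solutions]    # one true evaluation each
    pairs = [(a, b, 1 if fa > fb else -1)
             for a, fa in evaluated for b, fb in evaluated if fa != fb]
    w = [0.0] * len(solutions[0])
    for _ in range(epochs):
        rng.shuffle(pairs)
        for a, b, label in pairs:
            diff = [ai - bi for ai, bi in zip(a, b)]
            score = sum(wi * di for wi, di in zip(w, diff))
            if label * score <= 0:                      # perceptron update on the pair
                w = [wi + lr * label * di for wi, di in zip(w, diff)]
    return w

def surrogate_prefers(w, a, b):
    """Predict whether a is fitter than b under the learned pairwise model."""
    return sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b)) > 0

rng = random.Random(2)
sols = [[rng.randint(0, 1) for _ in range(10)] for _ in range(12)]
w = train_pairwise_surrogate(sols, sum)
```

The surrogate is only ever asked to compare two candidates, which is exactly what a selection step inside a GA needs.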
On improving the constraint-handling performance with modified multiple constraint ranking (MCR-mod) for engineering design optimization problems solved by evolutionary algorithms
This work presents a new rank-based constraint handling technique (CHT) by modifying Multiple Constraint Ranking (MCR), a recently proposed rank-based CHT for real-world engineering design optimization problems solved by evolutionary algorithms. The new technique, MCR-mod, not only retains MCR's key strength, a balanced assessment of constraints with different orders of magnitude and/or different units, but also adds further desirable features: a rank definition that guarantees the best feasible solution in the population always ranks better than the best infeasible solution, the involvement of good infeasible solutions, and a simpler implementation. Numerical experiments on benchmark problems from the IEEE-CEC 2006 competition and engineering design are conducted to assess the accuracy and robustness of MCR-mod. Over 25 independent runs on each problem, MCR-mod proved more robust than MCR, producing better feasible optimal solutions on most problems. Nonparametric statistical tests indicate that MCR-mod is significantly more accurate than MCR on problems whose constraints are mostly inequality and active constraints, suggesting that all the added features of MCR-mod improve constraint-handling performance.
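A simplified illustration of the rank-based idea underlying MCR (not MCR-mod itself): ranking the population separately on the objective and on each constraint violation, then summing the ranks, makes constraints of very different magnitudes commensurable:

```python
def ranks(values):
    """Position of each value in ascending order (0 = best); ties broken by index."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for pos, i in enumerate(order):
        r[i] = pos
    return r

def multi_constraint_rank(objectives, violations):
    """violations[i][j] = violation of constraint j by solution i (0 if satisfied).
    Lower total rank = better; each criterion contributes only its ordering,
    not its raw magnitude or unit."""
    total = ranks(objectives)
    for j in range(len(violations[0])):
        col = ranks([v[j] for v in violations])
        total = [t + c for t, c in zip(total, col)]
    return total

# Two constraints on wildly different scales: the feasible solution wins
# even though the raw violation numbers are incomparable.
objs = [1.0, 3.0, 2.0]
viol = [[0.0, 0.0], [1000.0, 0.001], [500.0, 0.002]]
tot = multi_constraint_rank(objs, viol)
```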
In outsourcing data storage, privacy and utility are significant concerns. Techniques such as data encryption can protect the privacy of sensitive information but correspondingly reduce the efficiency of data usage. By splitting the attributes of sensitive associations, database fragmentation can protect data privacy; meanwhile, data utility can be improved by grouping data of high affinity. In this paper, a benefit-driven genetic algorithm is proposed to achieve a better balance between privacy and utility in database fragmentation. To integrate useful fragmentation information from different solutions, a matching strategy is designed. Two benefit-driven operators for mutation and improvement are proposed to construct valuable fragments and rearrange elements. The experimental results show that the proposed benefit-driven genetic algorithm is competitive with existing approaches to database fragmentation.
Currently, Health Units in many countries face service demand that exceeds their real capacity. This problem causes the inevitable emergence of long waiting lists. Optimizing such waiting lists is very challenging, due to the large number of resources that must be considered. This paper proposes a new model, based on a quantum-inspired evolutionary algorithm, to optimize the scheduling of elective surgical procedures. The Quantum-Inspired Evolutionary Algorithm for Healthcare (QIEA-H) model aims not only to assign the necessary resources to patients for the successful completion of each surgical procedure, but also to reduce the total time used to perform all surgeries and the number of surgeries out of term. To validate the proposed model, a waiting list of 2000 surgeries was created artificially using a simulation tool also developed in this work. The model achieved a reduction in the total time of all surgeries of up to 16.25% and in the number of surgeries out of term of up to 13.04%.
Success-based parameter control mechanisms for Evolutionary Algorithms (EAs) change the parameters every generation based on the success of the previous generation and the current parameter value. In recent years, several success-based parameter control mechanisms have been proposed in the literature. The purpose of this paper is to evaluate and compare their sequential optimisation time and parallelisation on different types of problems. The geometric mean of the sequential and parallel optimisation times is used as a new metric to evaluate the parallelisation of EAs, capturing the trade-off between both optimisation times. We perform an empirical study comprising nine different algorithms on four benchmark functions. Eight of the nine algorithms were taken from the literature and one is a modification proposed here.
We show that the modified algorithm has a 20% faster sequential optimisation time than the fastest known GA on OneMax. Additionally, we show the benefits of success-based parameter control mechanisms on NP-hard problems, and using the proposed metric we also show that success-based offspring population size mechanisms are outperformed by static choices in parallel EAs.
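One concrete success-based rule (hypothetical, not one of the paper's nine algorithms) together with the geometric-mean metric can be sketched as follows:

```python
import math
import random

def one_plus_lambda_selfadj(fitness, n=50, F=1.5, max_evals=20000, seed=0):
    """Illustrative success-based offspring population size control:
    shrink lambda after a successful generation, grow it slowly after
    a failure, for an elitist (1 + lambda) EA on a bit string of length n."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = fitness(x)
    lam, evals, gens = 1.0, 0, 0
    while fx < n and evals < max_evals:
        best, f_best = None, -1
        for _ in range(max(1, round(lam))):
            y = [b ^ (rng.random() < 1.0 / n) for b in x]   # standard bit mutation
            f_y = fitness(y)
            evals += 1
            if f_y > f_best:
                best, f_best = y, f_y
        if f_best > fx:
            x, fx, lam = best, f_best, max(1.0, lam / F)    # success: fewer offspring
        else:
            lam = min(float(n), lam * F ** 0.25)            # failure: more offspring
        gens += 1
    # The paper's comparison metric: geometric mean of the sequential time
    # (evaluations) and the parallel time (generations).
    return fx, evals, gens, math.sqrt(evals * gens)

fx, evals, gens, gm = one_plus_lambda_selfadj(sum)          # OneMax stand-in
```

Because every generation costs at least one evaluation, the geometric mean always lies between the generation count and the evaluation count, which is what lets it trade the two off.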
The procedural generation of levels and content in video games is a challenging AI problem. Often such generation relies on an intelligent way of evaluating the content being generated so that constraints are satisfied and/or objectives maximized. In this work, we address the problem of creating levels that are not only playable but also revolve around specific mechanics in the game. We use constrained evolutionary algorithms and quality-diversity algorithms to generate small sections of Super Mario Bros levels called scenes, using three different simulation approaches: Limited Agents, Punishing Model, and Mechanics Dimensions. All three approaches are able to create scenes that give the player an opportunity to encounter or use targeted mechanics with different properties. We conclude by discussing the advantages and disadvantages of each approach and comparing them to each other.
Convection selection in evolutionary algorithms is a method of splitting the population into subpopulations based on the fitness values of solutions. Convection selection was previously found to be superior to standard selection techniques in difficult tasks of evolutionary design. However, reaching its full potential requires tuning of parameters that affect the performance of the evolutionary search process. Performing experiments on benchmark fitness functions does not provide general knowledge required for such tuning. Therefore, in order to gain an insight into the link between the characteristics of the fitness landscape, the parameters of the selection technique, and the quality of the best found solutions, we perform an analysis based on the NKq model of rugged fitness landscapes with neutrality. As a result, we identify several rules that will help researchers and practitioners of evolutionary algorithms adjust the values of convection selection parameters based on the knowledge of the properties of a given optimization problem.
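The splitting step itself can be sketched as follows, assuming the equal-width variant in which the current fitness range is cut into bands (names and details are illustrative):

```python
def convection_split(population, fitness, n_subpops):
    """Equal-width variant of convection selection's splitting step: the
    current fitness range is divided into n_subpops bands and each
    subpopulation receives the individuals whose fitness falls in its band."""
    fits = [fitness(ind) for ind in population]
    lo, hi = min(fits), max(fits)
    width = (hi - lo) / n_subpops or 1.0        # avoid /0 when all fitnesses tie
    subpops = [[] for _ in range(n_subpops)]
    for ind, f in zip(population, fits):
        band = min(int((f - lo) / width), n_subpops - 1)
        subpops[band].append(ind)
    return subpops

# Individuals of similar fitness end up evolving together in one band.
subs = convection_split(list(range(12)), lambda v: v, 3)
```

Each band then evolves as its own subpopulation, so weak solutions compete only with other weak solutions instead of being immediately displaced by the global best.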
Learning has been shown to be beneficial to an evolutionary process through the Baldwin Effect. Learning can be classified into two categories: asocial learning, e.g. trial-and-error, and social learning, e.g. imitation learning. Recent research has suggested that a learning strategy, or learning rule --- a combination of individual and social learning --- can be more adaptive than either social or individual learning alone. However, this leaves open an important question: how best to combine these forms of learning in different environments? This paper investigates this question on a dynamic rugged landscape (i.e. a dynamic NK-landscape). Experimental results show that a learning strategy promotes an evolving population better, resulting in higher average fitness over a series of changing environments, than asocial learning alone. The results also show that the population of strategic learners maintains a higher proportion of plasticity --- the ability to change the phenotype in response to environmental challenges --- than the population of individual learners alone.
Genetic algorithms using optimal mixing have shown promising results but lack theoretical support. This paper investigates population sizing from the supply aspect under the optimal-mixing scenario. Specifically, more precise analyses of supply, including the expectation and the lower bound, are made. In addition, considering recombining one randomly generated chromosome with the rest of the population to achieve the global optimum, tight bounds are derived on the population size needed to provide proper fragments chosen by restricted oracles. Tight bounds are also derived for problems with ring topologies, where each subfunction overlaps two other subfunctions. Finally, experiments are conducted, and the results match the derivations well.
Many optimization problems provide access to a partial or total explicit algebraic representation of the problem, including subfunctions and variable interaction graphs. This extra information allows the development of efficient solvers through new, appropriate operators. Besides distinctive reproduction operators for a variety of problem categories, Gray Box algorithms have been proposed as a way to exploit this additional information during the search. Building on recent evolutionary operators in the literature, we propose adaptations of Gray Box evolutionary reproduction operators and local search algorithms to deal with constrained problems, a field still little explored in gray box evolutionary optimization. The results show that the proposed methods achieve better solutions than traditional algorithms on a set of constrained binary and integer problems, and reach the optimal solutions reported in the literature for some instances.
Evolutionary diversity optimization aims to compute a set of solutions that are diverse in the search space or instance feature space, and where all solutions meet a given quality criterion. With this paper, we bridge the areas of evolutionary diversity optimization and evolutionary multi-objective optimization. We show how popular indicators frequently used in the area of multi-objective optimization can be used for evolutionary diversity optimization. Our experimental investigations for evolving diverse sets of TSP instances and images according to various features show that two of the most prominent multi-objective indicators, namely the hypervolume indicator and the inverted generational distance, provide excellent results in terms of visualization and various diversity indicators.
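For reference, the hypervolume indicator mentioned above reduces in two dimensions to a simple sweep; this minimal sketch (assuming both objectives are minimised) is not the paper's implementation:

```python
def hypervolume_2d(points, ref):
    """Area dominated by a set of 2-D points w.r.t. a reference point,
    assuming minimisation of both objectives (larger = better set)."""
    pts = sorted(set(points))          # ascending in the first objective
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if x < ref[0] and y < prev_y:  # only non-dominated points extend the staircase
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

# (2, 3) is dominated by (2, 2) and contributes nothing to the area.
hv = hypervolume_2d([(1, 3), (2, 2), (3, 1), (2, 3)], (4, 4))
```

In evolutionary diversity optimization, such an indicator scores a *set* of solutions at once, which is exactly what is needed to reward spread rather than individual quality.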
Hierarchical Classification (HC) consists of assigning an instance to multiple classes simultaneously within a hierarchical structure containing dozens or even hundreds of classes. A field that greatly benefits from HC is Bioinformatics, in which interpretable methods that make predictions automatically are still scarce. In this context, a topic that has gained attention is the classification of Transposable Elements (TEs), DNA fragments capable of moving inside the genome of their hosts and affecting the genes' functionalities in many species. Thus, in this paper, we propose a method called Hierarchical Classification with a Lexicographic Genetic Algorithm (HC-LGA), which evolves rules for the HC of TEs. Our proposed method follows a Multi-Objective Lexicographic approach in order to better deal with the still-relevant accuracy-interpretability trade-off. To the best of our knowledge, this is the first work to combine HC with such an approach. Experiments with two popular TE datasets showed that HC-LGA achieved competitive results compared with most state-of-the-art HC methods in the literature, with the advantage of generating an interpretable model. Furthermore, HC-LGA obtained performance comparable to its simpler optimization version while generating a more interpretable list of rules.
Offspring population size matters when comparing evolutionary algorithms with self-adjusting mutation rates
We analyze the performance of the 2-rate (1 + λ) Evolutionary Algorithm (EA) with self-adjusting mutation rate control, its 3-rate counterpart, and a (1 + λ) EA variant using multiplicative update rules on the OneMax problem. We compare their efficiency for offspring population sizes ranging up to λ = 3,200 and problem sizes up to n = 100,000.
Our empirical results show that the ranking of the algorithms is very consistent across all tested dimensions, but strongly depends on the population size. While for small values of λ the 2-rate EA performs best, the multiplicative updates become superior starting from some threshold value of λ between 50 and 100. Interestingly, for population sizes around 50, the (1 + λ) EA with static mutation rate performs on par with the best of the self-adjusting algorithms.
We also consider how the lower bound pmin for the mutation rate influences the efficiency of the algorithms. We observe that for the 2-rate EA and the EA with multiplicative update rules, the more generous bound pmin = 1/n² gives better results than pmin = 1/n when λ is small. For both algorithms the situation reverses for large λ.
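A simplified sketch of one generation of the 2-rate mechanism with the pmin clamp (the tie-breaking and the upper clamp here are illustrative simplifications, not the paper's exact algorithm):

```python
import random

def two_rate_generation(x, r, lam, fitness, rng, p_min):
    """One generation of a simplified 2-rate (1 + lambda) EA: half the
    offspring mutate with rate r/2, the other half with 2r; the rate is then
    moved towards whichever half produced the best offspring (with
    probability 1/2, otherwise moved at random) and clamped at p_min."""
    n = len(x)
    best, f_best, r_best = None, -1, r
    for i in range(lam):
        r_i = r / 2 if i < lam // 2 else 2 * r
        y = [b ^ (rng.random() < r_i / n) for b in x]   # standard bit mutation
        f_y = fitness(y)
        if f_y > f_best:
            best, f_best, r_best = y, f_y, r_i
    r = r_best if rng.random() < 0.5 else rng.choice([r / 2, 2 * r])
    r = min(max(r, p_min * n), n / 4)                   # illustrative clamp
    if f_best >= fitness(x):                            # elitist replacement
        x = best
    return x, r

# Drive the mechanism on OneMax until the optimum is hit.
rng = random.Random(0)
n = 30
x = [rng.randint(0, 1) for _ in range(n)]
r = 2.0
for _ in range(3000):
    x, r = two_rate_generation(x, r, 10, sum, rng, p_min=1.0 / n)
    if sum(x) == n:
        break
```

The `p_min * n` floor is where the choice between pmin = 1/n and pmin = 1/n² discussed above enters: a smaller floor lets the rate drop further once the search is close to the optimum.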
This paper deals with the makespan-minimization job shop scheduling problem (JSSP) with no-wait precedence constraints. The no-wait JSSP is an extension of the well-known JSSP subject to the constraint that, once initiated, the operations of any job have to be processed immediately, one after another, until the job's completion. A genetic algorithm with a new decoding strategy is proposed to find good-quality solutions. The proposed decoding, named Reverse Decoding (RD), is a simple variant of the standard job-permutation-based representation for this problem. We show, through numerical examples, that using RD simultaneously with the standard decoding achieves better solutions than using either decoding separately, even under the same computational effort. The proposed variant is compared with state-of-the-art approaches on a set of benchmarks. Experimental results on these benchmarks show that the proposed combination also produces competitive results.
A new evolutionary algorithm called the Mixing Genetic Algorithm is introduced for the Traveling Salesman Problem. The Mixing Genetic Algorithm does not use selection or mutation, and the children replace the parents every generation. The recombination operator is partition crossover. Partition crossover is respectful and transmits alleles (edges); this makes it possible to generate two offspring, the best possible offspring and the worst possible offspring, such that no edges are lost during recombination. The Mixing Genetic Algorithm organizes the population so that better solutions are usually recombined with other good solutions. Because no edges are lost or created during recombination, there is no need to access the evaluation function after the first generation. This dramatically reduces communication costs and makes it possible to implement the Mixing Genetic Algorithm on massively parallel SIMD machines with limited memory. The Mixing Genetic Algorithm never loses diversity and cannot prematurely converge. We compare the Mixing Genetic Algorithm to EAX, one of the best inexact solvers for the Traveling Salesman Problem. For many problems the Mixing Genetic Algorithm finds optimal solutions using fewer recombinations than EAX.
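The best/worst-offspring property of partition crossover can be illustrated numerically: once the union graph of the two parent tours is decomposed into shared edges plus independent components, offspring costs follow from the parents without any call to the evaluation function (the decomposition itself is assumed given here; the numbers are a toy example):

```python
def px_best_worst(shared_cost, component_costs):
    """component_costs[i] = (cost of component i in parent 1,
                             cost of component i in parent 2).
    The best offspring takes the cheaper parental choice in every component,
    the worst takes the dearer one (TSP: lower cost is better)."""
    best = shared_cost + sum(min(a, b) for a, b in component_costs)
    worst = shared_cost + sum(max(a, b) for a, b in component_costs)
    return best, worst

# Toy decomposition: shared edges cost 10, three independent components.
best, worst = px_best_worst(10, [(3, 5), (7, 2), (4, 4)])
```

Here the parents cost 10+3+7+4 = 24 and 10+5+2+4 = 21, and best + worst = 19 + 26 = 45 = 24 + 21: edges are only redistributed, never created or destroyed, which is why the evaluation function is not needed after the first generation.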