Introduction to GP — DEAP 1.4.1 documentation (2024)

Symbolic regression is one of the best known problems in GP (seeReference). It is commonly used as a tuning problem for newalgorithms, but is also widely used with real-life distributions, where otherregression methods may not work. It is conceptually a simple problem, andtherefore makes a good introductory example for the GP framework in DEAP.

All symbolic regression problems use an arbitrary data distribution, and tryto fit the data with the most accurate symbolic formula available. Usually,measures like the RMSE (Root Mean Square Error) or MSE (Mean Squared Error) are used to measure anindividual’s fitness.

In this example, we use a classical distribution, the quartic polynomial\((x^4 + x^3 + x^2 + x)\), a one-dimension distribution. 20 equidistantpoints are generated in the range [-1, 1], and are used to evaluate thefitness.

Creating the primitives set

One of the most crucial aspect of a GP program is the choice of theprimitives set. They should make good building blocks for the individuals sothe evolution can succeed. In this problem, we use a classical set ofprimitives, which are basic arithmetic functions :

# Define new functionsdef protectedDiv(left, right): try: return left / right except ZeroDivisionError: return 1pset = gp.PrimitiveSet("MAIN", 1)pset.addPrimitive(operator.add, 2)pset.addPrimitive(operator.sub, 2)pset.addPrimitive(operator.mul, 2)pset.addPrimitive(protectedDiv, 2)pset.addPrimitive(operator.neg, 1)pset.addPrimitive(math.cos, 1)

The redefinition of the division is made to protect it against a zerodivision error (which would crash the program). The other functions aresimply a mapping from the Python operator module. The number followingthe function is the arity of the primitive, that is the number of entriesit takes.

On the last line, we declare an MetaEphemeral constant. This isa special terminal type, which does not have a fixed value. When the programappends an ephemeral constant terminal to a tree, the function it contains isexecuted, and its result is inserted as a constant terminal. In this case,those constant terminals can take the values -1, 0 or 1.

The second argument of PrimitiveSet is the number ofinputs. Here, as we have only a one dimension regression problem, there isonly one input, but it could have as many as you want. By default, thoseinputs are named “ARGx”, where “x” is a number, but you can easily renamethem :

pset.addPrimitive(math.sin, 1)

Creator

As any evolutionary program, symbolic regression needs (at least) two objecttypes : an individual containing the genotype and a fitness. We can easilycreate them with the creator :

pset.renameArguments(ARG0='x')

The first line creates the fitness object (this is a minimization problem, sothe weight is negative). The weights argument must be an iterable ofweights, even if there is only one fitness measure. The second line createsthe individual object itself. Very straightforward, we can see that it willbe based upon a tree, to which we add a fitness. If, for any reason, the userwould want to add any other attribute (for instance, a file in which theindividual will be saved), it would be as easy as adding this attribute ofany type to this line. After this declaration, any individual produced willcontain those wanted attributes.

Toolbox

Now, we want to register some parameters specific to the evolution process.In DEAP, this is done through the toolbox :

creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)toolbox = base.Toolbox()toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=2)toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)toolbox.register("population", tools.initRepeat, list, toolbox.individual)toolbox.register("compile", gp.compile, pset=pset)def evalSymbReg(individual, points): # Transform the tree expression in a callable function func = toolbox.compile(expr=individual) # Evaluate the mean squared error between the expression # and the real function : x**4 + x**3 + x**2 + x sqerrors = ((func(x) - x**4 - x**3 - x**2 - x)**2 for x in points) return math.fsum(sqerrors) / len(points),toolbox.register("evaluate", evalSymbReg, points=[x/10. for x in range(-10,10)])toolbox.register("select", tools.selTournament, tournsize=3)toolbox.register("mate", gp.cxOnePoint)toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

First, a toolbox instance is created (in some problem types like coevolution,you may consider creating more than one toolbox). Then, we can register anyparameters. The first lines register how to create an individual (by callinggp.genHalfAndHalf with the previously defined primitive set), and how to createthe population (by repeating the individual initialization).

We may now introduce the evaluation function, which will receive anindividual as input, and return the corresponding fitness. This function usesthe compile function previously defined to transform the individual intoits executable form – that is, a program. After that, the evaluation is onlysimple maths, where the difference between the values produced by theevaluated individual and the real values are squared and summed to computethe MSE (Mean Squared Error), which is returned as the fitness of the individual.

Warning

Even if the fitness only contains one measure, keep in mind that DEAPstores it as an iterable. Knowing that, you can understand why theevaluation function must return a tuple value (even if it is a 1-tuple) :

def evalSymbReg(individual, points): # Transform the tree expression in a callable function func = toolbox.compile(expr=individual) # Evaluate the mean squared error between the expression # and the real function : x**4 + x**3 + x**2 + x sqerrors = ((func(x) - x**4 - x**3 - x**2 - x)**2 for x in points) return math.fsum(sqerrors) / len(points),

Returning only the value would produce strange behaviors and errors, asthe selection and stats functions relies on the fact that the fitness isalways an iterable.

Afterwards, we register the evaluation function. We also choose the selectionmethod (a tournament of size 3), the mate method (one point crossover withuniform probability over all the nodes), and the mutation method (a uniformprobability mutation which may append a new full sub-tree to a node).

Then, we decorate the mate and mutate method to limit the height of generatedindividuals. This is done to avoid an important draw back of geneticprogramming : bloat. Koza in his book on genetic programming suggest to use amax depth of 17.

At this point, any structure with an access to the toolbox instance will alsohave access to all of those registered parameters. Of course, the user couldregister other parameters basing on their needs.

Statistics

Although optional, statistics are often useful in evolutionary programming.DEAP offers a simple class which can handle most of the “boring work”. Inthis case, we want to compute the mean, standard deviation, minimum, andmaximum of both the individuals fitness and size. For that we’ll use aMultiStatistics object.

 hof = tools.HallOfFame(1) stats_fit = tools.Statistics(lambda ind: ind.fitness.values) stats_size = tools.Statistics(len) mstats = tools.MultiStatistics(fitness=stats_fit, size=stats_size) mstats.register("avg", numpy.mean) mstats.register("std", numpy.std)

Note that a simple Statistics object can be used, as inprevious examples when statistics over a single key are desired.

Launching the evolution

At this point, DEAP has all the information needed to begin the evolutionaryprocess, but nothing has been initialized. We can start the evolution bycreating the population and then calling a complete algorithm. In this case,we’ll use eaSimple().

 random.seed(318) mstats.register("max", numpy.max)

The hall of fame is a specific structure which contains the n bestindividuals (here, the best one only).

The complete examples/%sgp/symbreg.

Reference

John R. Koza, “Genetic Programming: On the Programming of Computers by Meansof Natural Selection”, MIT Press, 1992, pages 162-169.

Introduction to GP — DEAP 1.4.1 documentation (2024)
Top Articles
Latest Posts
Article information

Author: Rob Wisoky

Last Updated:

Views: 6440

Rating: 4.8 / 5 (68 voted)

Reviews: 83% of readers found this page helpful

Author information

Name: Rob Wisoky

Birthday: 1994-09-30

Address: 5789 Michel Vista, West Domenic, OR 80464-9452

Phone: +97313824072371

Job: Education Orchestrator

Hobby: Lockpicking, Crocheting, Baton twirling, Video gaming, Jogging, Whittling, Model building

Introduction: My name is Rob Wisoky, I am a smiling, helpful, encouraging, zealous, energetic, faithful, fantastic person who loves writing and wants to share my knowledge and understanding with you.