Boolean logic and intentions


Boolean logic

I’ve seen quite a few blog and forum posts about how developers “abuse” boolean logic.
The most common of these examples would probably be something like:

bool userIsAuthenticated = Foo || Bar;
if (userIsAuthenticated)
{
    return true;
}
else
{
    return false;
}

And people argue about how bad that is and point out that it can be shortened to:

bool userIsAuthenticated = Foo || Bar;
return userIsAuthenticated;

And then the next guy comes along and says it can be shortened down even more to:

return Foo || Bar;

And the general consensus is that the developer who wrote the first piece of code doesn’t know how boolean logic works and should be fired, etc. The standard internet forum attitude.

I personally find this somewhat strange, because I don’t think the first version is that bad.
Sure, it is bloated and much bigger, but is that all that counts?

I’m not into the “less is more” mindset; I’m well aware that a lot of code can be shrunk into short, incomprehensible nested and recursive expressions.

I don’t want “small” code, I want readable code that clearly communicates the intentions of the developer.

I find the first version very explicit; it tells the reader the exact intentions of the developer.
You can easily add comments to that kind of structure.
And you can also set breakpoints on the two return clauses, making life easier while debugging.

Don’t get me wrong, I would most likely go for the 2nd version in most cases:
do the assignment on one line and then return the result.
But in some special scenarios I might go for the first version, simply because it communicates the intentions better.

The first version also opens up ways for us to add alternative flows,
e.g. special logging messages when the result is true.

And I have to say that the same goes for:

if ( Foo == false )

It might be bloated, but the intention is very clear here.
The code could be shorter, but it isn’t, just because I want you to see my intentions.
So, am I crazy?
Or do you also think that verbose code sometimes expresses the intentions better?

Hill climbing, GP, GA, EP, ES


[Edit]
See Maarten’s comments for more info
[/Edit]

There have been a lot of discussions about what sort of algorithm the EvoLisa app uses.
Quite a few have claimed that it is a hill climber.

So I think it’s time to set everything straight here.

This is the definition of Hill climbing from wikipedia:

Hill climbing attempts to maximize (or minimize) a function f(x), where x are discrete states. These states are typically represented by vertices in a graph, where edges in the graph encode nearness or similarity of a graph. Hill climbing will follow the graph from vertex to vertex, always locally increasing (or decreasing) the value of f, until a local maximum (or local minimum) xm is reached. Hill climbing can also operate on a continuous space: in that case, the algorithm is called gradient ascent (or gradient descent if the function is minimized).

If we look at the EvoLisa app: there are no vertices, there is no graph, there are no edges that encode nearness.
There is no continuous space.

There is an individual with its own set of DNA, and that individual can breed offspring that can be quite far from the parent.
e.g. Each polygon in a drawing could change shape and size, each polygon could change color.
(The likeliness that it happens might be extremely small, but it is not impossible)
There are no paths that can be followed or backtracked like in a graph.

So if we stick to the wikipedia definition of hill climbing, then we can safely say that EvoLisa does not use a hill climber.
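To make the contrast concrete, here is a minimal sketch of a textbook hill climber in Python (a sketch only, not EvoLisa's code): it only ever steps to an adjacent state, which is exactly what EvoLisa's unconstrained mutations do not do. The neighbor and fitness functions are toy stand-ins.

```python
def hill_climb(start, neighbors, f, max_steps=1000):
    """Textbook hill climbing: only ever move to an adjacent state
    that improves f; stop when a local maximum is reached."""
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=f, default=None)
        if best is None or f(best) <= f(current):
            return current  # no neighbor is better: local maximum
        current = best
    return current

# Toy example: maximize f(x) = -(x - 7)^2 over the integers,
# where "nearness" is the +/-1 neighborhood on the number line.
result = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 7) ** 2)
```

The walk is confined to the edges of the (implicit) graph; an EvoLisa offspring, by contrast, can differ from its parent in every polygon at once.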

So what about my claims about Genetic Programming?

The wikipedia definition of GP is:

GP evolves computer programs, traditionally represented in memory as tree structures. Trees can be easily evaluated in a recursive manner. Every tree node has an operator function and every terminal node has an operand, making mathematical expressions easy to evolve and evaluate. Thus traditionally GP favors the use of programming languages that naturally embody tree structures (for example, Lisp; other functional programming languages are also suitable).

Non-tree representations have been suggested and successfully implemented, such as linear genetic programming which suits the more traditional imperative languages [see, for example, Banzhaf et al. (1998)]. The commercial GP software Discipulus,[6] uses AIM, automatic induction of binary machine code[7] to achieve better performance.[8] MicroGP[9] uses a representation similar to linear GP to generate programs that fully exploit the syntax of a given assembly language

OK, the first statement says GP evolves computer programs. That is sort of fuzzy; a program can be a lot more than computational expressions.
But the next highlighted part implies that GP should deal with computational ASTs only.

IMO, for loops, variable declarations and other statements could very well be used in GP.
An AST that contains statements for drawing polygons onto a canvas when evaluated is also a program.
I could very well create a parser for a language that generates that very same AST, and everyone using it would agree that it is a language and that they are writing programs when using it.

But let’s ignore what I think; let’s stick with the definition here.

If we accept that GP only deals with computational programs and not imperative execution or declarations, then EvoLisa is not GP.

So what would be the correct definition of the algorithm then?

The closest definition would be:
http://en.wikipedia.org/wiki/Evolutionary_programming
or
http://en.wikipedia.org/wiki/Evolution_strategy

EvoLisa fits the 1+1 Evolutionary Strategy.
But it also fits the μ + μ EP since it relies on an AST with mutable values.

Which one of those two it is depends on what definition you set for “a program”.
When is an AST an AST and not just data?

e.g. Are these statements data or code? “DrawPolygon ( … )” or “print ‘foo’”?
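Whatever we call the representation, the (1+1) strategy itself is simple. A sketch in Python (the mutate and fitness functions below are toy stand-ins for EvoLisa's polygon mutation and image comparison, not its actual code):

```python
import random

random.seed(0)  # seeded only to make the sketch reproducible

def one_plus_one_es(parent, mutate, fitness, generations=10000):
    """(1+1) evolutionary strategy: one parent breeds one offspring
    per generation; the offspring replaces the parent only if it is
    at least as fit."""
    parent_fit = fitness(parent)
    for _ in range(generations):
        child = mutate(parent)
        child_fit = fitness(child)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
    return parent

# Toy stand-ins: evolve a list of numbers toward a target "image".
target = [3, 1, 4, 1, 5]

def mutate(dna):
    child = list(dna)
    child[random.randrange(len(child))] += random.choice([-1, 1])
    return child

def fitness(dna):
    return -sum(abs(a - b) for a, b in zip(dna, target))

best = one_plus_one_es([0, 0, 0, 0, 0], mutate, fitness)
```

Swap the list of numbers for a polygon DNA and the distance function for a pixel comparison against the source image, and you have the shape of the EvoLisa loop.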

//Roger

Clustering evolution – The failures in EvoLisa


When I posted about the Mona Lisa evolution demo I got a request to make it possible to run it in parallel.
At first I thought; “How hard can it be?”
But it turned out to be more than hard, I’m starting to think that it is impossible.

The first approach that everyone points to is:

1) Run separate islands in parallel.

The idea is to run a separate island per node in a cluster and let that island have its own population of evolving individuals.
Add one node and the problem is solved twice as fast, add another one and it’s solved three times as fast.

WRONG!

Let’s say that it takes about 10 to 15 minutes to solve the problem on a single machine; then it is very likely that it will take 10 to 15 minutes on the other machines too.
So you will not gain much by adding nodes to this solution, since each node will solve the problem in about 10 minutes at best.

No matter how many hundreds of nodes you throw at it, the best node will still average in at about 10 minutes.

This is probably the worst solution.

The next solution looks fine in theory:

2) Run islands in parallel and report back to the root node after each generation.

This is a similar solution as the first one, but instead of running them in isolation, they report back to the root node after each generation and let the root decide what individuals to keep or discard.
Let’s say that one out of ten mutations is positive; then each node we add gives us another independent chance at a positive mutation per generation.
So if we have 10 nodes, then we are fairly likely (roughly two chances in three) to get a positive mutation each generation.
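It is worth checking that intuition against the actual numbers, assuming the mutations on different nodes are independent:

```python
# Probability that at least one of N independent nodes produces a
# positive mutation, when a single mutation is positive 1 time in 10.
p_positive = 0.1
for nodes in (1, 5, 10, 20):
    p_at_least_one = 1 - (1 - p_positive) ** nodes
    print(f"{nodes:2d} nodes -> {p_at_least_one:.0%} per generation")
```

With 10 nodes the chance of a positive mutation per generation is 1 − 0.9¹⁰ ≈ 65%, so the nodes help, but with diminishing returns rather than a guarantee.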

This should solve it, right?

NOPE! (sort of)

The problem with this solution is that it is extremely chatty; the performance gain you get from it is eaten by the overhead of the communication.
There is a lot of data that needs to be passed around each generation.

This could work well if the fitness function takes a lot more time than the communication, but in my case it doesn’t.

Attempt 3:

3) Partition the workload and use deterministic evolution per node.

I had some big hopes for this one.

The idea was to avoid sending as much data between nodes by using deterministic evolution per node (init the random generator with the same seed).
So that each node evolves the _same_ image, but only renders and calculates fitness for a small part of it and then reports back the partitioned fitness to the root node that then decides if the image should be kept or not.

This way, only the fitness value and an accept message would be passed between nodes and the root.

The communication overhead is fairly small here, so that is all nice.
So if I have two nodes, one would expect each node to render its partition in half the time it takes a single node to render the whole image.

However, this turned out to be wrong too: since a lot of the polygons in the image cover the area of more than one node, those polygons have to be (partially) rendered by multiple nodes.

So instead of getting a 100% performance boost from this, the actual performance gain was only about 35%.

35% faster rendering minus the overhead of communication gives this approach a total perf boost of 25-30% per node.

Not that impressive at all, but it does work.
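The trick that makes approach 3 work at all is that a pseudo-random generator seeded with the same value produces the same sequence everywhere, so every node derives the _same_ mutations without exchanging any DNA. A minimal sketch (the seed value and `evolve_step` are hypothetical stand-ins, not EvoLisa's code):

```python
import random

def evolve_step(rng):
    """Stand-in for one mutation decision; each node runs this with an
    identically seeded generator and therefore stays in lockstep."""
    return rng.randrange(1000)

node_a = random.Random(42)  # both nodes agree on the seed up front
node_b = random.Random(42)

# Each node derives the same sequence of mutations independently,
# so only fitness values and accept/reject messages need to travel.
steps_a = [evolve_step(node_a) for _ in range(5)]
steps_b = [evolve_step(node_b) for _ in range(5)]
```

Each node then renders and scores only its own slice of the image, and the root sums the partial fitness values to make the accept/reject decision.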


There are also variations on the above approaches, but all of the ones I’ve tried fail on either communication overhead or a poor understanding of probability theory.
One such example is “Don’t communicate that often, make a batch and select the best one from that batch”; that one falls under the probability theory part ;-)

Ideas ?

EvoLisa: Optimizations and Improved quality


This is a follow up to: http://rogeralsing.com/2008/12/07/genetic-programming-evolution-of-mona-lisa/

The last few days we have made quite a few changes to the EvoLisa app.

Dan Byström provided an optimization that resulted in a 25 times performance improvement for the fitness function, completely crazy stuff :-)

Mats Helander and I have been discussing ways to make EvoLisa parallelizable so it can run on a big computation cluster, and Mats came up with a brilliant idea on how to partition it.
(More info on that later)

We have also been playing quite a bit with the rendering, added spline rendering and changed how the polygons mutate.

Just check this out:

monaeyes

How cool is that? ;-)

Evolutionary misconceptions


Since I posted my evolving Mona Lisa post I have received a lot of questions and comments regarding the simulation.

One fairly common statement is: “This is goal oriented evolution, in real life there is no goal”

This is the first misconception: there is no “goal” in the application.
The “source image” is not the goal; it is the ENVIRONMENT in which the organisms live and try to survive.
It is that environment that determines if a specific individual is fit for reproduction or not.
All other evolution is dependent on its environment in the same way, though in real life the environment is much more complex and dynamic.

See it as a set of parameters that describes the world where the evolution takes place, the parameters just happen to be in the form of what we humans experience as an image.
It could just as well be a list of parameters such as temperature, gravity, resources, other organisms etc.

Of course _I_ had a goal for the application when I created it, but you have to differentiate between my goal and the goal of evolution.
My goal was to see if a specific problem could be solved; however, the application and the evolutionary process in it knows nothing about, nor cares about, that problem. That process has no goal; it simply lets the most fit individual reproduce and nothing more.

The second misconception is to see the “generated image” as how the evolved organism would look.
This is not the case: the “generated image” shows the LEVEL OF ADAPTATION of the organism, not its physical body.

A generated image that has a high resemblance to the source image simply means that the organism is highly adapted to the environment.
It does not mean that we have an organism that looks like the goal of the simulation; that is an incorrect interpretation of the whole application.

The third misconception is that this can be used as some sort of bat against the ID / creationist movement; this is also wrong.
It is wrong simply because I can never write a simulation where I myself am not a parameter in the simulation, so _my_ application proves nothing on that front.

The only thing we can confirm from this application is that evolutionary approaches can be used to solve tricky problems that we ourselves do not know the solution to, if we provide rules and an environment in which the problem can be solved.
We cannot use this application to confirm or dismiss the presence of God, Thor, Shiva or any other theological being.

Genetic Gallery


I will be collecting images generated by Evo-Lisa here and make a gallery out of it.
So if you have a portrait or other image that has been Evo-Lisa’ed that you want to show, please mail them to “Roger dot Alsing at Precio dot se”

For those of you who don’t know what I’m talking about, please see:
http://rogeralsing.com/2008/12/11/genetic-programming-mona-lisa-source-code-and-binaries/
And
http://rogeralsing.com/2008/12/07/genetic-programming-evolution-of-mona-lisa/

Art painted by evolution

I’ve already gotten the first images:

Mats Helander’s dog:

lillie

Bill Gates:

billgates

Moon man:

moonman

Scream:

scream

Darwin:

darwin

Lara:

lara

Tux:

tux

Opera House Day time:

operahouseday

Opera House Night time:

operahousenight