It is widely accepted that this task is NP-hard, so a deterministic algorithm cannot have a good runtime. Instead, a genetic algorithm is used: One solution is one individual. By combinating and mutating superiour individuals and removing inferiour ones, it is hoped that a close-to-optimum solution can be found in far less time than with a deterministic algorithm.
The term "evolutionary programming" was coined for algorithms that mimick mother nature - in the case of "genetic programming" the darwinian evolution theory. Genetic programming is therefore subbranch of evolutionary programming.
On hardware systems like those used in parallel computing this diameter is particulary important in respect to guaranteed response times. While for some structures like quadratic meshs the optimal solution is already known (H-trees), irregular graphed networks require customized solutions.
The process of minimizing. A given graph of processing nodes has usually the constraint that the number and length of links is immutable. So the only possible adjustment is to join links, hereby removing the node in the middle (e.g. n1-n2-n3, a 3-node link, becomes n1-n3, a two node link; while the waylength from n1 to n3 is optimized (2 becomes 1), n2 may, dependent on the surrounding graph, suffer from this join).
Finding an optimal solution to this problem is NP-hard, so it is not desireable to use a determistic algorithm. Instead, heuristic approaches may be better suited.
Individuals can mutate or interbreed, thus creating new solutions that are only slightly different from their parent(s). Each solution has a fitness rating (in this case: the diameter of the graph), and only the fittest survive; thus, a step-by-step improvement process is initiated.
The basic process may be explained by a pseudo-algorithm.
while ( <evolution should run> ) {
<kill everyone except the top 10% or so>
<interbreed them>
<let some of the children mutate>
}
It is easy to see that the top 10% of solutions never become worse than in the previous generation - good solutions are simply immortal and may only be superseded by better children or mutations.
While the breeding process combines the attributes of the parents at random, the mutation is far more random - virtually any configuration can be achieved by mutation. In Gen-netways the frequency and severeness of mutation is dependent on the radiation in the gene pool; this radiation changes over time.
As a conclusion, genetic algorithm need:
Annealing (cooling down) referes to the severeness of mutation. In the beginning, mutations tend to be big, thus reaching every region in the solution space; then, mutations begin to become smaller, heading for local minima (minima in the case of diameter reduction - normally maxima). This way, all local minima are reached, and one of them is hopefully the global minimum.
In Gen-netways this effect is (somewhat limited) incorporated by the principle of radiation. From time to time (random; factor can be customized) a nuclear outbreak occurs, raising the radiation to maximum; in the following time, radiation converges to the norm again.
It is hoped that the use of simulated annealing principles can reduce the concentration on local minima in the gene pool.
For fitness calculation, the graph will be represented as a distance
matrix M which is quadratic; each side has the number of nodes as length.
In this matrix, an upper triangle is build which represents the costs of
reaching a point from another. Since every edge is bidirectional, redundant
information can be erased.
Example: 0 means reachable at no cost, 1 means directly reachable with waylength 1, -1 means not directly reachable, x means "does not matter".
Path. The smallest unit in the system, a Path is simple a link from one node to another. It may have more nodes inbetween, though; these nodes indicate where the link lies on the graph (the link may be splitted at these nodes later on). Example: n1-n2 is a simple path of length 1; n1-n2-n3 is a path between n1 und n3 of length 1 but it may be splitted to form to paths (n1-n2 and n2-n3) of length 1 each. Paths are the "genes" of individuals.
Individual. Each individual is one solution; everyone contains a list of Paths (genes) that define the solution graph. The fitness of each individual is calculated using a adjacency matrix derived from the Path list. Individuals can breed by mixing their Path lists and mutate by making changes to the list.
GenePool. A gene pool consists of a number of individuals. Radiation and selection parameters are also properties of the gene pool. The number of individuals remains the same; in each evolution cycle a certain number of survivors (dependant on parameters, default 40%) is selected and the remaining slots are filled with children and mutations. A gene pool primarily serves as a collection of individuals and controller of their surroundings.
World. The world keeps the gene pool running. A world could, in future versions, also contain more than one gene pool. In Gen-Netways, the World serves as starting point and as model for the GUI (model-view-controller pattern, a software technique).
Additionally, to fulfill the idea of simulated annealing, higher radiation results in building "snakes" of path objects, thus creating bigger jumps (node-wise). All the time only if the result is still a spanning tree, of course. The slow-cool-down property of the radiation hopefully provides a simulated-annealing effect.
On the breakdown side, paths are chosen at random, and, when they have a length greater than the standard-2, broken at a random point. Since no connecting node is taken away, the resulting graph is always a spanning tree if it was one before the breakdown.
The first parent is taken as blueprint (copy) for the child, thereafter all other parents are searched for paths that have a length greater than the standard-2. Each of these longer paths has only a probability of 1/ParentCount to be taken into account of all.
If this "outstanding gene" is to be applied the child, it is checked if there is a substitute that can be removed to compensate the adding of a new path (e.g. when n1-n2-n3-n4 is to be inserted that must be paid for with removal of equivalent paths, one e.g. being the combination n1-n2-n3 and n3-n4). This ensures that only the ways that exists are used, and none is used twice. Even if there are substitutes, the result after the insertion must still be a spanning tree. If not, insertion is rejected.
It is to be noted that the first parent is in clear advantage - all of its outstanding genes are in the first blueprint of the child; on the other hand, its outstanding genes are more likely to be removed during the insertion process of partner genes.
From first tests it seems that mutation is far more powerful than breedings - maybe the simulated annealing principle is working better than the overall approach of genetic algorithms. But since interbreeding is faulty (see also 4.1), this may be the mistake.
Another idea is that there could be more than one gene pool. A number of "houses" would only interbreed among each other (with a few "interracial mixes" to improve gene pool quality), and from time to time wage nuclear wars, resulting in increased radiation on the others side. Unlike the real world, it is not recommended to kill the fittest individuals in senseless wars.
EOF (Jul:2001)