Genetic programming (GP) is an evolutionary computation approach to automatic programming; designing programs to solve a particular task through the use of an algorithm modelled upon processes and mechanisms of biological evolution. Genetic programming has several apparent advantages over better known automatic programming techniques. Unlike formal methods, GP requires no knowledge about the problem that it is attempting to solve other than a measure of how good a solution is, and once a GP run has been initiated it requires no human interaction. Unlike inductive logic programming, GP does not attempt to carry out an exhaustive search of the problem's solution space; but rather exploits its search history to identify those areas which are more likely to contain global optima.
However, GP is far from perfect. For much of its execution, it fails to effectively exploit its search history: since its model of biological recombination does not, on the whole, produce better programs from existing programs. GP has a bloat problem: the programs that it generates tend to become larger and larger at a rate which rises in line with a quadratic function; filling up the available space and making it near-impossible to find small, efficient solutions. Like other evolutionary computation approaches, GP has a scalability problem: increasing problem size leads to an unmanageably high increase in space and time resource requirements.
Nevertheless, the GP approach is not futile. It is still a young field of research, it is attracting a growing research community, and there are many avenues of research yet to be explored. Moreover, it already shows a lot of promise. GP has been used to solve a huge variety of problems: from designing robot controllers to discovering new quantum algorithms to understanding the role of genetic motifs and protein structures in biology. One of its particular strengths is its ability to find solutions to problems which a human would probably never consider: making it possible to solve problems in fields which humans do not thoroughly understand (like biology); or find hard to understand (like quantum computing); and to discover new, more efficient solutions to problems which humans have already solved (such as computer algorithms); not to mention its ability to inform humans through reverse-engineering of these solutions.
Modelling biology on computers has been a prevailing theme for much of the history of computation. It is the ambition of many computer scientists to create computers which are as powerful and flexible as animal brains. Accordingly, it is logical to draw information from systems which have already achieved this feat: biological systems. Nevertheless, the interests of computer scientists in biology are not limited to neural processes. Biological systems have many properties which are the envy of computer science; for example: growth, learning, reconfigurability, self-repair, fault tolerance, and reproduction - though of course there are many biological processes, mechanisms and artefacts which at the moment would have no conceivable benefit within the silicon environment of a computer.
Biological modelling gave birth to evolutionary computation, yet there are mixed views within the EC community regarding biological modelling within evolutionary computation. A number of researchers cite the `No Free Lunch' theory [Wolpert and Macready, 1995] whilst warning against the futility of introducing greater complexity to evolutionary algorithms: yet this same theory also predicts that evolutionary computation can be no better than random search across the whole spectrum of problem domains. Clearly, EC is considerably better than random search upon the kinds of problems that computer scientists actually want to solve. Other researchers have championed the value of biological modelling within EC; testifying to the obvious superiority of biological evolution over simulated evolution.
Evolvability is the relative capacity for something to exhibit appropriate change. For a biological organism, evolvability is the relative likelihood that random genetic change could lead to an improvement in the organism's fitness. For computer software, evolvability is the ease with which new functionality can be introduced by a human programmer. For a human language, evolvability is a measure of its ability to express new concepts and accept new constructs and methods of communication as required. These examples illustrate that evolvability is meaningful within many domains. In fact, the concept of evolvability can be thought of as a pattern which can be applied to any domain where an entity or group of entities is subject to a process of change; and particularly where a domain has some notion of a particular direction of change being more valuable than another.
What is perhaps less appreciated is that concepts of how to achieve evolvability can also be applied across domains. For example, in software systems modularity is seen as a source of evolvability since it limits the effect of change to a local region of the program. This limits the global impact of change within the program, making it easier to apply change, and making it less likely that any errors will propagate to other parts of the program. Likewise, biological organisms develop in a compartmental fashion, allowing individual compartments to be evolved separately and making it less likely that genetic errors will propagate within the whole organism [Conrad, 1990].
The work reported in this thesis is motivated to a large degree by the premise that principles of biological evolvability can be applied within non-biological artefacts in order to improve the way in which they evolve. This premise is motivated by a number of observations: (i) the way in which biological organisms are represented is believed to be a product of evolution and therefore presumably superior to many other potential representations; (ii) these representations are evidently capable of describing and supporting the evolution of highly complex systems i.e. biological organisms; (iii) these representations are capable of describing systems of computation; and (iv) many principles of evolvability are known to be applicable across domains. That biological representations can describe computation is perhaps not obvious but is supported by the developing community of computer scientists who use models of biological processes to carry out information processing and other overtly computational tasks.
The product of this work is a GP system called enzyme genetic programming. Enzyme genetic programming has many similarities with conventional GP systems: in that it uses a population based evolutionary algorithm; it applies recombination and mutation operators to create new programs; it specifies a problem using a fitness function, a function set and a terminal set; and it generates programs represented by parse trees. However, it does not use parse trees to represent programs during evolution but rather uses a new program representation modelled upon the way in which enzyme systems are represented in biology. The consequences of this new representation for evolution are examined at length in the pages to come.
This work makes the following principle contributions to knowledge:
This thesis is organised into three main segments. Chapters 1 to 5 introduce the concepts of biological representation, evolution, and evolvability. Chapters 6 and 7 review related work in genetic programming and evolutionary computation. Chapters 8 to 10 describe the novel contributions of this research. Specifically:
[Contents] [Next Chapter]