4. Advanced Features of the Resource Analysis

4.1. Elaboration Module

Input dependent bounds on resource usage of programs are only useful if they easily allow to distinguish large classes of inputs having roughly the same resource usage. Consider having a black box for a program that can compute the precise execution cost for any particular input. Even if this black box computes very fast, one still needs to examine all inputs one by one in order to determine the worst case or to establish an overview over cost-behaviour. Since the number of concrete inputs may be large or even infinite, this is generally infeasible.

The original amortised analysis technique as proposed by Tarjan, being very powerful, may generally produce such a precise ``black box'' cost oracle. This is not a hindrance for a manual technique, as the mathematician performing the method has direct control over the complexity and behaviour of the ``black box'' that is created. However, for an automated analysis we must ensure that the outcome is always simple enough to be understood and useful. The cost bounds produced by our automated version of the amortised analysis technique are always simple. Namely, they are linear in the sizes of the input. Of course, this is a byproduct of enabling an automated inference in the first place. This design guarantees that we can easily divide all possible inputs into large classes having a similar cost. For example, for a program processing two lists we might learn instantly from the result of our efficient analysis that the execution cost can be bounded by a constant times the length of the second list, thereby throwing all inputs which only differ in the first argument into the same cost class. Furthermore we immediately know the execution cost for infinitely many of such input classes.

We now exploit this even further to produce cost bounds expressed in natural language. Thus far, the cost bound had only been communicated to the user using type annotations. While these allowed a concise and comprehensive presentation of the derived bounds, they also required a fair amount of expertise to understand, despite most derived bounds being actually rather simple. The new elaboration module helps to interpret the annotated types by ignoring irrelevant information, summing up weights in equivalent positions and producing a commented cost-formula, parameterised over a program's input.

We now revisit the results for the red-black tree insertion function from Section Section 3.3.1. Performing heap analysis of the insert with the additional option --speak produces the following output:


ARTHUR3 typing for HumeHeapBoxed:
(int,tree[Leaf<20>|Node<18>:colour[Red|Black<10>],#,int,#]) -(0/0)-> 
                               tree[Leaf|Node:colour[Red|Black],#,int,#]

Worst-case Heap-units required to call rbInsert in relation to its input:
  20*X1 + 18*X2 + 10*X3
    where
        X1 = number of "Leaf" nodes at 1. position
        X2 = number of "Node" nodes at 1. position
        X3 = number of "Black" nodes at 1. position

This makes it easy to see that the number of black nodes is significant for the cost formula. Furthermore the cost formula 20*X1 + 18*X2 + 10*X3 is obviously much more compact and easy to understand. We are directly told that X1 corresponds to the number of leaves in the first tree argument (there is only one tree argument here); that X_2 corresponds to the number of all nodes and that X_3 corresponds to the number of black nodes. Note that this bound is inferior to the one shown in Section Section 3.3.1 and we will address this in the following subsection.

A further optimisation implemented in our elaboration module is the recognition of list-like types, that have the property that all constructors have precisely one self-recursive argument, except for a single constructor having none. In this case it is clear that each element of such a type must have precisely one such terminating constructor. Therefore the weight attached to the terminal constructor can be moved outwards, e.g.


(llist[Nill<1>|Consl<2>:ilist[Nil<3>|Cons<4>:int,#],#]) -(5/0)-> int]

Worst-case Heap-units required to call foo in relation to its input:
  6 + 5*X1 + 4*X2
    where
        X1 = number of "Consl" nodes at 1. position
        X2 = number of "Cons" nodes at 1. position

We see that the cost formula is much simpler than the annotated type, which features 5 non-zero annotations, whereas the cost formula has only three parameters. Note that this useful simplification naturally incurs a slight loss of information. The annotated type expresses that 3 resource units are needed whenever the end of the inner list is reached. However, the program may sometimes choose to abort processing a list to the very end, in which case those 3 resource units are not needed. While our analysis must produce a guarantee on the resource usage for all cases, this simplification means no loss for the bound produced. Nevertheless it is conceivable that a programmer might sometimes make use of this additional knowledge about the resource behaviour of the program. However, we believe that the majority of cases benefits from this simplification.

4.1.1. Optimizing the bound for red-black tree insertion

We again revisit the red-black tree example from Section Section 3.3.1, this time to show how the interactive optimisation of the solution works. Invoking our analysis for the heap space metric as follows art3 -H -a redblack11.art3 delivers the following prompt:


ARTHUR3 typing for HumeHeapBoxed:                                         
(int,tree[Leaf<20>|Node<18>:colour[Red|Black<10>],#,int,#]) -(0/0)-> 
                                tree[Leaf|Node:colour[Red|Black],#,int,#]
Worst-case Heap-units required in relation to input:                      
20*X1 + 18*X2 + 10*X3                                                     
    where                                                                 
        X1 = number of "Leaf" nodes at 1. position                        
        X2 = number of "Node" nodes at 1. position                        
        X3 = number of "Black" nodes at 1. position                       
                                                                          
Enter variable for weight adjustment or "label","obj" for more info:

We are unhappy with the high cost associated with the leaves of the tree, since it seems unreasonable to require such a high cost for processing an empty leaf. Therefore we ask the analysis to lower this value considerably, by increasing the penalty of the associated resource variable from 6 to 36.


Enter variable for weight adjustment or "label","obj" for more info: X1
Old objective weight: 6.0 Enter relative weight change: 30                
Setting CVar '351' to weight '36.0' in objective function.

(int,tree[Leaf|Node<10>:colour[Red|Black<18>],#,int,#]) -(20/0)-> 
                                tree[Leaf|Node:colour[Red|Black],#,int,#]
Worst-case Heap-units required in relation to input:
20 + 10*X1 + 18*X2
    where
        X1 = number of "Node" nodes at 1. position
        X2 = number of "Black" nodes at 1. position

This already results in the desired solution. The fixed costs increase from 0 to 20, the costs associated with all leaves drop from 20 to 0, and the cost of each red node decreases by 8. Since every tree contains at least one leaf, this bound is clearly better for all inputs.