We propose a framework for compiling and executing algorithmic skeletons nested to arbitrary depth, within a fully automatic compiler that generates architecture-independent code. The framework consists of translation rules (schemes) that are applied at compile time for the static analysis of the source program and at run time to execute the nested skeletons. The run-time scheduler relies on the compile-time analysis and uses MPI message-passing groups to run nested skeletons in parallel. The novelty lies in allocating processors to skeletons by traversing the nesting structure of the higher-order functions (HOFs): the processors of a group are reallocated to support the skeletons nested within it. This involves dividing the group into one sub-group for each of its immediately nested skeletons, using a simple heuristic, and creating a top group that connects all the sub-groups. The process then continues recursively for the inner skeletons.
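The recursive allocation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scheduler. The abstract does not specify the heuristic, so the sketch assumes a hypothetical one that sizes each sub-group in proportion to the number of skeletons in its nested subtree (with at least one processor each), and it assumes the top group is formed from the lowest-ranked processor of every sub-group. Processors are plain integer ranks, and the names `Skeleton`, `Allocation`, `split_group` and `allocate` are hypothetical. In an actual MPI run, each sub-group could be turned into a communicator with `MPI_Comm_split`, using the sub-group index as the colour.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skeleton:
    """A node in the HOF nesting tree (hypothetical representation)."""
    name: str
    children: list[Skeleton] = field(default_factory=list)

    def weight(self) -> int:
        # Assumed heuristic weight: number of skeletons in this subtree.
        return 1 + sum(c.weight() for c in self.children)


@dataclass
class Allocation:
    skeleton: str
    group: list[int]          # ranks reallocated to this skeleton
    top_group: list[int]      # one rank per sub-group (assumed: the lowest)
    children: list[Allocation]


def split_group(ranks: list[int], weights: list[int]) -> list[list[int]]:
    """Divide ranks into contiguous sub-groups, roughly proportional to weights."""
    n, k = len(ranks), len(weights)
    if n < k:
        raise ValueError("fewer processors than immediately nested skeletons")
    total = sum(weights)
    spare = n - k
    # Every sub-group gets one rank, plus a proportional share of the spare ranks.
    sizes = [1 + spare * w // total for w in weights]
    # Rounding leaves fewer than k ranks unassigned; give them to the heaviest first.
    leftover = n - sum(sizes)
    for i in sorted(range(k), key=lambda i: -weights[i])[:leftover]:
        sizes[i] += 1
    groups, start = [], 0
    for size in sizes:
        groups.append(ranks[start:start + size])
        start += size
    return groups


def allocate(skel: Skeleton, ranks: list[int]) -> Allocation:
    """Recursively allocate a group of ranks down the skeleton nesting tree."""
    if not skel.children:
        return Allocation(skel.name, ranks, [], [])
    subgroups = split_group(ranks, [c.weight() for c in skel.children])
    top = [g[0] for g in subgroups]  # assumed: top group links sub-group leaders
    kids = [allocate(c, g) for c, g in zip(skel.children, subgroups)]
    return Allocation(skel.name, ranks, top, kids)


def show(a: Allocation, depth: int = 0) -> None:
    print("  " * depth + f"{a.skeleton}: group={a.group} top={a.top_group}")
    for child in a.children:
        show(child, depth + 1)


if __name__ == "__main__":
    nested = Skeleton("map", [Skeleton("fold", [Skeleton("zipWith")]),
                              Skeleton("scan")])
    show(allocate(nested, list(range(8))))
```

With eight processors, the outer `map` gives five ranks to the heavier `fold` subtree and three to `scan`, and its top group holds ranks 0 and 5. Proportional splitting is only one plausible choice; any heuristic that guarantees each nested skeleton at least one processor fits the same recursive structure.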
The experiments were conducted on the AP1000, the Cray T3D and a network of Linux workstations, using deeply nested combinations of HOFs to solve matrix problems over arbitrary-length numbers. The results suggest that good cross-platform portability and behavioural consistency can be achieved.