This page covers the C+MPI tutorials in the course F21DP (Haskell tutorials are here).
Sequential C
As background for the sequential C part, refer to the slides for Lecture 2: C Revision (Part 1) Lecture 3: C Revision (Part 2) and the online C Book. Pick up sample sources here.
Practicals
List exercises from lecture 2:
- Write a function node *append(node *x, node *y) that appends the elements of the the second list y to the end of the first list x (i.e. append([1,2],[3,4]) ==> [1,2,3,4]).
- How does this affect the list x?
- Write a second version that does not modify the input lists.
- Under which condition is it safe to use the first version?
- Write a function node *reverse(node *x) that reverses the elements in the list (i.e. reverse([1,2,3]) ==> [3,2,1])
Tree exercises from lecture 3:
- Write a function that checks, whether a tree is balanced.
- Write a function struct node *readTree(FILE *fin, int n) that reads n integer values from a file and builds a balanced tree.
- Write a function void insert(struct node *t, int n) that inserts a value n into a tree, whose leaves are sorted in ascending order.
- For an input array of length n, how much space does such a tree consume?
- Change the tree data structure, to use a more efficient representation of the tag.
Sample solutions:
Theory
For each of the following C definitions/declarations, describe the definition/declaration in English and calculate how much space is required for/allocated to an instance.
double a[7]; char b[5]; struct details {int age; float height;}; struct person1 {char name[4]; struct details d;}; struct person2 {char name[5]; struct details d;}; struct person1 people[15]; struct person2 teams[7][9]; struct person1 * persons[9]; struct person2 ** groups; struct cell {int val; struct cell * left,* right;}; struct cell cells[50]; struct cell * nodes[25];
Suppose cells, defined above, has been fully populated with appropriate values. Explain why it might not be sensible to:
- coerce its content to (char *) and send them as a byte stream over a channel to another processor;
- on the 2nd processor, put the byte stream into an array of char, which is then coerced to (cell *) and copied into an array of cell.
- assume a parallel tree search is implemented by sending left and right subtree, with the search element to different processor, using the above byte stream conversion; at which point would the the search fail?
NB space allocated to a structure includes padding to align on an appropriate word boundary. This depends on the architecture (32-bit or 64-bit).
C + MPI
Practicals
To go through the C+MPI tutorial, using the matrix multiplication example from class, follow this link to a FAQ.
- Hello world (
very easy ): Download and run the MPI hello world program from class. - Multiple guessers (
easy ): As a gentle introduction into C+MPI, extend the guesser example to several guesser, who pick random values in the known interval. The master needs to distinguish the guessers and announce the winner of the competition. - Seq matrix multiplication (
easy ): To get started, download and run the different parallel versions of matrix multiplication. - Parallel matrix multiplication (
moderately challenging ): Modify the naive parallel version (matrix4.c) to work with any number of processors. - Compare the performance of this version with the final parallel version (matrix8.c) on the large input data, and explain the difference in runtime.
- Parallel matrix multiplication (block-structured) (
challenging ): Extend matrix8.c (using row clustering), to a version with block clustering, by applying the same clustering approach when sending columns. - Parallel matrix multiplication (collectives) (
fairly challenging ): Devise a parallel matrix multiplication algorithm using MPI collective communication (MPI_Bcast, MPI_Scatter, MPI_Gather). - Exponential function (
moderately challenging ): Extend the partial solution to the parallel exponential function, to a general solution for all possible inputs, checking for convergence of the result and tuning the parallel performance, so that the final stages don't dominate the compute time. (seq version) -
Exponential function with GMP arithmetic (
a bit more challenging ): As above, but instead of starting from the naive, sequential implementation, start form this competitive sequential implementation of the exponential function: it uses the GNU GMP library for arbitrary precision arithmetic, so that the result (e^z) can be computed to an arbitrary precision. You supply z and the precision d on the command line (see the comments at the start of the program for explanations). When you study the code, you will see that the overall structure is the same as in the naive version, but all * and + etc operations are replaced by GMP primitivies. Test the program on some inputs with increasing size, and check results and runtime. The concrete task in this exercise, is to parallelise the power_e function, addressing the issues of load balancing, chunking etc as discussed in class. You can start from the sketch of a parallel solution above, but you will need to convert the internal representation of GMP variables, of type mpq_t etc, into a serialised data structure you can send.As a side remark, you might want to compare this C+GMP implementation with this sequential Haskell implementation implementing the same algorithm. Note, that the core of the implementation are 2 lines of Haskell (or one very long line ;-) compared to ca 100 lines in C+GMP.
This improved Haskell version uses circular data structures to avoid repeated multiplications when computing e^z and n! reducing the overall runtime significantly.