Calling C directly from Haskell

GOOD ADVICE: Because this stuff is not Entirely Stable as far as names and things go, you would be well-advised to keep your C-callery corraled in a few modules, rather than sprinkled all over your code. It will then be quite easy to update later on.

_ccall_ and _casm_: an introduction

The simplest way to use a simple C function

double fooC( FILE *in, char c, int i, double d, unsigned int u )

is to provide a Haskell wrapper:

fooH :: Char -> Int -> Double -> Word -> IO Double
fooH c i d w = _ccall_ fooC (“stdin”::Addr) c i d w

The function fooH unbox all of its arguments, call the C function fooC and box the corresponding arguments.

One of the annoyances about _ccall_s is when the C types don't quite match the Haskell compiler's ideas. For this, the _casm_ variant may be just the ticket (NB: no chance of such code going through a native-code generator):

import Addr
import CString

oldGetEnv name
  = _casm_ “%r = getenv((char *) %0);” name >>= \ litstring ->
    return (
        if (litstring == nullAddr) then
            Left ("Fail:oldGetEnv:"++name)
        else
            Right (unpackCString litstring)
    )

The first literal-literal argument to a _casm_ is like a printf format: %r is replaced with the “result,” %0%n-1 are replaced with the 1st–nth arguments. As you can see above, it is an easy way to do simple C casting. Everything said about _ccall_ goes for _casm_ as well.

The use of _casm_ in your code does pose a problem to the compiler when it comes to generating an interface file for a freshly compiled module. Included in an interface file is the unfolding (if any) of a declaration. However, if a declaration's unfolding happens to contain a _casm_, its unfolding will not be emitted into the interface file even if it qualifies by all the other criteria. The reason why the compiler prevents this from happening is that unfolding _casm_s into an interface file unduly constrains how code that import your module have to be compiled. If an imported declaration is unfolded and it contains a _casm_, you now have to be using a compiler backend capable of dealing with it (i.e., the C compiler backend). If you are using the C compiler backend, the unfolded _casm_ may still cause you problems since the C code snippet it contains may mention CPP symbols that were in scope when compiling the original module are not when compiling the importing module.

If you're willing to put up with the drawbacks of doing cross-module inlining of C code (GHC - A Better C Compiler :-), the option -funfold-casms-in-hi-file will turn off the default behaviour.

Literal-literals

The literal-literal argument to _casm_ can be made use of separately from the _casm_ construct itself. Indeed, we've already used it:

fooH :: Char -> Int -> Double -> Word -> IO Double
fooH c i d w = _ccall_ fooC (“stdin”::Addr) c i d w

The first argument that's passed to fooC is given as a literal-literal, that is, a literal chunk of C code that will be inserted into the generated .hc code at the right place.

A literal-literal is restricted to having a type that's an instance of the CCallable class, see the section called C-calling “gotchas” checklist for more information.

Notice that literal-literals are by their very nature unfriendly to native code generators, so exercise judgement about whether or not to make use of them in your code.

Using function headers

When generating C (using the -fvia-C directive), one can assist the C compiler in detecting type errors by using the -#include directive to provide .h files containing function headers.

For example,

typedef unsigned long *StgForeignObj;
typedef long StgInt;

void          initialiseEFS (StgInt size);
StgInt        terminateEFS (void);
StgForeignObj emptyEFS(void);
StgForeignObj updateEFS (StgForeignObj a, StgInt i, StgInt x);
StgInt        lookupEFS (StgForeignObj a, StgInt i);

You can find appropriate definitions for StgInt, StgForeignObj, etc using gcc on your architecture by consulting ghc/includes/StgTypes.h. The following table summarises the relationship between Haskell types and C types.

C type name Haskell Type
StgChar Char#
StgInt Int#
StgWord Word#
StgAddr Addr#
StgFloat Float#
StgDouble Double#
StgArray Array#
StgByteArray ByteArray#
StgArray MutableArray#
StgByteArray MutableByteArray#
StgStablePtr StablePtr#
StgForeignObj ForeignObj#

Note that this approach is only essential for returning floats (or if sizeof(int) != sizeof(int *) on your architecture) but is a Good Thing for anyone who cares about writing solid code. You're crazy not to do it.

Subverting automatic unboxing with “stable pointers”

The arguments of a _ccall_ automatically unboxed before the call. There are two reasons why this is usually the Right Thing to do:

It is possible to subvert the unboxing process by creating a “stable pointer” to a value and passing the stable pointer instead. For example, to pass/return an integer lazily to C functions storeC and fetchC might write:

storeH :: Int -> IO ()
storeH x = makeStablePtr x              >>= \ stable_x ->
           _ccall_ storeC stable_x

fetchH :: IO Int
fetchH x = _ccall_ fetchC               >>= \ stable_x ->
           deRefStablePtr stable_x      >>= \ x ->
           freeStablePtr stable_x       >>
           return x

The garbage collector will refrain from throwing a stable pointer away until you explicitly call one of the following from C or Haskell.

void freeStablePointer( StgStablePtr stablePtrToToss )
freeStablePtr :: StablePtr a -> IO ()

As with the use of free in C programs, GREAT CARE SHOULD BE EXERCISED to ensure these functions are called at the right time: too early and you get dangling references (and, if you're lucky, an error message from the runtime system); too late and you get space leaks.

And to force evaluation of the argument within fooC, one would call one of the following C functions (according to type of argument).

void     performIO  ( StgStablePtr stableIndex /* StablePtr s (IO ()) */ );
StgInt   enterInt   ( StgStablePtr stableIndex /* StablePtr s Int */ );
StgFloat enterFloat ( StgStablePtr stableIndex /* StablePtr s Float */ );

Nota Bene: _ccall_GC_ must be used if any of these functions are used.

Foreign objects: pointing outside the Haskell heap

There are two types that GHC programs can use to reference (heap-allocated) objects outside the Haskell world: Addr and ForeignObj.

If you use Addr, it is up to you to the programmer to arrange allocation and deallocation of the objects.

If you use ForeignObj, GHC's garbage collector will call upon the user-supplied finaliser function to free the object when the Haskell world no longer can access the object. (An object is associated with a finaliser function when the abstract Haskell type ForeignObj is created). The finaliser function is expressed in C, and is passed as argument the object:

void foreignFinaliser ( StgForeignObj fo )

when the Haskell world can no longer access the object. Since ForeignObjs only get released when a garbage collection occurs, we provide ways of triggering a garbage collection from within C and from within Haskell.

void GarbageCollect()
performGC :: IO ()

More information on the programmers' interface to ForeignObj can be found in the library documentation.

Avoiding monads

The _ccall_ construct is part of the IO monad because 9 out of 10 uses will be to call imperative functions with side effects such as printf. Use of the monad ensures that these operations happen in a predictable order in spite of laziness and compiler optimisations.

To avoid having to be in the monad to call a C function, it is possible to use unsafePerformIO, which is available from the IOExts module. There are three situations where one might like to call a C function from outside the IO world:

C-calling “gotchas” checklist

And some advice, too.