The Glasgow Haskell Compiler User's Guide, Version 4.06
Prev	Chapter 6. GHC Language Features	Next

Calling C directly from Haskell

GOOD ADVICE: Because this stuff is not Entirely Stable as far as names and things go, you would be well-advised to keep your C-callery corraled in a few modules, rather than sprinkled all over your code. It will then be quite easy to update later on.

`_ccall_` and `_casm_`: an introduction

The simplest way to use a simple C function

double fooC( FILE *in, char c, int i, double d, unsigned int u )

is to provide a Haskell wrapper:

fooH :: Char -> Int -> Double -> Word -> IO Double fooH c i d w = _ccall_ fooC (“stdin”::Addr) c i d w

The function fooH unbox all of its arguments, call the C function fooC and box the corresponding arguments.

One of the annoyances about _ccall_s is when the C types don't quite match the Haskell compiler's ideas. For this, the _casm_ variant may be just the ticket (NB: no chance of such code going through a native-code generator):

import Addr import CString oldGetEnv name = _casm_ “%r = getenv((char *) %0);” name >>= \ litstring -> return ( if (litstring == nullAddr) then Left ("Fail:oldGetEnv:"++name) else Right (unpackCString litstring) )

The first literal-literal argument to a _casm_ is like a printf format: %r is replaced with the “result,” %0–%n-1 are replaced with the 1st–nth arguments. As you can see above, it is an easy way to do simple C casting. Everything said about _ccall_ goes for _casm_ as well.

The use of _casm_ in your code does pose a problem to the compiler when it comes to generating an interface file for a freshly compiled module. Included in an interface file is the unfolding (if any) of a declaration. However, if a declaration's unfolding happens to contain a _casm_, its unfolding will not be emitted into the interface file even if it qualifies by all the other criteria. The reason why the compiler prevents this from happening is that unfolding _casm_s into an interface file unduly constrains how code that import your module have to be compiled. If an imported declaration is unfolded and it contains a _casm_, you now have to be using a compiler backend capable of dealing with it (i.e., the C compiler backend). If you are using the C compiler backend, the unfolded _casm_ may still cause you problems since the C code snippet it contains may mention CPP symbols that were in scope when compiling the original module are not when compiling the importing module.

If you're willing to put up with the drawbacks of doing cross-module inlining of C code (GHC - A Better C Compiler :-), the option -funfold-casms-in-hi-file will turn off the default behaviour.

Literal-literals

The literal-literal argument to _casm_ can be made use of separately from the _casm_ construct itself. Indeed, we've already used it:

fooH :: Char -> Int -> Double -> Word -> IO Double fooH c i d w = _ccall_ fooC (“stdin”::Addr) c i d w

The first argument that's passed to fooC is given as a literal-literal, that is, a literal chunk of C code that will be inserted into the generated .hc code at the right place.

A literal-literal is restricted to having a type that's an instance of the CCallable class, see the section called C-calling “gotchas” checklist for more information.

Notice that literal-literals are by their very nature unfriendly to native code generators, so exercise judgement about whether or not to make use of them in your code.

Using function headers

When generating C (using the -fvia-C directive), one can assist the C compiler in detecting type errors by using the -#include directive to provide .h files containing function headers.

For example,

typedef unsigned long *StgForeignObj; typedef long StgInt; void initialiseEFS (StgInt size); StgInt terminateEFS (void); StgForeignObj emptyEFS(void); StgForeignObj updateEFS (StgForeignObj a, StgInt i, StgInt x); StgInt lookupEFS (StgForeignObj a, StgInt i);

You can find appropriate definitions for StgInt, StgForeignObj, etc using gcc on your architecture by consulting ghc/includes/StgTypes.h. The following table summarises the relationship between Haskell types and C types.

C type name	Haskell Type
`StgChar`	`Char#`
`StgInt`	`Int#`
`StgWord`	`Word#`
`StgAddr`	`Addr#`
`StgFloat`	`Float#`
`StgDouble`	`Double#`
`StgArray`	`Array#`
`StgByteArray`	`ByteArray#`
`StgArray`	`MutableArray#`
`StgByteArray`	`MutableByteArray#`
`StgStablePtr`	`StablePtr#`
`StgForeignObj`	`ForeignObj#`

Note that this approach is only essential for returning floats (or if sizeof(int) != sizeof(int *) on your architecture) but is a Good Thing for anyone who cares about writing solid code. You're crazy not to do it.

Subverting automatic unboxing with “stable pointers”

The arguments of a _ccall_ automatically unboxed before the call. There are two reasons why this is usually the Right Thing to do:

C is a strict language: it would be excessively tedious to pass unevaluated arguments and require the C programmer to force their evaluation before using them.
Boxed values are stored on the Haskell heap and may be moved within the heap if a garbage collection occurs—that is, pointers to boxed objects are not stable.

It is possible to subvert the unboxing process by creating a “stable pointer” to a value and passing the stable pointer instead. For example, to pass/return an integer lazily to C functions storeC and fetchC might write:

storeH :: Int -> IO () storeH x = makeStablePtr x >>= \ stable_x -> _ccall_ storeC stable_x fetchH :: IO Int fetchH x = _ccall_ fetchC >>= \ stable_x -> deRefStablePtr stable_x >>= \ x -> freeStablePtr stable_x >> return x

The garbage collector will refrain from throwing a stable pointer away until you explicitly call one of the following from C or Haskell.

void freeStablePointer( StgStablePtr stablePtrToToss ) freeStablePtr :: StablePtr a -> IO ()

As with the use of free in C programs, GREAT CARE SHOULD BE EXERCISED to ensure these functions are called at the right time: too early and you get dangling references (and, if you're lucky, an error message from the runtime system); too late and you get space leaks.

And to force evaluation of the argument within fooC, one would call one of the following C functions (according to type of argument).

void performIO ( StgStablePtr stableIndex /* StablePtr s (IO ()) */ ); StgInt enterInt ( StgStablePtr stableIndex /* StablePtr s Int */ ); StgFloat enterFloat ( StgStablePtr stableIndex /* StablePtr s Float */ );

Nota Bene: _ccall_GC_ must be used if any of these functions are used.

Foreign objects: pointing outside the Haskell heap

There are two types that GHC programs can use to reference (heap-allocated) objects outside the Haskell world: Addr and ForeignObj.

If you use Addr, it is up to you to the programmer to arrange allocation and deallocation of the objects.

If you use ForeignObj, GHC's garbage collector will call upon the user-supplied finaliser function to free the object when the Haskell world no longer can access the object. (An object is associated with a finaliser function when the abstract Haskell type ForeignObj is created). The finaliser function is expressed in C, and is passed as argument the object:

void foreignFinaliser ( StgForeignObj fo )

when the Haskell world can no longer access the object. Since ForeignObjs only get released when a garbage collection occurs, we provide ways of triggering a garbage collection from within C and from within Haskell.

void GarbageCollect() performGC :: IO ()

More information on the programmers' interface to ForeignObj can be found in the library documentation.

Avoiding monads

The _ccall_ construct is part of the IO monad because 9 out of 10 uses will be to call imperative functions with side effects such as printf. Use of the monad ensures that these operations happen in a predictable order in spite of laziness and compiler optimisations.

To avoid having to be in the monad to call a C function, it is possible to use unsafePerformIO, which is available from the IOExts module. There are three situations where one might like to call a C function from outside the IO world:

Calling a function with no side-effects:
atan2d :: Double -> Double -> Double atan2d y x = unsafePerformIO (_ccall_ atan2d y x) sincosd :: Double -> (Double, Double) sincosd x = unsafePerformIO $ do da <- newDoubleArray (0, 1) _casm_ “sincosd( %0, &((double *)%1[0]), &((double *)%1[1]) );” x da s <- readDoubleArray da 0 c <- readDoubleArray da 1 return (s, c)

Calling a set of functions which have side-effects but which can be used in a purely functional manner. For example, an imperative implementation of a purely functional lookup-table might be accessed using the following functions.
empty :: EFS x update :: EFS x -> Int -> x -> EFS x lookup :: EFS a -> Int -> a empty = unsafePerformIO (_ccall_ emptyEFS) update a i x = unsafePerformIO $ makeStablePtr x >>= \ stable_x -> _ccall_ updateEFS a i stable_x lookup a i = unsafePerformIO $ _ccall_ lookupEFS a i >>= \ stable_x -> deRefStablePtr stable_x
You will almost always want to use ForeignObjs with this.

Calling a side-effecting function even though the results will be unpredictable. For example the trace function is defined by:
trace :: String -> a -> a trace string expr = unsafePerformIO ( ((_ccall_ PreTraceHook sTDERR{-msg-}):: IO ()) >> fputs sTDERR string >> ((_ccall_ PostTraceHook sTDERR{-msg-}):: IO ()) >> return expr ) where sTDERR = (“stderr” :: Addr)
(This kind of use is not highly recommended—it is only really useful in debugging code.)

C-calling “gotchas” checklist

And some advice, too.

For modules that use _ccall_s, etc., compile with -fvia-C. You don't have to, but you should. Also, use the -#include "prototypes.h" flag (hack) to inform the C compiler of the fully-prototyped types of all the C functions you call. (the section called Using function headers says more about this…) This scheme is the only way that you will get any typechecking of your _ccall_s. (It shouldn't be that way, but…). GHC will pass the flag -Wimplicit to gcc so that you'll get warnings if any _ccall_ed functions have no prototypes.
Try to avoid _ccall_s to C functions that take float arguments or return float results. Reason: if you do, you will become entangled in (ANSI?) C's rules for when arguments/results are promoted to doubles. It's a nightmare and just not worth it. Use doubles if possible. If you do use floats, check and re-check that the right thing is happening. Perhaps compile with -keep-hc-file-too and look at the intermediate C (.hc).

The compiler uses two non-standard type-classes when type-checking the arguments and results of _ccall_: the arguments (respectively result) of _ccall_ must be instances of the class CCallable (respectively CReturnable). Both classes may be imported from the module CCall, but this should only be necessary if you want to define a new instance. (Neither class defines any methods—their only function is to keep the type-checker happy.) The type checker must be able to figure out just which of the C-callable/returnable types is being used. If it can't, you have to add type signatures. For example,
f x = _ccall_ foo x
is not good enough, because the compiler can't work out what type x is, nor what type the _ccall_ returns. You have to write, say:
f :: Int -> IO Double f x = _ccall_ foo x
This table summarises the standard instances of these classes.

Type	CCallable	CReturnable	Which is probably…
`Char`	Yes	Yes	`unsigned char`
`Int`	Yes	Yes	`long int`
`Word`	Yes	Yes	`unsigned long int`
`Addr`	Yes	Yes	`void *`
`Float`	Yes	Yes	`float`
`Double`	Yes	Yes	`double`
`()`	No	Yes	`void`
`[Char]`	Yes	No	`char *` (null-terminated)
`Array`	Yes	No	`unsigned long *`
`ByteArray`	Yes	No	`unsigned long *`
`MutableArray`	Yes	No	`unsigned long *`
`MutableByteArray`	Yes	No	`unsigned long *`
`State`	Yes	Yes	nothing!
`StablePtr`	Yes	Yes	`unsigned long *`
`ForeignObjs`	Yes	Yes	see later

Actually, the Word type is defined as being the same size as a pointer on the target architecture, which is probably unsigned long int. The brave and careful programmer can add their own instances of these classes for the following types:

A boxed-primitive type may be made an instance of both CCallable and CReturnable. A boxed primitive type is any data type with a single unary constructor with a single primitive argument. For example, the following are all boxed primitive types:
Int Double data XDisplay = XDisplay Addr# data EFS a = EFS# ForeignObj#
instance CCallable (EFS a) instance CReturnable (EFS a)
Any datatype with a single nullary constructor may be made an instance of CReturnable. For example:
data MyVoid = MyVoid instance CReturnable MyVoid
As at version 2.09, String (i.e., [Char]) is still not a CReturnable type. Also, the now-builtin type PackedString is neither CCallable nor CReturnable. (But there are functions in the PackedString interface to let you get at the necessary bits…)

The code-generator will complain if you attempt to use %r in a _casm_ whose result type is IO (); or if you don't use %r precisely once for any other result type. These messages are supposed to be helpful and catch bugs—please tell us if they wreck your life.
If you call out to C code which may trigger the Haskell garbage collector or create new threads (examples of this later…), then you must use the _ccall_GC_ or _casm_GC_ variant of C-calls. (This does not work with the native code generator—use -fvia-C.) This stuff is hairy with a capital H!

Prev	Home	Next
Primitive arrays, mutable and otherwise	Up	Multi-parameter type classes