Leaky Interpreters

Posted on October 23, 2014

In Definitional interpreters for higher-order programming languages (here), John Reynolds considers the definition of a simple applicative programming language by means of an interpreter written in a similar language.

An important and frequently used method of defining a programming language is to give an interpreter for the language that is written in a second, hopefully better understood language. We will call these two languages the defined and defining languages, respectively. Reynolds, 1998.

Reynolds demonstrates how, without diligence, the semantics of the interpreter's implementation language may leak into the interpretation. Leaky interpreters are not equivalent to the defining semantics, and may exhibit incorrect side effects or return wrong values. Take Haskell, a language with non-strict semantics. The GHC compiler uses lazy evaluation as an implementation of these non-strict semantics. The GHCi Haskell interpreter honours these non-strict semantics in the same way.

A language called Drippy

Say hello to a simple functional programming language called Drippy Calculator, or Drippy for short. User defined functions are supported. Integer is the only type. The two constructs are + which adds two integers, and poison() which crashes a program when evaluated. Here's a simple Drippy program:

int add(int a, int b)
{ return(a + b); }

int main()
{ int x,y,z;
  x = 2;
  y = 3;
  z = add(x, y);
  return(z); }

An interpretation of this program prints 5. Drippy is a strictly evaluated language and variables are immutable. The expressions in Drippy are constants, variables and function calls. Every function is a sequence of statements and ends with a return(..). A few notes on Drippy evaluation (shared with Reynold's notes on his applicative language):

  1. Statements are evaluated from top to bottom. After each statement evaluation the environment is modified, which binds variables to values. The evaluation of a constant always gives the same value, regardless of its environment. The value from evaluating variables and function calls depend on their environment's context.

  2. An applicative expression takes the form r0(r1,..,rn). The expressions r0,r1,..,rn are evaluted in the same environment to obtain values f, a1,..,an.

  3. If f is not a function of n arguments, then an error stop occurs.

  4. Otherwise, the function f is applied to the arguments a1,..,an, and if this application produces a result, then the result is the value of the application expression.

Drippy front end

Let's implement a front-end for Drippy in with a BNF grammar.

entrypoints Program ;
separator Function "" ;
terminator Statement ";" ;
separator Expr "," ;
separator Argument "," ;
terminator VarDec ";" ;
separator nonempty Ident "," ;

Prog. Program ::= [Function] Main ;
Fn. Function  ::= Type Ident Params Body ;
FnBody. Body  ::= "{" [VarDec] [Statement] Return "}" ;
Mn. Main      ::= "int" "main" "(" ")" Body ;

ExprConstant. Expr ::= Integer ;
ExprId.       Expr ::= Ident ;
ExprFnCall.   Expr ::= FnCall ;

Prms. Params  ::= "(" [Argument] ")" ;
Arg. Argument ::= Type Ident ;

Var. VarDec ::= Type [Ident];
TypeInt. Type  ::= "int" ;

StmtAssign.   Statement ::= Assignment ;
StmtLangCall. Statement ::= LanguagePoisonFn ;

AssignId. Assignment ::= Ident "=" Expr ;
FnCallNArity.     FnCall ::= Ident "(" [Expr] ")" ;
LangFnCall.       FnCall ::= LanguageFn ;
LangPoisonFnCall. FnCall ::= LanguagePoisonFn ;

LangAdd.    LanguageFn       ::= Expr "+" Expr ;
LangPoison. LanguagePoisonFn ::= "poison" "(" ")" ;
Rtn. Return ::= "return" "(" Expr ")" ";" ;

Let's generate a lexer and parser from this, using the BNFC software.

$ bnfc -m --haskell Drippy.cf
$ make

A leaky interpreter for Drippy

Given Drippy's strict semantics, the following program should crash once the third argument in the add(x, y, poison()) is evaluated in the asssignment to z.

int add(int a, int b, int c)
{ return(a + b); }

int main()
{ int x,y,z;
  x = 2;
  y = 3;
  z = add(x, y, poison());
  return(z); }

Here's a Haskell implementation of DrippyI, a Drippy interpreter.

module Main where

import AbsDrippy
import ParDrippy
import ErrM
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe
import System.Environment (getArgs)

main :: IO ()
main = do
  result <- evalDrippy =<< readFile . head =<< getArgs
  print result

data Context = Context
  { localVars :: Map Ident (Maybe Integer)
  , funs :: Map Ident Function }

evalDrippy :: String -> IO Integer
evalDrippy s = do
    let Ok prog = pProgram (myLexer s)
    interpret prog

initVars :: Map Ident (Maybe Integer) -> VarDec -> Map Ident (Maybe Integer)
initVars lclVars (Var _varType idents) =
    foldl (\cxt ident -> Map.insert ident Nothing cxt) lclVars idents

interpret :: Program -> IO Integer
interpret (Prog fns mainF) = evalMain mainF fns

evalStmt :: Context -> Statement -> Context
evalStmt cxt (StmtAssign (AssignId ident expr)) =
    cxt { localVars = Map.insert ident (Just (evalExpr cxt expr)) (localVars cxt) }
evalStmt cxt (StmtLangCall LangPoison) = error "crash: poisonous statement."

evalLangCall :: Context -> LanguageFn -> Integer
evalLangCall cxt (LangAdd e1 e2) = evalExpr cxt e1 + evalExpr cxt e2

evalFnCall :: Context -> FnCall -> Integer
evalFnCall cxt (FnCallNArity ident exprs) = evalUserFn cxt ident exprs
evalFnCall cxt (LangFnCall langFun) = evalLangCall cxt langFun
evalFnCall cxt (LangPoisonFnCall LangPoison) = error "crash: poisonous function argument."

evalExpr :: Context -> Expr -> Integer
evalExpr _cxt (ExprConstant constant) = constant
evalExpr cxt (ExprId ident) = fromJust (fromJust (Map.lookup ident (localVars cxt)))
evalExpr cxt (ExprFnCall fn) = evalFnCall cxt fn

evalUserFn :: Context -> Ident -> [Expr] -> Integer
evalUserFn cxt ident args =
    let Fn _ _ (Prms params) (FnBody vrs stmts (Rtn returnStmt))
            = fromJust $ Map.lookup ident (funs cxt)
        xs = zipWith (\(Arg _type paramId) arg -> (paramId,Just (evalExpr cxt arg))) params args
        paramVals = Map.union (Map.fromList xs) (localVars cxt)
        lclVars = foldl initVars paramVals vrs
        localCxt = foldl evalStmt (cxt { localVars = lclVars }) stmts
    in evalExpr localCxt returnStmt

evalMain :: Main -> [Function] -> IO Integer
evalMain (Mn (FnBody vrs stmts (Rtn returnStmt))) fns = do
  let vars = foldl initVars Map.empty vrs
      lclFuns = Map.fromList $ map (\f@(Fn _ ident _ _) -> (ident,f)) fns
      cxt = Context vars lclFuns
      cxt' = foldl evalStmt cxt stmts
  return (evalExpr cxt' returnStmt)

Leaking into Drippy

Let's interpret our poisonous program to make sure it does indeed crash...

$ ./DrippyI my_application.drippy 
5

Oh dear, the non-strict semantics of Haskell have leaked into DrippyI. The poisonous() argument in the add(..) call in the z assignment has not been evaluated. The leak is in evalUserFn, and the poisonous expression is the last element in the args argument. Although evalExpr is used in zipWith in the definition of xs, it is never applied. The function body of add provides the clue -- the third argument is never needed to evaluate the expression in the return(a+b) call.

int add(int a, int b, int c)
{ return(a + b); }

Haskell's call-by-need semantics have leaked in, the environment has not bound the c variable to any (normal form) value.

Fixing the leak

To recover the strict call-by-value semantics of Drippy, all function arguments are evaluated to normal form (requirement (2)) with deepseq. This is achieved in the definition of paramVals in a revised evalUserFn.

evalUserFn :: Context -> Ident -> [Expr] -> Integer
evalUserFn cxt ident args =
    let Fn _ _ (Prms params) (FnBody vrs stmts (Rtn returnStmt))
            = fromJust $ Map.lookup ident (funs cxt)
        xs = zipWith (\(Arg _type paramId) arg -> (paramId,Just (evalExpr cxt arg))) params args
        paramVals = map snd xs `deepseq` Map.union (Map.fromList xs) (localVars cxt) -- the fix
        lclVars = foldl initVars paramVals vrs
        localCxt = foldl evalStmt (cxt { localVars = lclVars }) stmts
    in if length params /= length args
       then error $ show ident ++ " expected " ++ show (length params) ++ " args"
                    ++ " but " ++ show (length args) ++ " passed."
       else evalExpr localCxt returnStmt

This also implements requirement (3) of Drippy's evaluation semantics, which says that the number of function arguments must be the same as the number of expected arguments as per the user's function definition. If they are not same, an error must be thrown. This is checked in length params =/ length args if condition.

Let's try evaluating our poisonous program again:

$ ./DrippyI my_application.drippy 
Drippy: crash: poisonous function argument.

Great, the laziness leaked is sealed. Now let's test that requirement (3) is satisfied by instead trying to assign a value to z with add(x, y, poison(), 7)...

$ ./DrippyI my_application.drippy
Drippy: Ident "add" expected 3 args but 4 passed.

Hurrah, DrippyI leaks less!