In addition to the list below, you may also want to browse Heriot-Watt's list of my research outputs, the technical report archive of the Boston University Computer Science Department or the papers web page of the Church Project.
NOTE: If you want to compare this listing with databases on the web such as CiteSeer or DBLP, please be aware that my name as a paper author sometimes appears as “Joe B. Wells” or “Joe Wells” on the actual paper. Despite my best efforts to make my name as an author uniformly appear only as “J. B. Wells”, other people (coauthors or editors who mistakenly think they are being helpful) have sometimes changed it. Also, in addition to making the errors already mentioned, people writing bibliographies sometimes omit the initial of my 2nd given name, so I am sometimes listed as just “J. Wells”.
Set theory has long been viewed as a foundation of mathematics, is pervasive in mathematical culture, and is explicitly used by much written mathematics. Because arrangements of sets can represent a vast multitude of mathematical objects, in most set theories every object is a set. This causes confusion and adds difficulties to formalising mathematics in set theory. We wish to have set theory's features while also having many mathematical objects not be sets. A generalized set theory (GST) is a theory that has pure sets and may also have non-sets that can have internal structure and impure sets that mix sets and non-sets. This paper provides a GST-building framework. We show example GSTs that have sets and also (1) non-set ordered pairs, (2) non-set natural numbers, (3) a non-set exception object that cannot be inside another object, and (4) modular combinations of these features. We show how to axiomatize GSTs and how to build models for GSTs in other GSTs.
Much mathematical writing exists that is, explicitly or implicitly, based on set theory, often Zermelo-Fraenkel set theory (ZF) or one of its variants. In ZF, the domain of discourse contains only sets, and hence every mathematical object must be a set. Consequently, in ZF with the usual encoding of an ordered pair ⟨a,b⟩, formulas like {a}∈⟨a,b⟩ have truth values, and operations like P(⟨a,b⟩) have results that are sets. Such “accidental theorems” do not match how people think about the mathematics and also cause practical difficulties when using set theory in machine-assisted theorem proving. In contrast, in a number of proof assistants, mathematical objects and concepts can be built of type-theoretic stuff so that many mathematical objects can be, in essence, terms of an extended typed λ-calculus. However, dilemmas and frustration arise when formalizing mathematics in type theory.
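For instance, with the standard Kuratowski encoding of ordered pairs in ZF, the “accidental theorems” mentioned above arise directly from the representation:

```latex
% Kuratowski encoding of an ordered pair:
\langle a, b\rangle = \{\{a\}, \{a, b\}\}
% hence, accidentally:
\{a\} \in \langle a, b\rangle
\qquad\text{and}\qquad
\mathcal{P}(\langle a, b\rangle)\ \text{is a set}
```

Neither of these facts reflects anything about ordered pairs as mathematicians usually conceive of them; they are artifacts of the chosen set representation.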
Motivated by problems of formalizing mathematics with (1) purely set-theoretic and (2) type-theoretic approaches, we explore an option with much of the flexibility of set theory and some of the useful features of type theory. We present ZFP: a modification of ZF that has ordered pairs as primitive, non-set objects. ZFP has a more natural and abstract axiomatic definition of ordered pairs free of any notion of representation. This paper presents axioms for ZFP, and a proof in ZF (machine-checked in Isabelle/ZF) of the existence of a model for ZFP, which implies that ZFP is consistent if ZF is. We discuss the approach used to add this abstraction barrier to ZF.
The famous BNF grammar notation, as introduced and used in the Algol 60 report, was subsequently followed by numerous notational variants (EBNF, ABNF, RBNF, etc.), and later by a new formal “grammars” metalanguage used for discussing structured objects in Computer Science and Mathematical Logic. We refer to this latter offspring of BNF as MBNF (Math-BNF), and to aspects common to MBNF, BNF, and its notational variants as BNF-style. MBNF is sometimes called “abstract syntax”, but we avoid that name because MBNF is in fact a concrete form and there is a more abstract form. What all BNF-style notations share is the use of production rules like (P) below which state that “every instance of ◦ᵢ for i ∈ {1, …, n} is also an instance of •”:
• ::= ◦₁ | ⋯ | ◦ₙ        (P)
However, MBNF is distinct from all variants of BNF in the entities and operations it allows. Instead of strings, MBNF builds arrangements of symbols that we call math-text and allows “syntax” to be built by interleaving MBNF production rules and other mathematical definitions that can contain chunks of math-text. The differences between BNF (or its variant forms) and MBNF have not been clearly presented before. (There is also no clear definition of MBNF anywhere but this is beyond the scope of this paper.)
This paper reviews BNF and some of its variant forms as well as MBNF, highlighting the differences between BNF (including its variant forms) and MBNF. We show via a series of detailed examples that MBNF, while superficially similar to BNF, differs substantially from BNF and its variants in how it is written, the operations it allows, and the sets of entities it defines. We also argue that the entities MBNF handles may extend far outside the scope of rewriting relations on strings and syntax trees derived from such rewriting sequences, which are often used to express the meaning of BNF and its notational variants.
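As a concrete illustration (ours, not drawn from the paper), a typical MBNF definition of untyped λ-terms looks superficially like BNF but defines tree-like mathematical objects rather than strings, and may be interleaved with ordinary mathematical prose and side conditions:

```latex
% An MBNF-style definition of untyped λ-terms (illustrative):
M, N \;::=\; x \;\mid\; \lambda x.\,M \;\mid\; M\,N
% Instances are λ-terms considered up to renaming of bound
% variables: mathematical objects, not character strings.
```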
It is becoming increasingly important to verify safety and security of AI applications. While declarative languages (of the kind found in automated planners and model checkers) are traditionally used for verifying AI systems, a big challenge is to design methods that generate verified executable programs. A good example of such a “verification to implementation” cycle is given by automated planning languages like PDDL, where plans are found via a model search in a declarative language, but then interpreted or compiled into executable code in an imperative language. In this paper, we show that this method can itself be verified. We present a formal framework and a prototype Agda implementation that represent PDDL plans as executable functions that inhabit types that are given by formulae describing planning problems. By exploiting the well-known Curry-Howard correspondence, type-checking then automatically ensures that the generated program corresponds precisely to the specification of the planning problem.
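The “verification to implementation” cycle can be sketched, very roughly and outside Agda, as a plan validator that replays a plan against a declarative specification; the action names, preconditions, and effects below are invented for illustration and are not from the paper:

```python
# Illustrative sketch (not the paper's Agda development): a plan is
# checked against a STRIPS-like declarative specification by replaying
# it, verifying each action's preconditions, and applying its effects.

def validate(plan, actions, state, goal):
    """Replay the plan: each action's preconditions must hold before
    it fires; its effects update the state; the goal must hold at the end."""
    state = set(state)
    for name in plan:
        pre, add, delete = actions[name]
        if not pre <= state:
            return False              # precondition violated
        state = (state - delete) | add
    return goal <= state

# A toy blocks-world-like problem (hypothetical names).
ACTIONS = {
    "pick": ({"onTable", "handEmpty"}, {"holding"}, {"onTable", "handEmpty"}),
    "drop": ({"holding"}, {"onTable", "handEmpty"}, {"holding"}),
}
```

In the paper's setting, this check is not a runtime function but is carried out by the type checker itself: via Curry-Howard, a well-typed plan term is itself the evidence that the plan meets the specification.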
Compilers for languages with type inference algorithms often produce confusing type error messages and give a single error location which is sometimes far away from the real location of the error. Attempts at solving this problem often (1) fail to include the multiple program points which make up the type error; (2) report tree fragments which do not correspond to any place in the user program; and (3) give incorrect type information/diagnosis which can be highly confusing. We present Skalpel, a type error slicing tool which solves these problems by giving the programmer all and only the information involved with a type error to significantly aid in diagnosis and repair of type errors. Skalpel relies on a simple and general constraint system, a sophisticated constraint generator which is linear in program size, and a constraint solver which is terminating. Skalpel’s constraint system can elegantly and efficiently handle intricate features such as SML’s open. We also show that the Skalpel tool is general enough to deal not only with a single source code file and a single error: it highlights all and only the possible locations of the error(s) across all affected files and produces all the culprit program slices.
There are two versions of type assignment in the λ-calculus: Church-style, in which the type of each variable is fixed, and Curry-style (also called “domain free”), in which it is not. As an example, in Church-style typing, λx:A.x is the identity function on type A, and it has type A → A but not B → B for a type B different from A. In Curry-style typing, λx.x is a general identity function with type C → C for every type C. In this paper, we will show how to interpret in a Curry-style system every Pure Type System (PTS) in the Church style without losing any typing information. We will also prove a kind of conservative extension result for this interpretation, a result which implies that for most consistent PTSs of the Church style, the corresponding Curry-style system is consistent. We will then show how to interpret in a system of the Church style (a modified PTS, stronger than a PTS) every PTS-like system in the Curry style.
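The contrast can be summarised in two judgements, simply restating the example in the abstract:

```latex
% Church style: the bound variable carries a fixed type.
\lambda x{:}A.\,x \;:\; A \to A
\qquad (\text{but not } B \to B \text{ for } B \neq A)
% Curry style: no annotation; the same term has every instance type.
\lambda x.\,x \;:\; C \to C \quad \text{for every type } C
```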
The introduction of a general definition of function was key to Frege’s formalisation of logic. Self-application of functions was at the heart of Russell’s paradox. Russell introduced type theory to control the application of functions and avoid the paradox. Since then, different type systems have been introduced, each allowing different functional power. Most of these systems use the two binders λ and Π to distinguish between functions and types, and allow β-reduction but not Π-reduction. That is, (πx:A.B)C → B[x:=C] is only allowed when π is λ but not when it is Π. Since Π-reduction is not allowed, these systems cannot allow unreduced typing. Hence types do not have the same instantiation right as functions. In particular, when b has type B, the type of (λx:A.b)C is immediately given as B[x:=C] instead of keeping it as (Πx:A.B)C to be Π-reduced to B[x:=C] later. Extensions of modern type systems with both β- and Π-reduction and unreduced typing have appeared in our previous work and lead naturally to unifying the λ and Π abstractions. The Automath system combined the unification of the binders λ and Π with β- and Π-reduction together with a type inclusion rule that allows the different expressions that define the same term to share the same type. In this paper we extend the cube of 8 influential type systems with the Automath notion of type inclusion and study its properties.
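The two reduction rules discussed above, and the unreduced typing they enable, can be written out as follows (a restatement of the rules in the abstract, not new material):

```latex
% β-reduction, allowed in standard systems:
(\lambda x{:}A.\,b)\,C \;\to_\beta\; b[x := C]
% Π-reduction, usually disallowed, added in the systems studied here:
(\Pi x{:}A.\,B)\,C \;\to_\Pi\; B[x := C]
% Unreduced typing: if  b : B,  then  (λx:A.b)C  may keep the type
% (Πx:A.B)C, which Π-reduces to B[x := C] only when needed.
```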
Mathematical texts can be computerised in many ways that capture differing amounts of the mathematical meaning. At one end, there is document imaging, which captures the arrangement of black marks on paper, while at the other end there are proof assistants (e.g., Mizar, Isabelle, Coq, etc.), which capture the full mathematical meaning and have proofs expressed in a formal foundation of mathematics. In between, there are computer typesetting systems (e.g., LaTeX and Presentation MathML) and semantically oriented systems (e.g., Content MathML, OpenMath, OMDoc, etc.). In this paper we advocate a style of computerisation of mathematical texts which is flexible enough to connect the different approaches to computerisation, which allows various degrees of formalisation, and which is compatible with different logical frameworks (e.g., set theory, category theory, type theory, etc.) and proof systems. The basic idea is to allow a man-machine collaboration which weaves human input with machine computation at every step of the way. We propose that the huge step from informal mathematics to fully formalised mathematics be divided into smaller steps, each of which is a fully developed method in which human input is minimal.
Expansion is an operation on typings (pairs of type environments and result types) in type systems for the λ-calculus. Expansion was originally introduced for calculating possible typings of a λ-term in systems with intersection types.
This paper aims to clarify expansion and make it more accessible to a wider community by isolating the pure notion of expansion on its own, independent of type systems and types, and thereby make it easier for non-specialists to obtain type inference with flexible precision by making use of theory and techniques that were originally developed for intersection types.
We redefine expansion as a simple algebra on terms with variables, substitutions, composition, and miscellaneous constructors such that the algebra satisfies 8 simple axioms and axiom schemas: the 3 standard axioms of a monoid, 4 standard axioms or axiom schemas of substitutions (including one that corresponds to the usual “substitution lemma” that composes substitutions), and 1 very simple axiom schema for expansion itself. Many of the results on the algebra have also been formally checked with the Coq proof assistant.
We then take System E, a λ-calculus type system with intersection types and expansion variables, and redefine it using the expansion algebra, thereby demonstrating how a single uniform notion of expansion can operate on both typings and proofs. Because we present a simplified version of System E omitting many features, this may be independently useful for those seeking an easier-to-comprehend version.
Expansion is a crucial operation for calculating principal typings in intersection type systems. Because the early definitions of expansion were complicated, E-variables were introduced in order to make the calculations easier to mechanise and reason about. Recently, E-variables have been further simplified and generalised to also allow calculating with type operators other than just intersection. There has been much work on semantics for type systems with intersection types, but none whatsoever, before our work, on type systems with E-variables. In this paper we expose the challenges of building a semantics for E-variables and we provide a novel solution. Because it is unclear how to devise a space of meanings for E-variables, we develop instead a space of meanings for types that is hierarchical. First, we index each type with a natural number and show that although this intuitively captures the use of E-variables, it is difficult to index the universal type ω with this hierarchy and it is not possible to obtain completeness of the semantics if more than one E-variable is used. We then move to a more complex semantics where each type is associated with a list of natural numbers and establish that both ω and an arbitrary number of E-variables can be represented without losing any of the desirable properties of a realisability semantics.
Reducibility, despite being quite mysterious and inflexible, has been used to prove a number of properties of the λ-calculus and is well known to offer general proofs which can be applied to a number of instantiations. In this paper, we look at two related but different results in λ-calculi with intersection types. 1. We show that one such result (which aims at giving reducibility proofs of Church-Rosser, standardisation and weak normalisation for the untyped λ-calculus) faces serious problems which break the reducibility method. We provide a proposal to partially repair the method. 2. We consider a second result whose purpose is to use reducibility for typed terms in order to show the Church-Rosser property of β-developments for untyped terms (and hence the Church-Rosser property of β-reduction). In this second result, strong normalisation is not needed. We extend the second result to encompass both βI- and βη-reduction rather than simply β-reduction.
Expansion is an operation on typings (i.e., pairs of typing environments and result types) defined originally in type systems for the λ-calculus with intersection types in order to obtain principal (i.e., most informative, strongest) typings. In a type inference scenario, expansion allows postponing choices for whether and how to use non-syntax-driven typing rules (e.g., intersection introduction) until enough information has been gathered to make the right decision. Furthermore, these choices can be equivalent to inserting uses of such typing rules at deeply nested positions in a typing derivation, without needing to actually inspect or modify (or even have) the typing derivation. Expansion has in recent years become simpler due to the use of expansion variables (e.g., in System E).
This paper extends expansion and expansion variables to systems with ∀-quantifiers. We present System Fs, an extension of System F with expansion, and prove its main properties. This system turns type inference into a constraint solving problem; this could help in designing a modular type inference algorithm for System F types in the future.
Shape types are a general concept of process types which work for many process calculi. We extend the previously published Poly* system of shape types to support name restriction. We evaluate the expressiveness of the extended system by showing that shape types are more expressive than an implicitly typed π-calculus and an explicitly typed Mobile Ambients. We demonstrate that the extended system makes it easier to enjoy the advantages of shape types, which include polymorphism, principal typings, and a type inference implementation.
This paper formalizes and proves correct a compilation scheme for mutually-recursive definitions in call-by-value functional languages. This scheme supports a wider range of recursive definitions than previous methods. We formalize our technique as a translation scheme to a lambda-calculus featuring in-place update of memory blocks, and prove the translation to be correct.
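The classic in-place-update approach to compiling mutual recursion can be sketched as follows; this is an illustrative Python analogue of the general technique, not the paper's formal translation: dummy memory blocks are allocated first, and each block is patched once its right-hand side can be evaluated.

```python
# Sketch of "backpatching" compilation of letrec via in-place update
# (illustrative; the paper's target is a lambda-calculus with mutable
# memory blocks, modelled here by a tiny mutable Block class).

class Block:
    """A mutable memory block standing for a not-yet-built closure."""
    def __init__(self):
        self.fn = None
    def __call__(self, *args):
        return self.fn(*args)        # valid only after patching

def letrec(*makers):
    """Each maker receives all the blocks and returns the function that
    should fill its own block; in-place update ties the recursive knot."""
    blocks = [Block() for _ in makers]
    for block, make in zip(blocks, makers):
        block.fn = make(*blocks)     # in-place update of the dummy block
    return blocks

# Mutually recursive even/odd, defined without any built-in recursion.
even, odd = letrec(
    lambda even, odd: (lambda n: True if n == 0 else odd(n - 1)),
    lambda even, odd: (lambda n: False if n == 0 else even(n - 1)),
)
```

The point of the formal development is to characterise exactly which right-hand sides can safely be evaluated while some blocks are still dummies, which is what lets the scheme support more definitions than earlier methods.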
Meta* is a generic process calculus that can be instantiated by supplying rewriting rules defining an operational semantics to make numerous process calculi such as the π-calculus, the system of Mobile Ambients, and many of their variants. Poly* is a generic type system that makes a sound type system with principal types and a type inference algorithm for any instantiation of Meta*. Poly* provides a generic notion of shape types which describe behavior of processes by a direct description of allowed syntactic configurations.
This paper evaluates the expressiveness of generic process shape types by comparing Poly* with three quite dissimilar type/static analysis systems in the literature. The first comparison is with Turner's type system for the π-calculus without type annotations (which is essentially Milner's system of sorts). The second comparison is with an explicitly typed version of Mobile Ambients by Cardelli and Gordon. Finally, the third comparison is with a static analysis for BioAmbients developed by Nielson, Nielson, Priami, and Rosa. We instantiate Meta* to the process calculi in question and compare the expressiveness of Poly* shape types with the predicates provided by the three systems. We show that Poly* shape types can express essentially the same information as the previous type/static analysis systems and can also express much more precise information.
To do the comparisons, we needed to alter how Meta* handles α-conversion in order to develop a new method for handling name restriction in Poly*.
Meta* is a generic process calculus that can be instantiated by supplying rewriting rules defining an operational semantics to make numerous process calculi such as the π-calculus, the system of Mobile Ambients, and many of their variants. Poly* is a generic type system that makes a sound type system with principal types and a type inference algorithm for any instantiation of Meta*.
This paper evaluates the expressiveness of Poly* by comparing it with two quite dissimilar static analysis systems in the literature. One comparison is with a typed version of Mobile Ambients defined by Cardelli and Gordon; we name this system TMA. The other comparison is with a flow analysis for BioAmbients by Nielson, Nielson, Priami, and Rosa; we name this flow analysis SABA. We instantiate Meta* and Poly* to the two process calculi (TMA and BioAmbients) and compare the types provided by Poly* with the predicates provided by the previous static analysis systems (the typing judgments of TMA and the flow analysis results of SABA). We show that Poly* types can express essentially the same information as the previous static analysis systems and can also express more precise information.
To do the comparisons, we needed to alter how Meta* handles α-conversion in order to develop a new method for handling name restriction in Poly*.
SML (a higher-order function-oriented imperative programming language) depends on automated inference of sophisticated type information. Existing implementations have confusing type error messages. Type error slicing gives the programmer more helpful type error information: (1) It identifies all program points that contribute to a type error rather than blaming just one point. (2) It exhibits a slice of the original program's syntax instead of showing an internal representation of some program subtree which may have been substantially transformed. (3) It avoids showing internal details of inferred types which were not written by the programmer. A type error slice contains all information needed to understand the error, and avoids irrelevant information.
Similar to other previous work for better handling of type errors, type error slicing was initially developed for a very tiny toy language: the formal work was for a core barely larger than the lambda-calculus, and the implementation had only a few additional primitives. Extending type error slicing to a full programming language is extremely challenging, and to do this for SML we needed a number of innovations. Some issues would be faced for any language, and some are SML-specific but representative of the complexity of language-specific issues likely to be faced by other languages. As an example of a generic issue, the known ways of generating constraints that avoid an explosion in the number of constraints conflict with the need to exclude program parts that are irrelevant to errors, so we developed new ways of handling constraints. As an example of an SML-specific issue, the syntactic class of identifiers in SML depends on earlier declarations, but passing an environment as input makes it more difficult to ensure that error slices cover the right amount of program text, so we developed context sensitive constraints that avoid needing to know identifier syntactic class. Other issues include such things as how to indicate (ir)relevance of various program details, also handling context-sensitive syntax errors, and integration with a program editor. A bonus is that our methods in effect also provide a compositional (bottom up) type inference algorithm for the SML language, which may be useful for reasons other than type error explanation.
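As a rough illustration of the general idea of reporting all program points involved in a type error (a toy sketch, not Skalpel's constraint system or algorithm), one can label each equality constraint with the location that generated it and, on failure, minimise to a set of locations that jointly explain the error:

```python
# Toy illustration of location-labelled typing constraints. Terms are
# strings (type variables) or tuples (constructor, arg, ...); each
# constraint is (left, right, location).

def unify(constraints):
    """Robinson-style unification; returns a substitution dict,
    or None when the constraints are unsatisfiable."""
    subst = {}

    def walk(t):
        while isinstance(t, str) and t in subst:
            t = subst[t]
        return t

    def occurs(v, t):
        t = walk(t)
        if t == v:
            return True
        if isinstance(t, tuple):
            return any(occurs(v, x) for x in t[1:])
        return False

    todo = [(a, b) for (a, b, _) in constraints]
    while todo:
        a, b = todo.pop()
        a, b = walk(a), walk(b)
        if a == b:
            continue
        if isinstance(a, str):                       # type variable
            if occurs(a, b):
                return None
            subst[a] = b
        elif isinstance(b, str):
            todo.append((b, a))
        elif a[0] == b[0] and len(a) == len(b):      # same constructor
            todo.extend(zip(a[1:], b[1:]))
        else:
            return None                              # constructor clash
    return subst

def error_slice(constraints):
    """Deletion-based minimisation: keep a constraint only if the rest
    becomes satisfiable without it; report the surviving locations."""
    if unify(constraints) is not None:
        return []                                    # no type error
    core = list(constraints)
    for c in list(core):
        rest = [d for d in core if d is not c]
        if unify(rest) is None:                      # still fails: c not needed
            core = rest
    return sorted(loc for (_, _, loc) in core)
```

For example, constraining a variable `tf` to be both `int -> a` (from one call site) and `bool -> a` (from another) fails, and the slice reports both call sites rather than blaming one. Skalpel's real machinery additionally keeps constraint generation linear and handles SML's binding structure, which this sketch ignores.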
Expansion is an operation on typings (pairs of type environments and result types) in type systems for the λ-calculus. Expansion was originally introduced for calculating possible typings of a term in systems with intersection types. Unfortunately, definitions of expansion (even the most modern ones) have been difficult for outsiders to absorb. This paper aims to clarify expansion and make it more accessible to non-specialists by isolating the pure notion of expansion on its own, independent of type systems and types. We show how expansion can be seen as a simple algebra on terms with variables, substitutions, composition, and miscellaneous constructors such that the algebra satisfies 8 simple axioms and axiom schemas: the 3 standard axioms of a monoid, 4 standard axioms or axiom schemas of substitutions (including one that corresponds to the usual “substitution lemma”), and 1 axiom schema for expansion itself. This presentation should help make more accessible to a wider community theory and techniques involving intersection types and type inference with flexible precision.
Mathematical texts can be computerized in many ways that capture differing amounts of the mathematical meaning. At one end, there is document imaging, which captures the arrangement of black marks on paper, while at the other end there are proof assistants (e.g., Mizar, Isabelle, Coq, etc.), which capture the full mathematical meaning and have proofs expressed in a formal foundation of mathematics. In between, there are computer typesetting systems (e.g., LaTeX and Presentation MathML) and semantically oriented systems (e.g., Content MathML, OpenMath, OMDoc, etc.).
The MathLang project was initiated in 2000 by Fairouz Kamareddine and Joe Wells with the aim of developing an approach for computerizing mathematical texts and knowledge which is flexible enough to connect the different approaches to computerization, which allows various degrees of formalization, and which is compatible with different logical frameworks (e.g., set theory, category theory, type theory, etc.) and proof systems. The approach is embodied in a computer representation, which we call MathLang, and associated software tools, which are being developed by ongoing work. Three Ph.D. students (Manuel Maarek (2002/2007), Krzysztof Retel (since 2004), and Robert Lamar (since 2006)) and over a dozen master's degree and undergraduate students have worked on MathLang. The project's progress and design choices are driven by the needs for computerizing real representative mathematical texts chosen from various branches of mathematics.
Currently, MathLang supports entry of mathematical text either in an XML format or using the TeXmacs editor. Methods are provided for adding, checking, and displaying various information aspects. One aspect is a kind of weak type system that assigns categories (term, statement, noun (class), adjective (class modifier), etc.) to parts of the text, deals with binding names to meanings, and checks that a kind of grammatical sense is maintained. Another aspect allows weaving together mathematical meaning and visual presentation and can associate natural language text with its mathematical meaning. Another aspect allows identifying chunks of text, marking their roles (theorem, definition, explanation, example, section, etc.), and indicating relationships between the chunks (A uses B, A contradicts B, A follows from B, etc.). Software tool support can use this aspect to check and explain the overall logical structure of a text. Further aspects are being designed to allow adding additional formality to a text such as proof structure and details of how a human-readable proof is encoded into a fully formalized version (so far this has only been done for Mizar and started for Isabelle). A number of mathematical texts have been computerized, helping with the development of these aspects, and indicating what additional work is needed for the future. This paper surveys the past and future work of the MathLang project.
Methods for computerised mathematics have found little appeal among mathematicians because they call for additional skills which are not available to the typical mathematician. We herein propose to reconcile computerised mathematics with mathematicians by restoring natural language as the primary medium for mathematical authoring. Our method associates portions of text with grammatical argumentation roles and computerises the informal mathematical style of the mathematician. Typical abbreviations, like the aggregation of equations a=b>c, are not usually accepted as input to computerised languages. We propose specific annotations to explicate the morphology of such natural language style, to accept input in this style, and to expand this input in the computer to obtain the intended representation (i.e., a=b and b>c). We have named this method syntax souring, in contrast to the usual syntax sugaring. All results have been implemented in a prototype editor developed on top of TeXmacs as a GUI for the core grammatical aspect of MathLang, a framework developed by the ULTRA group to computerise and formalise mathematics.
We explain in this paper the gradual computerisation process of an ordinary mathematical text into more formal versions, ending with a fully formalised Mizar text. The process is part of the MathLang-Mizar project and is divided into a number of steps (called aspects). The first three aspects (CGa, TSa, and DRa) are the same for any MathLang-TP project, where TP is any proof checker (e.g., Mizar, Coq, Isabelle, etc.). These first three aspects are theoretically formalised and implemented and provide the mathematician and/or TP user with useful tools/automation. Using TSa, the mathematician edits his mathematical text just as he would with LaTeX, but at the same time he sees the mathematical text as it appears on his paper. TSa also gives the mathematician easy editing facilities to help assign grammatical and mathematical roles to parts of the text and to relate different parts through a number of mathematical, rhetorical, and structural relations. MathLang then automatically produces CGa and DRa versions of the text, checks their grammatical correctness, and produces a dependency graph between the parts of the text. At this stage, the work of the first three aspects is complete and the computerised versions of the text, as well as the dependency graph, are ready to be processed further. In the MathLang-Mizar project, we create from the dependency graph, the roles of the nodes of the graph, and the preamble of the CGa encoding, a Mizar Formal Proof Sketch (FPS) skeleton. The stage at which the text is transformed into a Mizar FPS skeleton has only been explained through transformation hints, and is yet to be theoretically developed into an aspect that can be implemented and turned into a partially-automated tool. Finally, the Mizar FPS skeleton of the text is transformed (currently by hand, as any Mizar expert would do, and without any computerised tools) into a correct Mizar FPS and then into a fully formalised Mizar version of the text.
Although we have tested our process on a number of examples, we chose to illustrate it in this paper using Barendregt's version of the proof of Pythagoras' theorem. We show how this example text is transformed into its fully formalised Mizar version by passing through the first three computerised aspects and the transformation hints to obtain the Mizar FPS version. This version is then developed by hand into a fully formalised Mizar version.
There are many styles for the narrative structure of a mathematical document. Each mathematician has his or her own conventions and traditions for labeling portions of texts (e.g., chapter, section, theorem, or proof) and identifying statements according to their logical importance (e.g., a theorem is more important than a lemma). Such narrative/structuring labels guide the reader's navigation of the text and form the key components in the reasoning structure of the theory reflected in the text.
We present in this paper a method to computerise the narrative structure of a text which includes the relationships between labeled text entities. These labels and relations are input by the user on top of their natural language text. This narrative structure is then automatically analysed to check its consistency. This automatic analysis consists of two phases: (1) checking the correct usage of labels and relations (i.e., that a “proof” justifies a “theorem” but cannot justify an “axiom”) and (2) checking that the logical precedences in the document are self-consistent.
The development of this method was driven by the experience of computerising a number of mathematical documents (covering different authoring styles). We illustrate how such computerised narrative structure could be used for further manipulations, e.g., to build a skeleton of a formal document in a formal system like Mizar, Coq, or Isabelle.
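The two checking phases can be sketched as follows; the role names and the small relation table are hypothetical, chosen only to illustrate the shape of the analysis:

```python
# Illustrative sketch of the two-phase consistency check on a
# narrative structure: entities carry roles, relations link them.

# Phase 1 rule table (hypothetical): which source role may stand in
# which relation to which target roles.
ALLOWED = {("proof", "justifies"): {"theorem", "lemma", "corollary"}}

def check_relations(relations, roles):
    """Phase 1: every (source, relation, target) triple must be
    permitted by the role table; e.g. a proof may justify a theorem
    but nothing may justify an axiom."""
    errors = []
    for src, rel, tgt in relations:
        if roles[tgt] not in ALLOWED.get((roles[src], rel), set()):
            errors.append((src, rel, tgt))
    return errors

def check_acyclic(relations):
    """Phase 2: logical precedence ("uses"/"follows-from") must not be
    circular; detect cycles with a depth-first search."""
    graph = {}
    for src, rel, tgt in relations:
        if rel in ("uses", "follows-from"):
            graph.setdefault(src, []).append(tgt)
    visiting, done = set(), set()

    def dfs(n):
        if n in done:
            return False
        if n in visiting:
            return True                      # back edge: cycle found
        visiting.add(n)
        cyclic = any(dfs(m) for m in graph.get(n, []))
        visiting.discard(n)
        done.add(n)
        return cyclic

    return any(dfs(n) for n in list(graph))
```

A document passing both phases yields a dependency structure that can then be mapped onto the skeleton of a formal development.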
In only a few decades, computers have changed the way we approach documents. Throughout history, mathematicians and philosophers have clarified the relationship between mathematical thoughts and their textual and symbolic representations. We discuss here the consequences of computer-based formalisation for mathematical authoring habits and we present an overview of our approach for computerising mathematical texts.
Computerizing mathematical texts to allow software access to some or all of the texts' semantic content is a long and tedious process that currently requires much expertise. We believe it is useful to support computerization that adds some structural and semantic information, but does not require jumping directly from the word-processing level (e.g., LaTeX) to full formalization (e.g., Mizar, Coq, etc.). Although some existing mathematical languages are aimed at this middle ground (e.g., MathML, OpenMath, OMDoc), we believe they miss features needed to capture some important aspects of mathematical texts, especially the portion written with natural language. For this reason, we have been developing MathLang, a language for representing mathematical texts that has weak type checking and support for the special mathematical use of natural language. MathLang is currently aimed at only capturing the essential grammatical and binding structure of mathematical text without requiring full formalization.
The development of MathLang is directly driven by experience encoding real mathematical texts. Based on this experience, this paper presents the changes that yield our latest version of MathLang. We have restructured and simplified the core of the language, replaced our old notion of “context” by a new system of blocks and local scoping, and made other changes. Furthermore, we have enhanced our support for the mathematical use of nouns and adjectives with object-oriented features so that nouns now correspond to classes, and adjectives to mixins.
We present a procedure to infer a typing for an arbitrary λ-term M in an intersection-type system that translates into exactly the call-by-name (resp., call-by-value) evaluation of M. Our framework is the recently developed System E which augments intersection types with expansion variables. The inferred typing for M is obtained by setting up a unification problem involving both type variables and expansion variables, which we solve with a confluent rewrite system. The inference procedure is compositional in the sense that typings for different program components can be inferred in any order, and without knowledge of the definition of other program components. Using expansion variables lets us achieve a compositional inference procedure easily. Termination of the procedure is generally undecidable. The procedure terminates and returns a typing iff the input M is normalizing according to call-by-name (resp., call-by-value). The inferred typing is exact in the sense that the exact call-by-name (resp., call-by-value) behaviour of M can be obtained by a (polynomial) transformation of the typing. The inferred typing is also principal in the sense that any other typing that translates the call-by-name (resp., call-by-value) evaluation of M can be obtained from the inferred typing for M using a substitution-based transformation.
The operation of expansion on typings was introduced at the end of the 1970s by Coppo, Dezani, and Venneri for reasoning about the possible typings of a term when using intersection types. Until recently, it has remained somewhat mysterious and unfamiliar, even though it is essential for carrying out compositional type inference. The fundamental idea of expansion is to be able to calculate the effect on the final judgement of a typing derivation of inserting a use of the intersection-introduction typing rule at some (possibly deeply nested) position, without actually needing to build the new derivation. Recently, we have improved on this by introducing expansion variables (E-variables), which make the calculation straightforward and understandable. E-variables make it easy to postpone choices of which typing rules to use until later constraint solving gives enough information to allow making a good choice. Expansion can also be done for type constructors other than intersection, such as the ! of Linear Logic, and E-variables make this easy. There are no significant new technical results in this paper; instead this paper surveys and explains the technical results of a quarter of a century of work on expansion.
Many different mobile process calculi have been invented, and for each, a number of type systems have been developed. Soundness and other properties must be proved separately for each calculus and type system. We present the generic polymorphic type system Poly✶, which works for a wide range of mobile process calculi, including the π-calculus and Mobile Ambients. For any calculus satisfying some general syntactic conditions, well-formedness rules for types are derived automatically from the reduction rules, and Poly✶ works otherwise unchanged. The derived type system is automatically sound (i.e., has subject reduction) and often more precise than previous type systems for the calculus, due to Poly✶'s spatial polymorphism. We present an implemented type inference algorithm for Poly✶ which automatically constructs a typing given a set of reduction rules and a term to be typed. The generated typings are principal with respect to certain natural type shape constraints.
A mixin module is a programming abstraction that simultaneously generalizes λ-abstractions, records, and mutually recursive definitions. Although various mixin module type systems have been developed, no one has investigated principal typings or developed type inference for first-class mixin modules, nor has anyone added Milner's let-polymorphism to such a system.
This paper proves that typability is NP-complete for the naive approach followed by previous mixin module type systems. Because a λ-calculus extended with record concatenation is a simple restriction of our mixin module calculus, we also prove the folk belief that typability is NP-complete for the naive early type systems for record concatenation.
To allow feasible type inference, we present Martini, a new system of simple types for mixin modules with principal typings. Martini is conceptually simple, with no subtyping and a clean and balanced separation between unification-based type inference with type and row variables and constraint solving for safety of linking and field extraction. We have implemented a type inference algorithm and we prove its complexity to be O(n²), or O(n) given a fixed bound on the number of field labels. (Technically, there is also a factor of α(n), but α(n) ≤ 4 for n < 10^(10^100) (a “googolplex”).) To prove the complexity, we need to present an algorithm for row unification that may have been implemented by others, but which we could not find written down anywhere. Because Martini has principal typings, we successfully extend it with Milner's let-polymorphism.
For use in earlier approaches to automated module interface adaptation, we seek a restricted form of program synthesis. Given some typing assumptions and a desired result type, we wish to automatically build a number of program fragments of this chosen typing, using functions and values available in the given typing environment. We call this problem term enumeration. To solve the problem, we use the Curry-Howard correspondence (propositions-as-types, proofs-as-programs) to transform it into a proof enumeration problem for an intuitionistic logic calculus. We formally study proof enumeration and counting in this calculus. We prove that proof counting is solvable and give an algorithm to solve it. This in turn yields a proof enumeration algorithm.
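The term enumeration problem can be illustrated in a far simpler setting than the paper's proof-enumeration calculus. The sketch below (hypothetical environment and types, not the paper's algorithm) enumerates application terms of a goal type from a typing environment, with a depth bound standing in for the paper's more careful treatment of termination and counting.

```python
# Hedged sketch of term enumeration: types are base-type strings or
# ("->", argument_type, result_type) pairs. Given typing assumptions
# `env` and a goal type, enumerate terms of that type up to a depth
# bound on nested applications.

def enumerate_terms(env, goal, depth=3):
    results = []
    # A variable of exactly the goal type is itself a term (a proof
    # of the goal, under Curry-Howard).
    for name, ty in env:
        if ty == goal:
            results.append(name)
    if depth > 0:
        # Try every function whose result type is the goal, and
        # recursively enumerate arguments of its parameter type.
        for name, ty in env:
            if isinstance(ty, tuple) and ty[0] == "->" and ty[2] == goal:
                for arg in enumerate_terms(env, ty[1], depth - 1):
                    results.append(f"{name}({arg})")
    return results

# Hypothetical environment: zero : nat, succ : nat -> nat.
env = [("zero", "nat"), ("succ", ("->", "nat", "nat"))]
terms = enumerate_terms(env, "nat", depth=2)
assert terms == ["zero", "succ(zero)", "succ(succ(zero))"]
```

Counting the terms instead of listing them gives the flavour of the proof counting problem the paper solves.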
Previous type systems for mobility calculi (the original Mobile Ambients, its variants and descendants, e.g., Boxed Ambients and Safe Ambients, and other related systems) offer little support for generic mobile agents. Previous systems either do not handle communication at all or globally assign fixed communication types to ambient names that do not change as an ambient moves around or interacts with other ambients. This makes it hard to type examples such as a “messenger” ambient that uses communication primitives to collect a message of non-predetermined type and deliver it to a non-predetermined destination.
In contrast, we present our new type system PolyA. Instead of assigning communication types to ambient names, PolyA assigns a type to each process P that gives upper bounds on (1) the possible ambient nesting shapes of any process P' to which P can evolve, (2) the values that may be communicated at each location, and (3) the capabilities that can be used at each location. Because PolyA can type generic mobile agents, we believe PolyA is the first type system for a mobility calculus that provides type polymorphism comparable in power to polymorphic type systems for the lambda -calculus. PolyA is easily extended to ambient calculus variants. A restriction of PolyA has principal typings.
Previous type systems for mobility calculi (the original Mobile Ambients, its variants and descendants, e.g., Boxed Ambients and Safe Ambients, and other related systems) offer little support for generic mobile agents. Previous systems either do not handle communication at all or globally assign fixed communication types to ambient names that do not change as an ambient moves around or interacts with other ambients. This makes it hard to type examples such as a “messenger” ambient that uses communication primitives to collect a message of non-predetermined type and deliver it to a non-predetermined destination.
In contrast, we present our new type system PolyA. Instead of assigning communication types to ambient names, PolyA assigns a type to each process P that gives upper bounds on (1) the possible ambient nesting shapes of any process P' to which P can evolve, (2) the values that may be communicated at each location, and (3) the capabilities that can be used at each location. Because PolyA can type generic mobile agents, we believe PolyA is the first type system for a mobility calculus that provides type polymorphism comparable in power to polymorphic type systems for the lambda -calculus. PolyA is easily extended to ambient calculus variants. A restriction of PolyA has principal typings.
System E is a recently designed type system for the λ-calculus with intersection types and expansion variables. During automatic type inference, expansion variables allow postponing decisions about which non-syntax-driven typing rules to use until the right information is available and allow implementing the choices via substitution.
This paper uses expansion variables in a unification-based automatic type inference algorithm for System E that succeeds for every β-normalizable λ-term. We have implemented and tested our algorithm and released our implementation publicly. Each step of our unification algorithm corresponds to exactly one β-reduction step, and vice versa. This formally verifies and makes precise a step-for-step correspondence between type inference and β-reduction. This also shows that type inference with intersection types and expansion variables can, in effect, carry out an arbitrary amount of partial evaluation of the program being analyzed.
System E is a recently designed type system for the λ-calculus with intersection types and expansion variables. During automatic type inference, expansion variables allow postponing decisions about which non-syntax-driven typing rules to use until the right information is available and allow the choices to be implemented via substitution.
This paper shows how expansion variables enable a unification-based automatic type inference algorithm for System E that finds a typing for every β-normalizable λ-term. We have implemented and tested our algorithm and made our implementation publicly available. We show that each step of our unification algorithm corresponds to exactly one β-reduction step, and vice versa. This formally verifies and makes precise a correspondence between type inference and β-reduction that was previously known only informally, at an intuitive level, among those who work with intersection types.
Types are often used to control and analyze computer programs. Intersection types give a type system great flexibility, but have been difficult to implement. The ! operator, used to distinguish between linear and non-linear types, has good potential for improving tracking of resource usage, but has not been as flexible as one might want and has been difficult to use in compositional analysis. We introduce System E, a type system with expansion variables, linear intersection types, and the ! type constructor for creating non-linear types. System E is designed for maximum flexibility in automatic type inference and for ease of automatic manipulation of type information. Expansion variables allow postponing the choice of which typing rules to use until later constraint solving gives enough information to allow making a good choice. System E removes many limitations and technical difficulties that expansion variables had in the earlier System I and extends expansion variables to work with ! in addition to the intersection type constructor. We present subject reduction results for call-by-need evaluation and discuss approaches for implementing program analysis in System E.
Previous methods have generally identified the location of a type error as a particular program point or the program subtree rooted at that point. We present a new approach that identifies the location of a type error as a set of program points (a slice) all of which are necessary for the type error. We identify the criteria of completeness and minimality for type error slices. We discuss the advantages of complete and minimal type error slices over previous methods of presenting type errors. We present and prove the correctness of algorithms for finding complete and minimal type error slices for implicitly typed higher-order languages like Standard ML.
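The minimality criterion can be illustrated with a small sketch. This is not the paper's algorithm (which works on typing constraints for languages like Standard ML); it only shows the shape of the property, assuming a hypothetical predicate `has_error` that re-checks the program restricted to a set of program points.

```python
# Hedged sketch of slice minimization: starting from a complete slice,
# greedily drop program points while the type error persists. The
# result is minimal: removing any single remaining point makes the
# error disappear.

def minimize(points, has_error):
    slice_ = set(points)
    changed = True
    while changed:
        changed = False
        for p in sorted(slice_):
            if has_error(slice_ - {p}):   # error survives without p?
                slice_.discard(p)          # then p is not necessary
                changed = True
    return slice_

# Toy program: the error needs points 1 and 3 together; 2 and 4 are
# irrelevant to it.
has_error = lambda s: {1, 3} <= s
assert minimize({1, 2, 3, 4}, has_error) == {1, 3}
```

In the toy run, points 2 and 4 are discarded because the error persists without them, leaving exactly the points necessary for the error.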
Mixin modules are a framework for modular programming that supports code parameterization, incremental programming via late binding and redefinitions, and cross-module recursion. In this paper, we develop a language of mixin modules that supports call-by-value evaluation, and formalize a reduction semantics and a sound type system for this language.
This paper reports on refinements and extensions to the MathLang framework that add substantial support for natural language text. We show how the extended framework supports multiple views of mathematical texts, including natural language views using the exact text that the mathematician wants to use. Thus, MathLang now supports the ability to capture the essential mathematical structure of mathematics written using natural language text. We show examples of how arbitrary mathematical text can be encoded in MathLang without needing to change any of the words or symbols of the texts or their order. In particular, we show the encoding of a theorem and its proof that has been used by Wiedijk for comparing many theorem prover representations of mathematics, namely the irrationality of √2 (originally due to Pythagoras). We encode a 1960 version by Hardy and Wright, and a more recent version by Barendregt.
In this paper we report on the design of a new mathematical language and our method of designing it, driven by the encoding of mathematical texts. MathLang is intended to provide support for checking basic well-formedness of mathematical text without requiring the heavy and difficult-to-use machinery of full type theory or other forms of full formalization. At the same time, it is intended to allow the addition of fuller formalization to a document as time and effort permits. MathLang is intended to, ultimately, be useful in providing better software support for authoring mathematics, reading mathematics, and organizing and distributing mathematics. The preliminary language presented in this paper is intended only for machine manipulation and for debugging of the design of MathLang.
Principality of typings is the property that for each typable term, there is a typing from which all other typings are obtained via some set of operations. Type inference is the problem of finding a typing for a given term, if possible. We define an intersection type system which has principal typings and types exactly the strongly normalizable λ-terms. More interestingly, every finite-rank restriction of this system (using Leivant's first notion of rank) has principal typings and also has decidable type inference. This is in contrast to System F, where the restriction to any finite rank of 3 or above has neither principal typings nor decidable type inference. Furthermore, the notion of principal typings for our system involves only one operation, substitution, rather than several operations (not all substitution-based) as in earlier presentations of principality for intersection types (without rank restrictions). In our system the earlier notion of expansion is integrated in the form of expansion variables, which are subject to substitution as are ordinary variables. A unification-based type inference algorithm is presented using a new form of unification, β-unification.
Many different mobile process calculi have been invented, and for each, a number of type systems have been developed. Soundness and other properties must be proved separately for each calculus and type system.
We present the generic polymorphic type system Poly✶, which works for a wide range of mobile process calculi. For any calculus satisfying some general syntactic conditions, well-formedness rules for types are derived automatically from the reduction rules, and Poly✶ works otherwise unchanged. The derived type system is automatically sound and often more precise than previous type systems for the calculus, due to Poly✶'s spatial polymorphism.
We present an implemented type inference algorithm for Poly✶ which automatically constructs a typing given a set of reduction rules and a term to be typed. The generated typings are principal with respect to certain natural type shape constraints.
We present an automatic type inference algorithm for PolyA, a type system for Mobile Ambients presented in earlier work by us together with Torben Amtoft. We present not only a basic inference algorithm, but also several optimizations to it aimed at reducing the size of the inferred types. The final algorithm has been implemented and verified to work on small examples. We discuss some small problems that still exist and methods for solving them.
For use in earlier approaches to automated module interface adaptation, we seek a restricted form of program synthesis. Given some typing assumptions and a desired result type, we wish to automatically build a number of program fragments of this chosen typing, using functions and values available in the given typing environment. We call this problem term enumeration. To solve the problem, we use the Curry-Howard correspondence (propositions-as-types, proofs-as-programs) to transform it into a proof enumeration problem for an intuitionistic logic calculus. We formally study proof enumeration and counting in this calculus. We prove that proof counting is solvable and give an algorithm to solve it. This in turn yields a proof enumeration algorithm.
Previous methods have generally identified the location of a type error as a particular program point or the program subtree rooted at that point. We present a new approach that identifies the location of a type error as a set of program points (a slice) all of which are necessary for the type error. We describe algorithms for finding minimal type error slices for implicitly typed higher-order languages like Standard ML.
This paper formalizes and proves correct a compilation scheme for mutually recursive definitions in call-by-value functional languages. This scheme supports a wider range of recursive definitions than standard call-by-value recursive definitions. We formalize our technique as a translation scheme to a λ-calculus featuring in-place update of memory blocks, and prove the translation to be faithful.
Functional languages encourage the extensive use of recursive functions and data structures. It is therefore important that they implement recursion efficiently. In this paper, we formalize and improve a known implementation technique for recursion. The original technique was introduced by Cousineau and Mauny as the “in-place updating trick”. Consider a list of mutually recursive definitions. The technique consists of allocating a dummy, uninitialized heap block for each recursive definition. The size of these blocks is guessed from the shape of each definition. Then, the right-hand sides of the definitions are computed. Recursively-defined identifiers thus refer to the corresponding dummy blocks. This leads, for each definition, to a block of the expected size. Eventually, the contents of the obtained blocks are copied to the dummy blocks, updating them in place. The only change we propose to this scheme is to update the dummy blocks as soon as possible, immediately after each definition is computed, thus making each definition available for further use. At the source language level, this improvement allows us to extend the class of expressions allowed as right-hand sides of recursive definitions, usually restricted to syntactic functions. We formalize our technique as a translation scheme to a λ-calculus featuring in-place updating of memory blocks, and prove the translation to be faithful.
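The in-place updating trick can be demonstrated on the classic cyclic definition `let rec ones = 1 :: ones`. The sketch below is illustrative only (it models a heap cons cell as a mutable two-element Python list, not the paper's formal translation):

```python
# In-place updating trick for `let rec ones = 1 :: ones`.
# A cons cell is modeled as a mutable 2-element list [head, tail].

# Step 1: allocate a dummy, uninitialized block; its size (2 fields)
# is guessed from the shape of the definition (a cons cell).
ones = [None, None]

# Step 2: evaluate the right-hand side. The recursive occurrence of
# `ones` refers to the dummy block allocated above.
rhs = [1, ones]

# Step 3: copy the computed contents into the dummy block, updating
# it in place and thereby tying the recursive knot.
ones[0], ones[1] = rhs[0], rhs[1]

assert ones[0] == 1
assert ones[1] is ones  # the tail is the list itself: a cyclic list
```

The paper's improvement corresponds to performing step 3 immediately after each definition's right-hand side is computed, rather than after all of them, so earlier definitions are usable while evaluating later ones.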
Module systems are important for software engineering: they facilitate code reuse without compromising the correctness of programs. However, they still lack some flexibility: first, they do not allow mutually recursive definitions to span module boundaries; second, definitions inside modules are bound early and cannot be overridden later, as opposed to inheritance and overriding in class-based object-oriented languages, which follow the late binding semantics. This paper examines an alternative, hybrid modularization concept called mixin modules. We develop a language of call-by-value mixin modules with a reduction semantics, and a sound type system for it, guaranteeing that programs will run correctly.
Principality of typings is the property that for each typable term, there is a typing from which all other typings are obtained via some set of operations. Type inference is the problem of finding a typing for a given term, if possible. We define an intersection type system which has principal typings and types exactly the strongly normalizable λ-terms. More interestingly, every finite-rank restriction of this system (using Leivant's first notion of rank) has principal typings and also has decidable type inference. This is in contrast to System F, where the restriction to any finite rank of 3 or above has neither principal typings nor decidable type inference. Furthermore, the notion of principal typings for our system involves only one operation, substitution, rather than several operations (not all substitution-based) as in earlier presentations of principality for intersection types (without rank restrictions). In our system the earlier notion of expansion is integrated in the form of expansion variables, which are subject to substitution as are ordinary variables. A unification-based type inference algorithm is presented using a new form of unification, β-unification.
This paper presents an abstract framework and multiple diagram-based methods for proving meaning preservation, i.e., that all rewrite steps of a rewriting system preserve the meaning given by an operational semantics based on a rewriting strategy. While previous rewriting-based methods have generally needed the treated rewriting system as a whole to have such properties as, e.g., confluence, standardization, and/or termination or boundedness of developments, our methods can work when all of these conditions fail, and thus can handle more rewriting systems. We isolate the new lift/project with termination diagram as the key proof idea and show that previous rewriting-based methods (Plotkin's method based on confluence and standardization and Machkasova and Turbak's method based on distinct lift and project properties) implicitly use this diagram. Furthermore, our framework and proof methods help reduce the proof burden substantially by, e.g., supporting separate treatment of partitions of the rewrite steps, needing only elementary diagrams for rewrite step interactions, excluding many rewrite step interactions from consideration, needing weaker termination properties, and providing generic support for using developments in combination with any method.
This paper presents an abstract framework and multiple diagram-based methods for proving meaning preservation, i.e., that all rewrite steps of a rewriting system preserve the meaning given by an operational semantics based on a rewriting strategy. While previous rewriting-based methods have generally needed the treated rewriting system as a whole to have such properties as, e.g., confluence, standardization, and/or termination or boundedness of developments, our methods can work when all of these conditions fail, and thus can handle more rewriting systems. We isolate the new lift/project with termination diagram as the key proof idea and show that previous rewriting-based methods (Plotkin's method based on confluence and standardization and Machkasova and Turbak's method based on distinct lift and project properties) implicitly use this diagram. Furthermore, our framework and proof methods help reduce the proof burden substantially by, e.g., supporting separate treatment of partitions of the rewrite steps, needing only elementary diagrams for rewrite step interactions, excluding many rewrite step interactions from consideration, needing weaker termination properties, and providing generic support for using developments in combination with any method.
There are many calculi for reasoning about concurrent communicating processes which have locations and are mobile. Examples include the original Ambient Calculus and its many variants, the Seal Calculus, the MR-calculus, the M-calculus, etc. It is desirable to use such calculi to describe the behavior of mobile agents. It seems reasonable that mobile agents should be able to follow non-predetermined paths and to carry non-predetermined types of data from location to location, collecting and delivering this data using communication primitives. Previous type systems for ambient calculi make this difficult or impossible to express, because these systems (if they handle communication at all) have always globally mapped each ambient name to a type governing the type of values that can be communicated locally or with adjacent locations, and this type cannot depend on where the ambient has traveled.
We present a new type system where there are no global assignments of types to ambient names. Instead, the type of an ambient process P not only indicates what can be locally communicated but also gives an upper bound on the possible ambient nesting shapes of any process P' to which P can evolve, as well as the possible capabilities and names that can be exhibited or communicated at each location. Because these shapes can depend on which capabilities and names are actually communicated, the types support this with explicit dependencies on communication. This system is thus the first type system for an ambient calculus which provides type polymorphism of the kind that is usually present in polymorphic type systems for the λ-calculus.
We describe the design and methods of a tool that, based on behavioral specifications in interfaces, automatically generates simple adaptation functors to overcome minor incompatibilities between ML modules. The transformations that get generated can be expressed in a small recursion-free sublanguage of ML, namely a typed λ-calculus with function and product types, ML polymorphism, first-order type functions, and ML equality types. The transformations are correct: they transform type- and behaviorally correct implementations of an interface A into type- and behaviorally correct implementations of an interface B.
The hope is that a tool of this kind can be useful for automatically overcoming incompatibilities that result from certain common interface changes in the evolution of software systems. Examples are changes that generalize functions by adding extra parameters, or changes that give functions different, but isomorphic, types. Such changes alter the types. The behavioral specifications, on the other hand, remain similar, and in many cases the similarities can be automatically recognized and the differences automatically overcome. Another possible application domain is in automatic library retrieval based on semantic specifications.
A program analysis is compositional when the analysis result for a particular program fragment is obtained solely from the results for its immediate subfragments via some composition operator. This means the subfragments can be analyzed independently in any order. Many commonly used program analysis techniques (in particular, most abstract interpretations and most uses of the Hindley/Milner type system) are not compositional and require the entire text of a program for sound and complete analysis.
System I is a recent type system for the pure λ-calculus with intersection types and the new technology of expansion variables. System I supports compositional analysis because it has the principal typings property and an algorithm based on the new technology of β-unification has been developed that finds these principal typings. In addition, for each natural number k, typability in the rank-k restriction of System I is decidable, so a complete and terminating analysis algorithm exists for the rank-k restriction.
This paper presents new understanding that has been gained from working with multiple implementations of System I and beta -unification-based analysis algorithms. The previous literature on System I presented the type system in a way that helped in proving its more important theoretical properties, but was not as easy for implementers to follow as it could be. This paper provides a presentation of many aspects of System I that should be clearer as well as a discussion of important implementation issues.
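The notion of compositionality defined in this abstract can be illustrated with a much simpler analysis than System I's typings: free-variable analysis, where the result for a term is obtained solely from the results for its immediate subterms via a composition operator. This sketch is only an analogy, not part of System I.

```python
# Compositional free-variable analysis over a tiny λ-term
# representation: ("var", x), ("lam", x, body), ("app", fun, arg).
# Each case combines only the results of the immediate subterms,
# so fragments can be analyzed independently, in any order.

def fv(term):
    kind = term[0]
    if kind == "var":
        return {term[1]}
    if kind == "lam":
        return fv(term[2]) - {term[1]}     # bound variable removed
    if kind == "app":
        return fv(term[1]) | fv(term[2])   # composition: set union
    raise ValueError(f"unknown term kind: {kind}")

# (λx. x y) z  has free variables {y, z}
term = ("app",
        ("lam", "x", ("app", ("var", "x"), ("var", "y"))),
        ("var", "z"))
assert fv(term) == {"y", "z"}
```

An analysis like Hindley/Milner inference with a global environment lacks this property: the result for a fragment depends on the whole surrounding program, which is exactly the limitation the abstract describes.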
Let S be some type system. A typing in S for a typable term M is the collection of all of the information other than M which appears in the final judgement of a proof derivation showing that M is typable. For example, suppose there is a derivation in S ending with the judgement A ⊢ M : τ, meaning that M has result type τ when assuming the types of free variables are given by A. Then (A, τ) is a typing for M.
A principal typing in S for a term M is a typing for M which somehow represents all other possible typings in S for M. It is important not to confuse this notion with the weaker notion of principal type often mentioned in connection with the Hindley/Milner type system. Previous definitions of principal typings for specific type systems have involved various syntactic operations on typings such as substitution of types for type variables, expansion, lifting, etc.
This paper presents a new general definition of principal typings which does not depend on the details of any particular type system. This paper shows that the new general definition correctly generalizes previous system-dependent definitions. This paper explains why the new definition is the right one. Furthermore, the new definition is used to prove that certain polymorphic type systems using ∀-quantifiers, namely System F and the Hindley/Milner system, do not have principal typings.
Let S be some type system. A typing in S for a typable term M is the collection of all of the information other than M which appears in the final judgement of a proof derivation showing that M is typable. For example, suppose there is a derivation in S ending with the judgement A ⊢ M : τ, meaning that M has result type τ when assuming the types of free variables are given by A. Then (A, τ) is a typing for M.
A principal typing in S for a term M is a typing for M which somehow represents all other possible typings in S for M. It is important not to confuse this notion with the weaker notion of principal type often mentioned in connection with the Hindley/Milner type system. Previous definitions of principal typings for specific type systems have involved various syntactic operations on typings such as substitution of types for type variables, expansion, lifting, etc.
This paper presents a new general definition of principal typings which does not depend on the details of any particular type system. This paper shows that the new general definition correctly generalizes previous system-dependent definitions. This paper explains why the new definition is the right one. Furthermore, the new definition is used to prove that certain polymorphic type systems using ∀-quantifiers, namely System F and the Hindley/Milner system, do not have principal typings.
Although systems with intersection types have many unique capabilities, there has never been a fully satisfactory explicitly typed system with intersection types. We introduce λ^B with branching types and types which are quantified over type selectors to provide an explicitly typed system with the same expressiveness as a system with intersection types. Typing derivations in λ^B effectively squash together what would be separate parallel derivations in earlier systems with intersection types.
Although systems with intersection types have many unique capabilities, there has never been a fully satisfactory explicitly typed system with intersection types. We introduce and prove the basic properties of λ^B, a typed λ-calculus with branching types and types with quantification over type selection parameters. The new system λ^B is an explicitly typed system with the same expressiveness as a system with intersection types. Typing derivations in λ^B use branching types to squash together what would be separate parallel derivations in earlier systems with intersection types.
We present λ^CIL, a typed λ-calculus which serves as the foundation for a typed intermediate language for optimizing compilers for higher-order polymorphic programming languages. The key innovation of λ^CIL is a novel formulation of intersection and union types and flow labels on both terms and types. These flow types can encode polyvariant control and data flow information within a polymorphically typed program representation. Flow types can guide a compiler in generating customized data representations in a strongly typed setting. Since λ^CIL enjoys confluence, standardization, and subject reduction properties, it is a valuable tool for reasoning about programs and program transformations.
The CIL compiler for core Standard ML compiles whole SML programs using a novel typed intermediate language that supports the generation of type-safe customized data representations. In this paper, we present empirical data comparing the relative efficacy of several different customization strategies for function representations. We develop a cost model to interpret dynamic counts of operations required for each strategy. One of our results is data showing that compiling with a function representation strategy that makes customization decisions based on the presence or absence of free variables of a function gives a 26% improvement over a uniform closure representation. We also present data on the relative effectiveness of various representation pollution removal strategies. We find that for the types of representation pollution detected by our compiler, pollution removal rarely provides any additional benefit for the representation strategies and benchmarks considered.
The CIL compiler for core Standard ML compiles whole programs using the CIL typed intermediate language with flow labels and intersection and union types. Flow labels embed flow information in the types and intersection and union types support precise polyvariant type and flow information, without the use of type-level abstraction or quantification.
Compile-time representations of CIL types and terms are potentially large compared to those for similar types and terms in systems based on quantified types. The listing-based nature of intersection and union types, together with flow label annotations on types, contribute to the size of CIL types. The CIL term representation duplicates portions of the program where intersection types are introduced and union types are eliminated. This duplication makes it easier to represent type information and to introduce multiple representation conventions, but incurs a compile-time space cost.
This paper presents empirical data on the compile-time space costs of using CIL. These costs can be made tractable using standard hash-consing techniques. Surprisingly, the duplicating nature of CIL has acceptable compile-time space performance in practice on the benchmarks and flow analyses that we have investigated. Increasing the precision of flow analysis can significantly reduce compile-time space costs. There is also preliminary evidence that the flow information encoded in CIL's type system can effectively guide data customization.
The CIL compiler for core Standard ML compiles whole programs using a novel typed intermediate language (TIL) with intersection and union types and flow labels on both terms and types. The CIL term representation duplicates portions of the program where intersection types are introduced and union types are eliminated. This duplication makes it easier to represent type information and to introduce customized data representations. However, duplication incurs compile-time space costs that are potentially much greater than are incurred in TILs employing type-level abstraction or quantification. In this paper, we present empirical data on the compile-time space costs of using CIL as an intermediate language. The data shows that these costs can be made tractable by using sufficiently fine-grained flow analyses together with standard hash-consing techniques. The data also suggests that non-duplicating formulations of intersection (and union) types would not achieve significantly better space complexity.
Cyclic data structures can be tricky to create and manipulate in declarative programming languages. In a declarative setting, a natural way to view cyclic structures is as denoting regular trees, those trees which may be infinite but have only a finite number of distinct subtrees. This paper shows how to implement the unfold (anamorphism) operator in both eager and lazy languages so as to create cyclic structures when the result is a regular tree, as opposed to merely infinite lazy structures. The usual fold (catamorphism) operator when used with a strict combining function on any infinite tree yields an undefined result. As an alternative, this paper defines and shows how to implement a cycfold operator with more useful semantics when used with a strict function on cyclic structures representing regular trees. This paper also introduces an abstract data type (cycamores) to simplify the use of cyclic structures representing regular trees in both eager and lazy languages. Implementations of cycamores in both SML and Haskell are presented.
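The knot-tying behavior of such an unfold can be sketched in any language with mutable references. The following Python sketch is an invented illustration, not the paper's cycamore ADT: it memoizes seeds so that a repeated seed closes a cycle instead of unrolling an infinite structure.

```python
# Invented illustration (not the paper's cycamore ADT): an unfold that
# builds a cyclic graph when the generated regular tree repeats a seed.
class Node:
    def __init__(self, label):
        self.label = label
        self.children = []

def unfold(next_fn, seed):
    """Anamorphism: next_fn(seed) -> (label, [child_seeds]).
    A repeated seed reuses its existing node, tying the knot into a cycle."""
    memo = {}
    def go(s):
        if s in memo:
            return memo[s]
        label, kids = next_fn(s)
        node = Node(label)
        memo[s] = node          # register before recursing: handles self-loops
        node.children = [go(k) for k in kids]
        return node
    return go(seed)

# The one-node regular tree generated by the grammar S -> a S:
t = unfold(lambda s: ('a', [s]), 0)
assert t.children[0] is t       # a genuine cycle, not an infinite unrolling
```

A cycfold over such a structure would similarly need to detect revisited nodes rather than recurse forever, which is the semantic point the abstract makes about strict combining functions.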
We introduce a method to associate calculi of proof terms and rewrite rules with cut elimination procedures for logical deduction systems (i.e., Gentzen-style sequent calculi) in the case of intuitionistic logic. We illustrate this method using two different versions of the cut rule for a variant of the intuitionistic fragment of Kleene's logical deduction system G3. Our systems are in fact calculi of explicit substitution, where the cut rule introduces an explicit substitution and the left-→ rule introduces a binding of the result of a function application. Cut propagation steps of cut elimination correspond to propagation of explicit substitutions, and propagation of weakening (to eliminate it) corresponds to propagation of index-updating operations. We prove various subject reduction, termination, and confluence properties for our calculi.
Our calculi improve on some earlier calculi for logical deduction systems in a number of ways. By using de Bruijn indices, our calculi qualify as first-order term rewriting systems (TRSs), allowing us to correctly apply certain termination results for TRSs. Unlike some other calculi, each of our calculi has only one cut rule, and we do not need unusual features of sequents.
We show that the substitution and index-updating mechanisms of our calculi work the same way as those of Kamareddine and Ríos' λs and λt, two well-known systems of explicit substitution for the standard λ-calculus. By a change in the format of sequents, we obtain similar results for a known λ-calculus with variables and explicit substitutions, Rose's λbxgc.
This paper presents two worked applications of a general framework that can be used to support reasoning about the operational equality relation defined by a programming language semantics. The framework, based on Combinatory Reduction Systems, facilitates the proof of standardization theorems for programming calculi. The importance of standardization theorems to programming language semantics was shown by Plotkin: standardization together with confluence guarantees that two terms equated in the calculus are semantically equal. The framework is applied to the λ_v-calculus and to an untyped version of the λ^CIL-calculus. The latter is a basis for an intermediate language being used in a compiler.
Modules and linking are usually formalized by encodings which use the λ-calculus, records (possibly dependent), and possibly some construct for recursion. In contrast, we introduce the m-calculus, a calculus where the primitive constructs are modules, linking, and the selection and hiding of module components. The m-calculus supports smooth encodings of software structuring tools such as functions (λ-calculus), records, objects (ς-calculus), and mutually recursive definitions. The m-calculus can also express widely varying kinds of module systems as used in languages like C, Haskell, and ML. We prove the m-calculus is confluent, thereby showing that equational reasoning via the m-calculus is sensible and well behaved.
The error reports produced by compilers for languages with polymorphic type inference are often uninformative. Several attempts have been made to produce improved error reports. We define a manifesto for good error reports and use it to evaluate both the traditional algorithms and several improved algorithms, including two of our own devising.
Principality of typings is the property that for each typable term, there is a typing from which all other typings are obtained via some set of operations. Type inference is the problem of finding a typing for a given term, if possible. We define an intersection type system which has principal typings and types exactly the strongly normalizable λ-terms. More interestingly, every finite-rank restriction of this system (using Leivant's first notion of rank) has principal typings and also has decidable type inference. This is in contrast to System F, where the finite-rank restriction at every rank 3 and above has neither principal typings nor decidable type inference. This is also in contrast to earlier presentations of intersection types, where the status (decidable or undecidable) of these properties is unknown for the finite-rank restrictions at 3 and above. Furthermore, the notion of principal typings for our system involves only one operation, substitution, rather than several operations (not all substitution-based) as in earlier presentations of principality for intersection types (without rank restrictions). In our system the earlier notion of expansion is integrated in the form of expansion variables, which are subject to substitution as are ordinary variables. A unification-based type inference algorithm is presented using a new form of unification, β-unification.
We investigate finite-rank intersection type systems, analyzing the complexity of their type inference problems and their relation to the problem of recognizing semantically equivalent terms. Intersection types allow something of type τ₁ ∧ τ₂ to be used in some places at type τ₁ and in other places at type τ₂. A finite-rank intersection type system bounds how deeply the ∧ can appear in type expressions. Such type systems enjoy strong normalization, subject reduction, and computable type inference, and they support a pragmatics for implementing parametric polymorphism. As a consequence, they provide a conceptually simple and tractable alternative to the impredicative polymorphism of System F and its extensions, while typing many more programs than the Hindley-Milner type system found in ML and Haskell.
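As a rough illustration of how such a rank bound might be computed, here is a Python sketch based on one common reading of Leivant's rank: the rank counts how deeply an intersection sits to the left of arrows. Both the type encoding and the exact clauses are invented here; the paper's precise definition may differ.

```python
# A rough, invented reading of a rank measure for intersection types:
#   types: ('var', a) | ('->', s, t) | ('and', s, t)
def rank(t):
    """How deeply an intersection sits to the left of arrows (0 = none)."""
    tag = t[0]
    if tag == 'var':
        return 0
    if tag == 'and':
        return max(1, rank(t[1]), rank(t[2]))
    left = rank(t[1])            # an arrow bumps a left-side intersection
    return max(left + 1 if left > 0 else 0, rank(t[2]))

a = ('var', 'a')
assert rank(('->', a, a)) == 0                          # simple types: rank 0
assert rank(('->', ('and', a, ('->', a, a)), a)) == 2   # (a ∧ (a -> a)) -> a
```

Under this reading, intersection-free types are exactly the rank-0 types, and each arrow to the right of which an intersection appears on the left raises the rank by one.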
While type inference is computable at every rank, we show that its complexity grows exponentially as rank increases. Let K(0,n) = n and K(t+1,n) = 2^{K(t,n)}; we prove that recognizing the pure λ-terms of size n that are typable at rank k is complete for DTIME[K(k-1,n)]. We then consider the problem of deciding whether two λ-terms typable at rank k have the same normal form, generalizing a well-known result of Statman from simple types to finite-rank intersection types. We show that the equivalence problem is DTIME[K(K(k-1,n),2)]-complete. This relationship between the complexity of typability and expressiveness is identical in well-known decidable type systems such as simple types and Hindley-Milner types, but seems to fail for System F and its generalizations. The correspondence gives rise to a conjecture that if T is a predicative type system where typability has complexity t(n) and expressiveness has complexity e(n), then t(n) = Ω(log* e(n)).
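The bounding function K in the statement above is a tower of iterated exponentials; this small Python sketch simply restates the recurrence:

```python
def K(t, n):
    """K(0,n) = n and K(t+1,n) = 2^K(t,n): a tower of t exponentials atop n."""
    return n if t == 0 else 2 ** K(t - 1, n)

assert K(1, 3) == 8       # 2^3
assert K(2, 3) == 256     # 2^(2^3)
```

So the rank-k typability bound DTIME[K(k-1,n)] already reaches a height-(k-1) exponential tower in the term size n.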
Girard and Reynolds independently invented System F (a.k.a. the second-order polymorphically typed lambda calculus) to handle problems in logic and computer programming language design, respectively. Viewing F in the Curry style, which associates types with untyped lambda terms, raises the questions of typability and type checking. Typability asks for a term whether there exists some type it can be given. Type checking asks, for a particular term and type, whether the term can be given that type. The decidability of these problems has been settled for restrictions and extensions of F and related systems and complexity lower-bounds have been determined for typability in F, but this report is the first to resolve whether these problems are decidable for System F.
This report proves that type checking in F is undecidable, by a reduction from semi-unification, and that typability in F is undecidable, by a reduction from type checking. Because there is an easy reduction from typability to type checking, the two problems are equivalent. The reduction from type checking to typability uses a novel method of constructing λ-terms that simulate arbitrarily chosen type environments. All of the results also hold for the λI-calculus.
Modules and linking are usually formalized by encodings which use the λ-calculus, records (possibly dependent), and possibly some construct for recursion. In contrast, we present the m-calculus, a calculus where the primitive constructs are modules, linking, and the selection and hiding of module components. In addition to supporting equational reasoning about modules and linking, the m-calculus allows smooth encodings of software structuring tools such as the λ-calculus, mutually recursive definitions, records (including operations like extension and concatenation), and objects. The m-calculus is extremely well behaved --- we show not only that the m-calculus is confluent but also that it satisfies the strong finite developments property.
Extending the λ-calculus with either explicit substitution or generalized reduction has been the subject of extensive research recently, and still has many open problems. This paper is the first investigation into the properties of a calculus combining both generalized reduction and explicit substitutions. We present a calculus, λgs, that combines a calculus of explicit substitution, λs, and a calculus with generalized reduction, λg. We believe that λgs is a useful extension of the λ-calculus, because it allows postponement of work in two different but complementary ways. Moreover, λgs (and also λs) satisfies properties desirable for calculi of explicit substitutions and generalized reductions. In particular, we show that λgs preserves strong normalization, is a conservative extension of λg, and simulates β-reduction of λg and the classical λ-calculus. Furthermore, we study the simply typed versions of λs and λgs, and show that well-typed terms are strongly normalizing and that other properties, such as typing of subterms and subject reduction, hold. Our proof of the preservation of strong normalization (PSN) is based on the minimal derivation method. It is, however, much simpler, because we prove the commutation of arbitrary internal and external reductions. Moreover, we use one proof to show both the preservation of λ-strong normalization in λs and the preservation of λg-strong normalization in λgs. We remark that the technique of these proofs is not suitable for calculi without explicit substitutions (e.g., the preservation of λ-strong normalization in λg requires a different technique).
We present a new framework for transforming data representations in a strongly typed intermediate language. Our method allows both value producers (sources) and value consumers (sinks) to support multiple representations, automatically inserting any required code. Specialized representations can be easily chosen for particular source/sink pairs. The framework is based on these techniques:
- Flow annotated types encode the ``flows-from'' (source) and ``flows-to'' (sink) information of a flow graph.
- Intersection and union types support (a) encoding precise flow information, (b) separating flow information so that transformations can be well typed, and (c) automatically reorganizing flow paths to enable multiple representations.
As an instance of our framework, we provide a function representation transformation that encompasses both closure conversion and inlining. Our framework is adaptable to data other than functions.
Optimizing compilers for function-oriented and object-oriented languages exploit type and flow information for efficient implementation. Although type and flow information (both control and data flow) are inseparably intertwined, compilers usually compute and represent them separately. This is partly a result of the usual polymorphic type systems using ∀ and ∃ quantifiers, which are difficult to use in combination with flow annotations.
In the Church Project, we are experimenting with intermediate languages that integrate type and flow information into a single flow type framework. This integration facilitates the preservation of flow and type information through program transformations. In this paper we describe λ^CIL, an intermediate language supporting polymorphic types and polyvariant flow information, and describe its application in program optimization. We are experimenting with this intermediate language in a flow- and type-directed compiler for a functional language. The types of λ^CIL encode flow information (1) via labels that approximate sources and sinks of values and (2) via intersection and union types, finitary versions of universal and existential types that support type polymorphism and (in combination with the labels) polyvariant flow analysis. Accurate flow types expose opportunities for a wide range of optimizing program transformations.
We present a typed intermediate language λ^CIL for optimizing compilers for function-oriented and polymorphically typed programming languages (e.g., ML). The language λ^CIL is a typed λ-calculus with product, sum, intersection, and union types as well as function types annotated with flow labels. A novel formulation of intersection and union types supports encoding flow information in the typed program representation. This flow information can direct optimization.
System F is the well-known polymorphically-typed λ-calculus with universal quantifiers (``∀''). F+η is System F extended with the η rule, which says that if term M can be given type τ and M η-reduces to N, then N can also be given the type τ. Adding the η rule to System F is equivalent to adding the subsumption rule using the subtyping (``containment'') relation that Mitchell defined and axiomatized. The subsumption rule says that if M can be given type τ and τ is a subtype of type σ, then M can be given type σ. Mitchell's subtyping relation involves no extensions to the syntax of types, i.e., no bounded polymorphism and no supertype of all types, and is thus unrelated to the system F_≤ (``F-sub'').
Typability for F+η is the problem of determining for any term M whether there is any type τ that can be given to it using the type inference rules of F+η. Typability has been proven undecidable for System F (without the η rule), but the decidability of typability has been an open problem for F+η. Mitchell's subtyping relation has recently been proven undecidable, implying the undecidability of ``type checking'' for F+η. This paper reduces the problem of subtyping to the problem of typability for F+η, thus proving the undecidability of typability. The proof methods are similar in outline to those used to prove the undecidability of typability for System F, but the fine details differ greatly.
Church devised the lambda calculus intending it to be a foundation for mathematics instead of set theory, but the lambda calculus has been most useful as a model of computation. Types have been added to the lambda calculus to enforce meaningful computations. Girard and Reynolds independently invented the second-order polymorphically-typed lambda calculus, known as System F, to handle problems in logic and computer programming language design, respectively. Viewing F in the ``Curry style'', associating types with untyped lambda terms, raises the questions of typability and type checking. Typability asks for a term whether there exists some type it can be given. Type checking asks whether a term can be given a particular type. The decidability of these problems has been settled for restrictions and extensions of F and related systems and complexity lower-bounds have been determined for typability in F, but until now decidability for F has remained unknown.
Subtyping relates types so that terms of one type automatically belong to other types. Mitchell devised a simple subtyping system for F, one without bounded polymorphic quantification or a distinguished supertype of all types. Mitchell showed that extending F with his subtyping is equivalent to adding the eta rule, producing System F plus eta, which allows eta-equivalent terms to have the same types. Until recently, the decidability of Mitchell's subtyping has been unknown, as has the decidability of type checking and typability for F plus eta.
This dissertation proves these problems undecidable:
1. Type checking for F, by a reduction from semi-unification.
2. Typability for F, by a reduction from type checking.
3. Mitchell's subtyping system, by a reduction from semi-unification.
4. Type checking for F plus eta, implied by 3.
5. Typability for F plus eta, by a reduction from type checking.
These are the only proofs for 1, 2, and 5. The proofs for 1 and 2 also work for the special case of the lambda-I calculus. There is another proof for 3, which preceded the proof given here by a month but uses completely different and more complex methods.
Two notions of reduction for terms of the λ-calculus are introduced and the question of whether a λ-term is β-strongly normalizing is reduced to the question of whether a λ-term is merely normalizing under one of the notions of reduction. This gives a method to prove strong β-normalization for typed λ-calculi. Instead of the usual semantic proof style based on Tait's realizability or Girard's ``candidats de réductibilité'', termination can be proved using a decreasing metric over a well-founded ordering. This proof method is applied to the simply-typed λ-calculus and the system of intersection types, giving the first non-semantic proof for a polymorphic extension of the λ-calculus.
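For context, ordinary β-normalization (the baseline against which the paper's two new notions of reduction are compared) can be sketched as a normal-order reducer on de Bruijn terms. This Python sketch is an invented illustration, not either of the paper's new notions of reduction:

```python
# Untyped lambda-terms in de Bruijn notation (invented encoding):
#   ('var', i) | ('lam', body) | ('app', f, a)

def shift(t, d, c=0):
    """Shift free indices >= c by d."""
    tag = t[0]
    if tag == 'var':
        return ('var', t[1] + d) if t[1] >= c else t
    if tag == 'lam':
        return ('lam', shift(t[1], d, c + 1))
    return ('app', shift(t[1], d, c), shift(t[2], d, c))

def subst(t, j, s):
    """Substitute s for variable j in t."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == j else t
    if tag == 'lam':
        return ('lam', subst(t[1], j + 1, shift(s, 1)))
    return ('app', subst(t[1], j, s), subst(t[2], j, s))

def step(t):
    """One normal-order (leftmost-outermost) beta-step, or None if t is normal."""
    tag = t[0]
    if tag == 'app' and t[1][0] == 'lam':            # beta-redex at the head
        return shift(subst(t[1][1], 0, shift(t[2], 1)), -1)
    if tag == 'lam':
        b = step(t[1])
        return None if b is None else ('lam', b)
    if tag == 'app':
        f = step(t[1])
        if f is not None:
            return ('app', f, t[2])
        a = step(t[2])
        return None if a is None else ('app', t[1], a)
    return None

def normalize(t, fuel=1000):
    """Reduce to beta-normal form; gives up (None) when fuel runs out."""
    while fuel:
        n = step(t)
        if n is None:
            return t
        t, fuel = n, fuel - 1
    return None
```

Deciding *strong* normalization is harder than running one strategy to a normal form, which is exactly why reducing the former question to the latter (under a modified notion of reduction) is useful.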
Mitchell defined and axiomatized a subtyping relationship (also known as containment, coercibility, or subsumption) over the types of System F (with ``→'' and ``∀''). This subtyping relationship is quite simple and does not involve bounded quantification. Tiuryn and Urzyczyn quite recently proved this subtyping relationship to be undecidable. This paper supplies a new undecidability proof for this subtyping relationship. First, a new syntax-directed axiomatization of the subtyping relationship is defined. Then, this axiomatization is used to prove a reduction from the undecidable problem of semi-unification to subtyping. The undecidability of subtyping implies the undecidability of type checking for System F extended with Mitchell's subtyping, also known as ``F plus eta''.
We examine the problem of type inference for a family of polymorphic type systems containing the power of Core-ML. This family comprises the levels of the stratification of the second-order λ-calculus (system F) by ``rank'' of types. We show that typability is an undecidable problem at every rank k ≥ 3. While it was already known that typability is decidable at rank 2, no direct and easy-to-implement algorithm was available. We develop a new notion of λ-term reduction and use it to prove that the problem of typability at rank 2 is reducible to the problem of acyclic semi-unification. We also describe a simple procedure for solving acyclic semi-unification. Issues related to principal types are discussed.
We consider the problems of typability and type checking for the Girard/Reynolds second-order polymorphic typed λ-calculus (also known as ``system F'') in the ``Curry style'' (where types are derived for pure λ-terms). Until now the decidability of these problems for F itself has remained unknown. We first prove that type checking in F is undecidable, by a reduction from semi-unification. We then prove typability in F is undecidable, by a reduction from type checking. Since the reduction from typability to type checking in F is already known, the two problems in F are equivalent (reducible to each other). The results hold for both the usual λK-calculus and the more restrictive λI-calculus.
We study the problem of type inference for a family of polymorphic type disciplines containing the power of Core-ML. This family comprises all levels of the stratification of the second-order λ-calculus by ``rank'' of types. We show that typability is an undecidable problem at every rank k ≥ 3 of this stratification. While it was already known that typability is decidable at rank ≤ 2, no direct and easy-to-implement algorithm was available. To design such an algorithm, we develop a new notion of reduction and show how to use it to reduce the problem of typability at rank 2 to the problem of acyclic semi-unification. A by-product of our analysis is a simple solution procedure for acyclic semi-unification.