Abstract: The Computerisation of Information: The Challenges and the Pitfalls

Fairouz Kamareddine

The twentieth century saw the invention of computing machines and languages. Since then, information engines, the World Wide Web and electronic search tools have changed the way we store, use and manipulate all aspects of knowledge. The twenty-first century will continue the search for the best computerisation tools and methods. However, we first need to pause and look at the challenges and pitfalls of the twentieth century. In this talk, as a working example throughout, I concentrate on the computerisation of mathematical texts in the MathLang project. However, one can follow the same techniques to computerise any other aspect of knowledge. The MathLang project aims at computerising mathematical texts to various degrees of formalisation, without any prior commitment to a particular logical framework (e.g., having to choose set theory, category theory or type theory) or to a particular proof checker (e.g., having to choose Mizar, Isabelle or Coq). Instead, MathLang keeps the choice of logical framework and proof checker open, depending on the taste and expertise of the user. Furthermore, MathLang allows useful computerisations of mathematical texts at much lower levels, where the emphasis is neither on full formalisation as is done in the foundations of mathematics (e.g., as initiated by Frege and Russell) nor on proof checking (e.g., as initiated by de Bruijn's Automath).

During computerisation, the mathematical text is first input into the computer exactly as it was written; then one or more MathLang aspects are applied to the text to provide extended versions that can be checked for different levels of correctness. One basic aspect is to extend the text with categorical information (term, noun, adjective, statement, etc.) and to automatically check the correctness of the text at this categorical level. This guarantees coherence of the text (e.g., variables are declared before being used and the text constitutes a well-structured book). Another aspect is to divide the text into parts annotated with relations (e.g., Corollary A uses Theorem B) and to automatically derive from these relations a number of structures that represent dependencies in the text and help explain its logical structure. These dependencies are used in a further aspect, where a version of the text is transformed into another that shows the holes in the proofs. Other aspects will transform this version into a fully formalised version (say in Mizar or Isabelle).
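To make the idea of deriving dependency structures from annotated relations more concrete, here is a minimal sketch in Python. It is not MathLang's actual implementation or notation: the part names, the "uses" pairs and all function names are hypothetical and chosen only for illustration. It assumes the text has already been divided into labelled parts, listed in order of appearance, with each relation given as a pair (part, part it uses).

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical labelled parts of a text, in order of appearance.
parts = ["Definition 1", "Theorem B", "Corollary A"]

# Hypothetical "uses" annotations: (part, part it depends on).
uses = [("Theorem B", "Definition 1"), ("Corollary A", "Theorem B")]


def dependency_graph(parts, uses):
    """Map each part to the set of parts it directly uses."""
    graph = {p: set() for p in parts}
    for user, used in uses:
        if user not in graph or used not in graph:
            raise KeyError(f"relation mentions an unknown part: {user!r} uses {used!r}")
        graph[user].add(used)
    return graph


def logical_order(graph):
    """Return the parts so that every part comes after the parts it uses."""
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(graph):
    """Return False if the relations contain a circular dependency."""
    try:
        logical_order(graph)
        return True
    except CycleError:
        return False


def forward_references(parts, uses):
    """List relations where a part uses something that appears later in the text."""
    position = {p: i for i, p in enumerate(parts)}
    return [(user, used) for user, used in uses if position[used] > position[user]]


graph = dependency_graph(parts, uses)
print(logical_order(graph))
print(forward_references(parts, uses))
```

In this sketch, a circular dependency or a forward reference would be reported as a structural problem in the text, in the same spirit as the coherence checks described above; a real system would of course work on much richer annotations than plain pairs of labels.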

MathLang was created in 2000 by Fairouz Kamareddine and J.B. Wells as an experience-driven project in which the computerisation of different texts, taken from various branches of mathematics, is the basis for the design and implementation of the MathLang aspects. So far, a number of mathematical texts have been computerised, some of which have been gradually transformed through MathLang aspects into full Mizar. Support for other proof checkers (e.g., Isabelle and Coq) is envisioned for the near future.

In this talk, I present the MathLang framework, its development and its current and future aspects, together with examples of computerisation from original mathematical texts to fully formalised Mizar versions. For each aspect, the emphasis will be on its design, formalisation and implementation, the automation available for it, and the correctness or trustworthiness of these processes. Then, we discuss how the computerisation path from the original mathematical text to the fully formalised version would look if Isabelle were chosen as the checker instead of Mizar, and show that a number of aspects and computerised versions of the original text are common to both paths. We also discuss at which stage on the path from the original mathematical text to the fully formalised version a commitment to a particular logical framework and a particular proof checker can be made.

The MathLang project was started in 2000 by Fairouz Kamareddine and Joe Wells. Since 2002 it has involved four PhD students (Manuel Maarek, Krzysztof Retel, Robert Lamar and Christoph Zengler) and a number of MSc and BSc students, all collaborating on and contributing to the design and implementation of the various aspects and to the computerisation of mathematical texts.