\documentclass[a4paper,11pt]{report}

\addtolength{\topmargin}{-1in}

\setlength{\textwidth}{6.0in}

\setlength{\textheight}{9.5in}

\addtolength{\oddsidemargin}{-0.7in}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\usepackage[english]{babel}

\usepackage{graphicx}

\usepackage{color}

\usepackage{amssymb}

%\usepackage[dvips]{changebar}

%\renewcommand{\baselinestretch}{1.5}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\newenvironment{code}{\small \begin{quote}}{\end{quote} \normalsize}

\newenvironment{tblenv} {\begin{figure}[htbp] \begin{center}
\small} {\normalsize \end{center} \end{figure}}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\newcommand{\book}[4]{#1. {\em #2}. #3, #4.}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\newcommand{\lt}{$<$}

\newcommand{\gt}{$>$}

\newcommand{\back}{$\backslash$}

\newcommand{\home}{\~{}}

\newcommand{\da}{$\doublearrow$}

\newcommand{\dad}{$\rightarrow$}

\newcommand{\underlines}{\underline{\ }\ \underline{\ }\ \underline{\ 
}\ }

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\newcommand{\lz}{$l_0$}

\newcommand{\lo}{$l_1$}

\newcommand{\uz}{$u_0$}

\newcommand{\uo}{$u_1$}

\newcommand{\ut}{$u_2$}

\newcommand{\Tz}{$\mathsf{T_0}$}

\newcommand{\To}{$\mathsf{T_1}$}

\newcommand{\Tt}{$\mathsf{T_2}$}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\newcommand{\lispt}{{\tt t}}

\newcommand{\lispnil}{{\tt nil}}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\newcommand{\lu}[1]{{\sffamily #1}}

\newcommand{\luele}{\lu{Element}}

\newcommand{\luheader}{\lu{Header}}

\newcommand{\luelename}{\lu{Element~name}}

\newcommand{\luattlist}{\lu{Attribute~list}}

\newcommand{\luatt}{\lu{Attribute}}

\newcommand{\luattname}{\lu{Attribute~name}}

\newcommand{\luattvalue}{\lu{Attribute~value}}

\newcommand{\luchardata}{\lu{CharData}}

\newcommand{\lupi}{\lu{Processing~Instruction}}

\newcommand{\lupitarget}{\lu{Processing~Instruction~Target}}

\newcommand{\lupibody}{\lu{Processing~Instruction~Body}}

\newcommand{\lucomment}{\lu{Comment}}

\newcommand{\luintdtd}{\lu{Internal~DTD}}

\newcommand{\luentref}{\lu{Entity~reference}}

\newcommand{\logu}{logical unit}

\newcommand{\logus}{logical units}

\newcommand{\Logu}{Logical unit}

\newcommand{\Logus}{Logical units}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\newcommand{\logl}{logical line}

\newcommand{\ebuf}{Ebuffer}

\newcommand{\etree}{Etree}

\newcommand{\traveller}{climbing transparency}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\newcommand{\key}[1]{\framebox{#1}}

\newcommand{\kl}{\key{$\leftarrow$}}

\newcommand{\kr}{\key{$\rightarrow$}}

\newcommand{\ku}{\key{$\uparrow$}}

\newcommand{\kd}{\key{$\downarrow$}}

\newcommand{\ra}{$\rightarrow$}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\newcommand{\C}{\color{red}}

\newcommand{\todo}[1]{{\color{blue} \Large {\bf TODO:} \normalsize
#1}}

\newcommand{\todoweb}[1]{{\color{blue} \Large {\bf TODOWEB:}
\normalsize #1}}

\newcommand{\jbw}[1]{{\color{blue} \Large {\bf JBW:} \normalsize
#1}}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\title{Emaxml: the Plan {\Large }\\ \textbf{\large An evolving
requirement and design specification.}\large }

\author{Paolo Debetto\\ Supervisor: Dr. Joe Wells}

\date{Academic year 2001/02}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

\begin{document}
\maketitle
\newpage
\tableofcontents


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Introduction}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{quote}
Emaxml is an extension of Emacs, written in Emacs Lisp, to edit
XML documents. Major Emacs modes for editing SGML and XML already
exist; this is different in that it allows viewing the document as
a tree structure, both visually and logically.
\end{quote}

%================================================================
\section{Authoring XML with a non-XML-aware editor}
%================================================================

An XML document is often generated automatically by an
application. Nevertheless, on many occasions XML code is edited
directly by a human author. When a normal text editor (i.e. one
with no XML-specific editing facilities) is used to this end, the
author has to deal with the XML document at three
levels:

\begin{enumerate}

	\item At the {\bf contents level}, the author is concerned
	with what the document is about, the actual information or
	concepts.

	\item At the {\bf structure level}, the author organises the
	document hierarchically, according to the rules set by the DTD
	for that particular class of documents.

	For non-trivial documents, the overhead of keeping the
	structure in order, or of changing the current structure, can
	be very high.

	Moreover, the author has to rely on indentation or some other
	means of making the structure of the document visible.

	This activity is nevertheless related to the conceptual
	contents of the document.

	\item At the {\bf syntactic level}, the author is concerned
	with getting the XML syntactic sugar right. This activity is
	strictly XML-related and has nothing to do with the topic of
	the document. It is an error-prone activity and the overhead
	involved can be very high.

\end{enumerate}

%================================================================
\section{Emaxml's approach}
%================================================================

Most of the work mentioned in the previous section can
be automated to various degrees by an editor with XML editing
facilities, with the aim of letting the author concentrate on
the contents and the structure of the document in the abstract.

The approach of Emaxml is that of taking care of the XML syntax
and providing means of seeing and manipulating the structure of
the document effectively, by displaying the document in a
tree-like fashion.

Figure \ref{fig:Emaxml} shows Emaxml at work.

%================================================================
\section{Emaxml as an open source project}
%================================================================

In keeping with the spirit of Emacs, Emaxml is an open source
project, which implies that the code must be designed and written
such that it can be improved as easily as possible by another
programmer at any time.

To this end, the documentation plays an important role in the
project. In particular, this document aims to provide growing,
up-to-date documentation of what has been done and what the target
is. Here a programmer should find all the information needed to
extend Emaxml. See Section \ref{sec:Plan} for more detail.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Background}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%================================================================
\section{Emacs}
%================================================================

%------------------------------------------------------------
\subsection{Editing modes}
%------------------------------------------------------------

Editing a particular type of document very often requires a
specialized editor that allows the user to perform some
type-specific processing, or that automatically changes the document
layout, or that displays the text in a way which is different from
how the text is actually stored.

For instance, editing C code and editing a text document are two
very different activities. Paragraph structure is not important
when editing code; indenting each line according to its syntax is
not important when writing a letter.

Emacs is not simply an editor: it is a code editor, a text editor,
a \LaTeX\ editor, a structured outline editor, a directory editor,
a tar file editor, an email editor, and a hundred others, not
least an SGML (and XML) editor. Emacs deals with each type of
document by being in the appropriate editing {\bf mode}.

The basic Emacs core consists of a set of capabilities such as
managing buffers, windows, files, the cursor, etc., plus a Lisp
interpreter, and was written in C. Emacs modes are extensions to
the core, and are written in Lisp, or, rather, in Emacs Lisp, the
Emacs dialect of Lisp.

%------------------------------------------------------------
\subsection{\C Emacs concepts}
%------------------------------------------------------------

\todo{\C Add explanations of Emacs Lisp primitives and
primitive data types which are related to the project. Purpose:
reference for readers}

I assume the reader of this document to be reasonably familiar with
the functioning of Emacs from a user's point of view. A few key
Emacs features that are particularly relevant to my
project are briefly defined here. For detailed information refer
to the Emacs Info manual by pressing `F1 i' in Emacs; the
following definitions are mainly summarized from there.

\begin{itemize}

	\item {\bf Buffer}: the basic editing unit; one buffer
     	corresponds to one text being edited. There may be several
     	buffers, but at any time only one is being edited, the
     	`selected' buffer.

	\item {\bf Frame}: an X Window System window in which Emacs is
		running. (The following definition for an {\em Emacs
		window} refers to subdivisions of one frame.)

	\item {\bf Window}: Emacs can split a frame into two or many
		windows.  Multiple windows can display parts of different
		buffers, or different parts of one buffer.

	\item {\bf Point}: the location at which editing commands will
		take effect. In the current buffer, the cursor shows where
		point is.

		If several files are being edited in Emacs, each in its
		own buffer, each buffer has its own point location.

		A buffer that is not currently displayed remembers where
		point is in case it is displayed again later.  If the same
		buffer appears in more than one window, each window has
		its own position for point in that buffer.

		The point is also one end of the {\em region} (see below).

	\item {\bf Cursor}: The cursor is the rectangle on the
     	selected buffer that indicates the position of the
     	point. The cursor is on the character that follows point.
     	Often people speak of `the cursor' when, strictly
     	speaking, they mean `point'.

	\item {\bf Mark}: an abstract pointer to a position in the
     	text. The user can set it to specify one end of the region
     	(see below), point being the other end. Each buffer has
     	its own mark.

	\item {\bf Marker}: a specialized Emacs internal data
	  	structure that defines a location in a buffer in terms of
	  	a pair $(\mathit{buffer},\mathit{location})$. It is worth
	  	noting that a marker follows the text as editing changes
	  	are made. Specifically, if text is deleted or inserted
	  	before the marker, the marker's position (an offset from
	  	the beginning of the buffer) is adjusted.

	\item {\bf Mark Ring}: used to hold several recent previous
     	locations of the mark, just in case the user wants to move
     	back to them.  Each buffer has its own mark ring; in
     	addition, there is a single global mark ring.

	\item {\bf Region}: The region is the text between point and
     	the mark.  Many commands operate on the text of the
     	region. If a portion of text is highlighted with the
     	mouse, that becomes the region and point and the mark are
     	updated accordingly.

\end{itemize}
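
The difference between a plain position and a marker can be
illustrated with a minimal sketch. The function name {\tt
emaxml-sketch-marker-demo} is hypothetical and is not part of Emaxml;
only standard Emacs Lisp primitives are used. Text is inserted before
both a saved integer position and a marker, and only the marker is
adjusted:

\begin{verbatim}
(defun emaxml-sketch-marker-demo ()
  "Return a list (POSITION MARKER-POSITION) after an insertion.
Both start at 8, just before the E of Emaxml."
  (with-temp-buffer
    (insert "<title>Emaxml</title>")
    (let ((pos 8)                  ; plain integer position
          (mk (copy-marker 8)))    ; marker at the same place
      (goto-char (point-min))
      (insert "<!-- note -->")     ; 13 characters, before both
      (list pos (marker-position mk)))))

(emaxml-sketch-marker-demo)        ; => (8 21)
\end{verbatim}

The integer still says 8, which now falls inside the inserted
comment, while the marker has moved to 21 and still precedes the same
character.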

All these concepts, plus many others, are common to all Emacs
applications and an Emacs user will expect Emacs to behave
consistently in a new mode. Thus, the features of Emaxml must be
designed to meet what a typical Emacs user would instinctively try
to do in order to accomplish a task.

%================================================================
\section{XML}
%================================================================

I assume the reader to have a basic knowledge of XML. The purpose
of this section is not to give an exhaustive description of XML,
but to point out a few XML features that are important in the
discussion of the details of Emaxml. For further information refer
to \cite{w3c}, \cite{nut}.

A good web page for quick basic information is at {\em
http://www.w3.org/XML/1999/XML-in-10-points}.
\begin{itemize}

	\item XML is a {\em syntax} which uses tags to allow tree
	structures to be written as a sequence of characters.

	\item An XML document is basically a {\em tree} structure,
	composed of a {\em root element} and (possibly) its {\em
	children}.

	\item The information in an XML document is meant to be
	processed by a {\em client application}, which may be very
	specific to a particular task. For example, a library managing
	application may keep the data base for books, users, etc. in
	XML format.

	\item XML does not have a fixed set of tags. A set of tags and
	their dependencies can be defined for each class of documents.

	\item Text consists of intermingled character data and markup.

	{\bf Markup} takes the form of start-tags, end-tags,
	empty-element tags, entity references, character references,
	comments, CDATA section delimiters, document type
	declarations, processing instructions, XML declarations, text
	declarations, and any white space that is at the top level of
	the document entity (that is, outside the document element and
	not inside any other markup).

	All text that is not markup constitutes the {\bf character
	data} ({\bf CharData}) of the document.

	\item Some characters such as the `\lt' sign are not allowed
	in an XML document other than for markup and in comments and
	CDATA sections. Where these characters are needed, special
	sequences called {\bf entity references} (or numeric {\bf
	character references}) are used. For example, `\&lt;' is the
	predefined entity reference for `\lt', and `\&\#60;' is the
	corresponding character reference.

\end{itemize}

XML was created for, and is mainly used for, storing
structured information, even though its flexibility also makes it
a powerful format for professional text editing.

%================================================================
\section{Existing Emacs modes for editing XML}
%================================================================

In Emacs 20.7.1, XML documents are edited under {\em SGML mode},
SGML being a predecessor of XML.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=12cm]{deliv1/fig-sgml_mode-12x8.eps}
\end{center}
\caption{Screenshot of SGML mode}
\label{fig:SGML}
\end{figure}

In Fig.~\ref{fig:SGML} an XML file is being edited. The approach
used by SGML mode is that of syntax highlighting. Tag names,
attribute names, attribute values and the actual information
(i.e. the {\em CharData}) are in different colors.

The tree structure of the document is not taken into account by
Emacs, and the indenting is left to the user. For instance, the
TAB key pressed in the middle of a line does not perform automatic
indentation as is typical in other Emacs modes, and pressed at the
beginning of a line it indents the line like the previous one.

Many facilities are provided for manipulating elements.

Another Emacs mode suitable for editing XML documents is PSGML. It
has additional features such as support for indentation which
corresponds to the logical structure of an XML document.

PSGML is not part of the standard distribution of Emacs 20.7.1,
but must be obtained separately.

%================================================================
\section{Similar systems}
%================================================================

Emaxml (as it should be when developed further) falls into the
software category of non-proprietary XML editors.

In particular, other similar existing systems may or may not offer
a graphical view of the tree, and may or may not check that the
user is building a tree consistent with the DTD.

Creating Emaxml is in my opinion justified by the fact that it
would give Emacs a visual XML mode, which means that Emaxml is not
just another XML editor, but an integrated part of a very
powerful editor. The quantity and quality of documentation
available about programming Emacs also mean that Emaxml could
be improved by anyone so inclined.

I have examined three similar programs, all from the open source
community.

%------------------------------------------------------------
\subsection{Conglomerate editor}
%------------------------------------------------------------

Conglomerate\footnote{Conglomerate home page is at
www.conglomerate.org\ .} is not a simple XML editor. Actually, XML
is not even mentioned in the web pages, from which the following
description is extracted:

\begin{quote}

	``Conglomerate is a complete system for working with
	documents. It lets the user create, revise, archive, search,
	convert and publish information in several media, using a
	single source document.''

\end{quote}

This project has ambitious goals:

\begin{quote}

	``To [reach out to] a wide audience, from would-be Word users
	to techies.

	Simplify simultaneous publishing of information in a range of
	output formats (print and online) from a single source.

	Replace the WYSIWYG document processing paradigm with a
	separated structure/appearance approach, even for simple
	tasks.''

\end{quote}

From Conglomerate I have taken the idea for the expanded view in
Emaxml, as can be seen from figure~\ref{fig:conglo}.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=4cm]{deliv1/fig-conglo-10x4.eps}
\end{center}
\caption{Conglomerate frontend}
\label{fig:conglo}
\end{figure}

%------------------------------------------------------------
\subsection{GETOX}
%------------------------------------------------------------

\begin{quote}
	``This software aims at giving users the ability to write XML
	files without having advanced knowledge of XML concepts. It
	should also allow users to produce valid documents at any
	time.''
\end{quote}

At the present stage of development, GETOX\footnote{GETOX home
page is at http://idx-getox.idealx.org/index.html} allows the user
to add and remove tags according to the DTD, edit text in PCDATA,
and perform other editing operations. It does not yet support
cutting and pasting parts of the XML tree, or editing attributes.

%------------------------------------------------------------
\subsection{XED}
%------------------------------------------------------------

\begin{quote}
	``XED\footnote{XED home page is at
	http://www.ltg.ed.ac.uk/\home ht/xed} is a text editor for XML
	document instances. It is designed to support hand-authoring
	of small-to-medium size XML documents, and is optimised for
	keyboard input. It works very hard to ensure that you cannot
	produce a non-well-formed document. Although it does not
	validate, the results of offline validation can be accessed,
	and it does read DTDs and keep track of your document
	structure, and provides context-based accelerators to make
	element and attribute entry fast and easy.''
\end{quote}

XED offers facilities for editing the raw XML code.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{\C Topography of Emaxml}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The main goal of Emaxml is to provide the Emacs user with a view
of an XML document as a tree, and with a set of facilities for
manipulating it handily.

This can be achieved by creating the Emaxml mode, which should
hide the XML syntax by displaying the document in a
pseudo-graphical, customizable, hierarchical fashion and automate
the most common or most tedious actions involved in the editing of
an XML document.

{\C The definitions given in {\bf bold} in this and later chapters do
not refer to standard terminology, but are specific to Emaxml.}

%================================================================
\section{\C The look of Emaxml}
%================================================================

Figure \ref{fig:Emaxml} shows what an XML document should look
like when it is being edited with Emaxml\footnote{For comparison,
the document displayed is the same as in Fig.~\ref{fig:SGML}.}. It
also shows the terminology for the logical units, which are the
subject of section \ref{sec:lu}.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=16cm]{deliv1/fig-Emaxml_lu-16x20.eps}
\end{center}
\caption{Screenshot of Emaxml, with the indication of the Logical
Units. The document displayed is the same as Fig.\ref{fig:SGML}.}
\label{fig:Emaxml}
\end{figure}

The visual effect of a tree structure is obtained by using one
color for each level of depth {\C (a finite number of colors
is used, and when Emaxml runs out of colors it starts from the
first one again)}. Children are enclosed inside their
parent, and attribute names and values are put together with the
element name in the header lines for that element.
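
The cycling of colors over the depth levels can be sketched as
follows. The palette and the names {\tt emaxml-sketch-level-colors}
and {\tt emaxml-sketch-color-for-depth} are assumptions made for
illustration only; the actual colors are not fixed by this document.

\begin{verbatim}
;; Hypothetical palette, one color per depth level.
(defvar emaxml-sketch-level-colors
  '("light blue" "light green" "light yellow" "pink")
  "Illustrative list of colors, indexed by depth level.")

(defun emaxml-sketch-color-for-depth (depth)
  "Return the color for tree DEPTH (0 is the topmost level).
Past the end of the palette, start again from the first color."
  (nth (mod depth (length emaxml-sketch-level-colors))
       emaxml-sketch-level-colors))

(emaxml-sketch-color-for-depth 1)  ; => "light green"
(emaxml-sketch-color-for-depth 4)  ; => "light blue" (wrapped)
\end{verbatim}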

An element (and the subtree rooted at it) can be
displayed {\bf outline} or {\bf inline} (that is, vertically or
horizontally) and {\bf expanded} or {\bf collapsed} (that is,
completely visible or displayed as the element name only).

These characteristics are independent, so there are four ways of
displaying a subtree (see Fig.~\ref{fig:ilol}), called {\bf display
modes}: outline-expanded, outline-collapsed, inline-expanded,
inline-collapsed.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=18cm]{deliv1/fig-tree_views-24x12.eps}
\end{center}
\caption{Display modes}
\label{fig:ilol}
\end{figure}

The {\bf display state} of a subtree is defined by the display
modes of all the elements it comprises.
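
Since the two characteristics are independent, a display mode can be
thought of as a pair. The following sketch, in which the
representation and the name {\tt emaxml-sketch-display-modes} are
assumptions and not the actual data structure of Emaxml, enumerates
the four combinations:

\begin{verbatim}
(defconst emaxml-sketch-display-modes
  (let (modes)
    (dolist (layout '(outline inline))
      (dolist (extent '(expanded collapsed))
        (push (cons layout extent) modes)))
    (nreverse modes))
  "The four display modes as (LAYOUT . EXTENT) pairs.")

;; emaxml-sketch-display-modes
;; => ((outline . expanded) (outline . collapsed)
;;     (inline . expanded) (inline . collapsed))
\end{verbatim}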

%================================================================
\section{\C \Logus}\label{sec:lu}
%================================================================

A {\bf logical unit} is a portion of an Emaxml buffer that
corresponds to some XML syntactic component.  At any moment during
editing, point will be in one or more logical units. Also, a
particular location in the text, such as that defined by the mark,
can be said to be in a particular logical unit.

The response of Emaxml to a user's input will vary depending on
the type of logical unit the cursor is in at that moment, and
editing operations will have a different meaning depending on
whether or not they are performed on a region which is completely
included in an instance of a logical unit.

Logical units\footnote{Throughout this document, logical unit
names are in \lu{Sans Serif} font} are listed in Table
\ref{tbl:lus}.

\begin{tblenv}
\begin{tabular}{|p{6cm}|p{8cm}|}

	\hline

	\luele & A whole element, including its header and its
	children \\

	\hline

	\luheader & An element's name and its attributes \\

	\hline

 	\luelename & The first word of a \luheader \\

	\hline

	\luattlist & The set of attributes of an element and their
	values \\

	\hline

	\luatt & An attribute's name and its value \\

	\hline

	\luattname & An attribute's name \\

	\hline

	\luattvalue & An attribute's value \\

	\hline

	\luchardata & \C The plain text. In \luchardata\ \logus,
	reserved characters ('"\lt\gt\&) are edited and displayed
	normally. However, they are encoded as entity references when
	the XML file is written.\\

	\hline

	\lupi & A processing instruction \\

	\hline

	\lupitarget & The target of a processing instruction, i.e. the
	first word of a \lupi \\

	\hline

	\lupibody & The command of a processing instruction \\

	\hline

	\lucomment & A comment text \\

	\hline

	\luintdtd & The text relative to the definition of the
	internal DTD \\

	\hline

	\luentref & The text of an entity reference \\

	\hline

\end{tabular}
\caption{Logical Units} \label{tbl:lus}
\end{tblenv}
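
The encoding applied to \luchardata\ when the file is written can be
sketched as below. The names {\tt emaxml-sketch-entities} and {\tt
emaxml-sketch-escape-chardata} are hypothetical; the five entity
references are those predefined by XML. This is an illustration
under these assumptions, not the actual code of the Writer.

\begin{verbatim}
(defconst emaxml-sketch-entities
  '(("&" . "&amp;") ("<" . "&lt;") (">" . "&gt;")
    ("\"" . "&quot;") ("'" . "&apos;"))
  "Alist mapping reserved characters to entity references.")

(defun emaxml-sketch-escape-chardata (text)
  "Return TEXT with reserved characters as entity references."
  (replace-regexp-in-string
   "[&<>\"']"
   (lambda (match) (cdr (assoc match emaxml-sketch-entities)))
   text t t))

(emaxml-sketch-escape-chardata "a < b & \"c\"")
;; => "a &lt; b &amp; &quot;c&quot;"
\end{verbatim}

The replacement is done in a single pass over the original text, so
the ampersands introduced by the entity references are not encoded a
second time.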



%================================================================
\section{\C Categories of \logus}
%================================================================

Subsets of \logus\ are grouped into categories according to some
properties, as described in Table \ref{tbl:lucat}.

\begin{tblenv}
\begin{tabular}{|p{3cm}|p{3cm}|p{8cm}|}

	\hline

	{\bf Category} & {\bf Property} & {\bf \Logus} \\

	\hline \hline

	{\bf Elementary} & Composed of characters only. & \luelename,
	\luattname, \luattvalue, \luchardata, \lupitarget, \lupibody,
	\lucomment, \luintdtd, \luentref. \\

	\hline

	{\bf Compound} & Composed of elementary {\C and
	compound} \logus. & \luele, \luheader, \luattlist, \luatt,
	\lupi. \\
	
	\hline

	{\bf Leaf} & {\C Are effectively the main building blocks of
	the tree from a high level point of view, i.e. the user's.} &
	\luele, \luchardata, \lupi, \lucomment, \luentref, \luintdtd \\

	\hline

	{\bf Special} & Leaf units which cannot have children &
	\luchardata, \lupi, \lucomment, \luentref. \\

	\hline

	{\bf Multiline} & May contain {\em newline} characters. &
	\luintdtd, \lucomment, \luchardata, {\C \luattvalue}.\\

	\hline

	{\bf Monoline} & May not contain {\em newline} characters. &
	\luelename, \luattname, \lupitarget, \lupibody,
	\luentref. \\

	\hline
		
\end{tabular}
\caption{Categories of \Logus} \label{tbl:lucat}
\end{tblenv}
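
A category can be seen as a named set of \logus. As a sketch only,
with hypothetical symbol names standing for the \logus\ and a
hypothetical predicate {\tt emaxml-sketch-in-category-p}, the first
four rows of the table could be written as data and queried like
this:

\begin{verbatim}
(defconst emaxml-sketch-categories
  '((elementary element-name attribute-name attribute-value
                chardata pi-target pi-body comment
                internal-dtd entity-reference)
    (compound   element header attribute-list attribute pi)
    (leaf       element chardata pi comment
                entity-reference internal-dtd)
    (special    chardata pi comment entity-reference))
  "Alist mapping a category to the logical units it contains.")

(defun emaxml-sketch-in-category-p (unit category)
  "Return non-nil if logical UNIT belongs to CATEGORY."
  (memq unit (cdr (assq category emaxml-sketch-categories))))

(emaxml-sketch-in-category-p 'comment 'leaf)    ; => non-nil
(emaxml-sketch-in-category-p 'header 'special)  ; => nil
\end{verbatim}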

%================================================================
\section{\C Space terminology}
%================================================================

Not all locations in the display belong to a \logu. For instance, the
colored spaces at the left of an \luelename\ or the colon following an
\luattname\ are not part of any logical unit.

All such locations and their contents are said to be {\bf automatic},
because they are managed by Emaxml and cannot be directly modified by
the user. Non-automatic locations of the buffer form the {\bf user
space} of the buffer. Any user location belongs to one or more \logus.

The user may place the cursor on some automatic locations, namely
those immediately to the right of a user location. These locations
are therefore not completely automatic; they are called {\bf
ubiquitous} and allow the insertion of new text. For example, the
colon in an \luatt\ is ubiquitous, and when the cursor is over it any
text inserted is appended to the corresponding \luattname.

%================================================================
\section{\C Logical lines}
%================================================================

A {\bf logical line} is either one line (up to a newline) of a
multiline \logu\ or an entire monoline \logu.

The {\bf previous} \logl\ with respect to a \logl\ is the first
\logl\ that is encountered by going left and up {\C in the
display}.  The {\bf following} \logl\ with respect to a \logl\ is
the \logl\ that is encountered by going right and down.

%================================================================
\section{The seed element and the \luintdtd}
%================================================================

The topmost element is called the {\bf seed
element}\footnote{Because it comes before the root element...},
and does not correspond to an actual XML element; it contains
information about the document, such as that contained in
the XML declaration. From the user's point of view, the header of
the seed element is treated as a normal header, whose attributes
hold this information.  None of the fields contained in this
header are compulsory\footnote{ In fact, the BNF production that
defines the prolog is
\begin{center}
[22] $prolog$ ::= $XMLDecl$? $Misc$* ($doctypedecl$\ $Misc$*)?
\end{center}
}, although they are very common. They will therefore appear by
default when visiting a new document but, if left blank, they will
not be written to the file. The only exception to this policy is the
`standalone' attribute, whose value is inferred by Emaxml depending
on whether the `External DTD' attribute has been left blank or not.
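
The inference of the `standalone' value might look like the following
sketch. The text above does not say which value corresponds to which
case; the sketch assumes that a blank `External DTD' means the
document is standalone. The function name {\tt
emaxml-sketch-standalone} is hypothetical.

\begin{verbatim}
(defun emaxml-sketch-standalone (external-dtd)
  "Return the inferred standalone value for EXTERNAL-DTD.
Assumption: a blank (nil or empty) External DTD means that the
document is standalone."
  (if (or (null external-dtd) (string= external-dtd ""))
      "yes"
    "no"))

(emaxml-sketch-standalone "")          ; => "yes"
(emaxml-sketch-standalone "book.dtd")  ; => "no"
\end{verbatim}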

There may only be one instance of a \luintdtd\ and it must be a
child of the seed element. In the text of the \luintdtd, Emaxml lets
the user write whatever s/he wants, since it is not parsed.

%================================================================
\section{Buffer units}
%================================================================

A portion of an Emaxml buffer delimited by two locations \lz\ and
\lo\ both in user space is called a {\bf unit}\footnote{The term
{\em unit} is used instead of {\em region} for consistency with
the term {\em logical unit}.} and can be:

\begin{itemize}

	\item a {\bf string}, if \lz\ and \lo\ are both inside the
	same elementary \logu;

	\item a logical unit, as defined above, if \lz\ and \lo\ are
	the first and the last location respectively of the same \logu;

	\item an {\bf illogical unit}\footnote{The adjective {\em
	illogical} is here chosen on purpose, to emphasize the
	apparent weirdness of a type of structure that may nevertheless
	sometimes come in quite handy.}, if \lz\ and \lo\ belong to two
	different \logus.

\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{\C The behavior of Emaxml from the user's point of view}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

A great many useful editing functions can be devised
for Emaxml mode. Moreover, an XML editor can enforce a certain degree
of control over the semantics of the tree, i.e. it may or may not be
aware of the DTD for the document being edited, and can help the user
in a number of ways by exploiting this knowledge. In this chapter I
set the initial target for my project, both in terms of functionality
and in terms of the level of control implemented. The choice is
arbitrary; however, it is a strong requirement that the core of the
mode be coded and documented in a way that facilitates later extension
in both directions.

In editing an XML document, a basic operation (such as yanking or
killing\footnote{{\em Killing} and {\em yanking} are Emacs jargon
for the more common {\em cutting} and {\em pasting}}) can be
performed at three different levels:

\begin{itemize}

	\item At the {\bf character level} the operation involves an
	Emaxml string.

	\item At the {\bf logical level} the operation is performed
	over an entire logical unit, for example the deletion of an
	element.

	\item At the {\bf illogical level} the operation is performed
	on an illogical unit.

\end{itemize}

The next sections specify the requirements of Emaxml, which are
subdivided into two main parts: the description of what the common
editing operations mean under Emaxml mode and the definition of the
new editing operations specific to XML manipulation.

%================================================================
\section{Standard Emacs editing operations {\em \`{a} la} Emaxml}
\label{sec:edop}
%================================================================

The typical Emacs user will expect a number of standard editing
facilities to be available in Emaxml mode, and their behavior to
parallel that of other modes.

As stated above, the target of my project is the redefinition of a
finite set of operations only. This is a subset of the list obtained
by pressing `F1 b' in a {\em Fundamental} buffer in Emacs. It is less
important to implement a large number of functions immediately than
it is to design the code in a modular fashion, document it properly,
and choose at least a few operations from each of the following
categories:

\begin{itemize}

	\item {\bf PM}: point movement;

	\item {\bf ID}: insertion and deletion;

	\item {\bf MK}: mark operations;

	\item {\bf RE}: region setting;

	\item {\bf SR}: search \& replacement;

	\item {\bf GE}: general common operations (e.g. `undo').

\end{itemize}

Appendix~\ref{app:stdcmd} lists the minimum set of commands to be
implemented for Emaxml as part of this project.

%------------------------------------------------------------
\subsection{Visiting a file}
%------------------------------------------------------------

Emaxml mode will be activated on visiting a file with extension
{\em .xml} and when a buffer containing a file with extension
other than {\em .xml} is saved with extension {\em
.xml}\footnote{In Emacs, if a buffer is saved with 'C-x C-w'
('Save Buffer As...' in the Files menu) with a different
extension, its major mode changes accordingly.}. In the latter
case, the buffer also needs to be re-displayed.  In case the file
does not already exist, the buffer will contain the seed element,
blank, as described in section \ref{sec:create}.

%------------------------------------------------------------
\subsection{\C Saving a file}\label{sec:savingAFile}
%------------------------------------------------------------

When a file needs saving, the Writer is invoked to produce the XML
code from the internal representation of the document.

Also, the \luintdtd\ is syntactically checked before the file is
saved, because it may contain sequences of characters that would
make the written file illegal\footnote{Consider for example an
\luintdtd\ containing {\tt ...]]> <root-element>...}. If this is
written without being checked, the file will be subsequently
unparsable.}. In the case that the \luintdtd\ is recognised as
incorrect, the file will be saved without it, and the user will be
advised so that s/he can take whichever action s/he thinks
appropriate.

%------------------------------------------------------------
\subsection{\C Highlighting the region} \label{sec:high}
%------------------------------------------------------------

When using the mouse to highlight the region, or in Transient Mark
mode\footnote{In Transient Mark mode, when the mark is active, the
region is highlighted.}, only the locations of user space included in
the region will be highlighted.

However, when a leaf \logu\ is completely included in the highlighted
region, all of its physical space will be colored, including the
non-user space, otherwise only the included user space will be
colored.

In figure \ref{fig:ata}-(0) the region highlighted is {\em illogical},
i.e. it starts in an elementary \logu\ and terminates in a different
one.  The comment and the first meta element are examples of the first
case; the image, properties and the last meta elements are examples of
the second case.

Two useful commands are implemented in Emaxml: mark-more and
mark-less.

\begin{description}

\item[Mark more] expands the region to the next larger \logu\
  containing the region (or containing point if the mark is undefined).

\item[Mark less] does the opposite. Calls to mark-less always select
the first child when there is more than one.

\end{description}

For example, if the region is currently a string in an \luattvalue,
mark-more sets region to the containing \luatt. Further calls to
mark-more set region to the containing \luattlist, then to the
\luheader, then to the \luele, and so on up to the entire
buffer. 
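
As a minimal sketch of how mark-more and mark-less could be computed,
assume (this is an assumption, not part of the specification) that the
document is held as a nested list and that a \logu\ is addressed by a
{\em path}, i.e. the list of child indices leading to it from the
root. All function names are hypothetical.

\begin{code}
\begin{verbatim}
(require 'seq)

;; Hypothetical helper: the sub-units of a unit are the elements
;; of its list representation that are themselves lists.
(defun sketch-subunits (unit)
  (seq-filter #'consp (cdr unit)))

;; Mark more: the containing unit is reached by dropping the last
;; index; the root (empty path) is its own container.
(defun sketch-path-more (path)
  (butlast path))

;; Mark less: descend to the first child, if the unit has any.
(defun sketch-path-less (tree path)
  (let ((unit tree))
    (dolist (i path)
      (setq unit (nth i (sketch-subunits unit))))
    (if (sketch-subunits unit)
        (append path (list 0))
      path)))
\end{verbatim}
\end{code}

With this representation, repeated calls to {\tt sketch-path-more}
eventually reach the empty path, which corresponds to the entire
buffer.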


%------------------------------------------------------------
\subsection{\C Text search and replacement}
%------------------------------------------------------------

Text search and replacement in Emaxml will operate on the user
space only.  Both string and regular expression search and
replacement will be implemented.

\jbw{a comment I don't understand: ``what about replacing element
trees?''} 

%------------------------------------------------------------
\subsection{\C Moving point} \label{sec:move}
%------------------------------------------------------------

Movement in Emaxml acquires different meanings (and names)
depending on which level it is performed at:

\begin{itemize}

	\item At character level: {\bf sliding}.

	\begin{itemize}

		\item {\bf Horizontal sliding}: moving by one character.

		Point can slide inside the current \logl\ one character
		backward or forward.

		Sliding point backward when at the beginning of a \logl\
		moves it to the end of the previous \logl.

		Sliding point forward when at the end of a \logl\ moves it
		to the beginning of the next \logl.

		\item {\bf Vertical sliding}: moving by \logl s.

		When performing a vertical sliding movement
		(e.g. 'next-line'), point is expected to behave as closely
		as possible to the usual behavior.  A movement that starts
		at the $l$th location of a \logl, and ends at a different
		\logl, will end at the $l$th location of the arrival
		\logl\ if that has at least $l$ characters, or at its last
		location otherwise.  However, $l$ is remembered for
		successive vertical movements.  I will call this standard
		Emacs property {\bf \traveller}, because it can be
		demonstrated for an editor by showing that a vertical
		movement in one direction followed immediately by a
		vertical movement in the opposite direction always returns
		point to the initial location.

		Point can slide one \logl\ up or down with \traveller.

	\end{itemize}

	\item At the logical level: {\bf traversing}.

	Traversing (the XML tree) refers to moving from one XML
	component to another, hence the \logus\ involved are:
	\luheader, \luchardata, \lupi, \lucomment, \luintdtd.

	Traversing will be implemented with no \traveller; point is
	always moved to the first character of the arrival \logu.

	\begin{itemize}

		\item {\bf Horizontal traversing}\footnote{The choice of
		adjectives {\em horizontal} and {\em vertical} for
		traversing is due to the fact that in an Emaxml buffer the
		parent of a \logu\ can always be thought of as being ``on
		the left'', while a peer is always up or down.}: moving
		hierarchically.

		Point can traverse left from one of the said \logus\ to
		its XML parent.

		Traversing right can only be done from a \luheader\ to the
		first child of the current element, if any.

		\item {\bf Vertical traversing}: moving to peers.

		Point can traverse from one of the said \logus\ (except
		the seed element, which has no peers) to the next or
		previous instance of the same \logu\ found in the buffer,
		regardless of whether they are siblings.

	\end{itemize}

\end{itemize}
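
The \traveller\ property of vertical sliding can be illustrated with a
small sketch. It assumes that the column $l$ is kept in a variable
between consecutive vertical slides; the variable and function names
are hypothetical and only serve the illustration.

\begin{code}
\begin{verbatim}
(defvar sketch-goal-column nil
  "Column remembered across consecutive vertical slides.")

(defun sketch-arrival-column (column line-length consecutive)
  "Return the column to land on in a line of LINE-LENGTH characters.
COLUMN is the column point starts from.  CONSECUTIVE is non-nil
when the previous command was also a vertical slide."
  ;; A fresh vertical movement records the starting column;
  ;; consecutive ones keep using the remembered value.
  (unless (and consecutive sketch-goal-column)
    (setq sketch-goal-column column))
  ;; Land on the remembered column, or on the last location if
  ;; the arrival line is shorter.
  (min sketch-goal-column line-length))

;; Down from column 10 into a line of 4 characters, then back up
;; into a line of 20 characters:
;; (sketch-arrival-column 10 4 nil)  ; => 4
;; (sketch-arrival-column 4 20 t)    ; => 10
\end{verbatim}
\end{code}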

%------------------------------------------------------------
\subsection{\C Mark and the mark ring}
%------------------------------------------------------------

The mark ring will be implemented in Emaxml, consistently with its
standard definition and functionality. Obviously, the internals
relative to such implementation will be specific to Emaxml;
markers have to be objects of a different data structure, since
they must describe a location in terms of the tree.

An important property of a marker that must be preserved is that
it ``moves'' with the position in the buffer to which it is
pointing, i.e. if some text is added or deleted before that
position, the marker always points to the same character.
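
A minimal sketch of this property, assuming a hypothetical marker
representation as a pair of a tree path (the list of child indices
leading to a \logu) and a character offset inside that unit's user
space:

\begin{code}
\begin{verbatim}
(defun sketch-marker-after-insert (marker path offset length)
  "Return MARKER adjusted for an insertion.
MARKER is a cons (PATH . OFFSET).  LENGTH characters were
inserted at OFFSET in the unit addressed by PATH."
  (if (and (equal (car marker) path)
           (< offset (cdr marker)))
      ;; Text was added before the marker in the same unit:
      ;; shift the marker so it points to the same character.
      (cons (car marker) (+ (cdr marker) length))
    marker))

;; (sketch-marker-after-insert '((0 1) . 5) '(0 1) 2 3)
;; => ((0 1) . 8)
\end{verbatim}
\end{code}

Whether a marker should also advance when text is inserted exactly at
its offset is a policy choice that the sketch leaves open; deletions
and changes to the tree itself would need analogous adjustments.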

%------------------------------------------------------------
\subsection{\C Killing and yanking}
%------------------------------------------------------------

{\em Killing} and {\em yanking} in Emacs jargon mean {\em cutting}
and {\em pasting} respectively.

In standard Emacs operation when a portion of the buffer is killed it
is deleted from the buffer and stored in the kill ring for later use.
One property (I shall call it {\bf yanking transparency}) of an Emacs
buffer is that if some text is killed and immediately yanked, the
buffer does not change. Point may be in a different location, though.

Yanking is an insertion operation; the portion of buffer being
yanked is inserted before point.

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\subsubsection{\C Killing in Emaxml} \label{sec:kill}
% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Many Emacs commands exist for killing. The ones considered here are:
kill-line, kill-region, kill-word, kill-sexp, kill-buffer. Their names
are self-explanatory (apart, perhaps, from kill-sexp, which in Emaxml
kills the current element). Kill-line kills a \logl\
instead of a normal line.

Since the object of the editing in Emaxml is a tree, the killing
and yanking operations must be redefined in terms of \logus.

Let us consider a portion $p$ of an Emaxml buffer being killed
that starts from location \lz\ in leaf \logu\ \uz\ and ends at
location \lo\ in leaf \logu\ \uo.

When $p$ is killed, it is stored in the kill ring.

If $p$ is a string, it is stored as such, with no information
regarding which \logu\ it was part of. On the other hand, Emaxml stores
killed \logus\ and il\logus\ as {\em trees}.

This stored tree is:

\begin{itemize}

\item If $p$ is an entire \logu, the tree rooted at \uz, including all
  its children.

\item If $p$ is an il\logu, the tree rooted at the smallest \logu\ $s$
  that includes all the \logus\ totally or partially in $p$, excluding
  the children of $s$ which are not in $p$ and the portions of user
  space of $s$ which are not in $p$.

  This implies that the stored tree may have some blank \logus\ in it.

\end{itemize}

Killing $p$ has the following effects:

\begin{itemize}

\item[(1)] If $p$ is a string, point is left at \lz.

\item[(2)] If $p$ is a leaf \logu, then point is left:

  \begin{itemize}

  \item if $p$ has a peer $q$ below it, at the first character of
    $q$'s user space;

  \item otherwise if $p$ has a peer $q$ above it, at the first
    character of $q$'s user space;

  \item otherwise at the first character of $p$'s parent's user space.

  \end{itemize}

\item[(3)] If $p$ is a non-leaf \logu, then it is one of the
  elementary \logus\ in an \luele\ or in a \lupi; a blank \logu\ of
  the same type as $p$ is inserted in place of $p$ with point at its
  first location.

\item[(4)] If $p$ is an il\logu, then point is left as in policies
  (1), (2), (3) but substituting ``the portion of \uz\ in $p$'' for
  $p$.

  When killing an il\logu, Emaxml rebuilds the tree as follows, in
  order\footnote{For an example, see Appendix~\ref{app:killre}.}:

  \begin{itemize}

  \item[(i)] all leaf \logus\ entirely included in $p$ are pruned;

  \item[(ii)] all non-leaf \logus\ entirely included in $p$ are
    blanked and possibly removed;

  \item[(iii)] all characters in $p$ are eliminated.

  \end{itemize}

\end{itemize}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\subsubsection{\C Yanking in Emaxml} \label{sec:yank}
% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The following are the main general properties of yanking in Emaxml:

\begin{itemize}

\item The object $p$ to be yanked is either a string or a subtree in
  the kill ring.

\item Yanking leaves point after the last location of the user space
  of the yanked unit, i.e. the cursor is left under the first
  character of the user space after the yanked unit.

\item If $p$ is a string, it is yanked at point. This involves
  checking the result for consistency with the rules
  \todo{e.g. PItarget can be one word only}

\end{itemize}

Yanking a non-string unit involves some tree manipulation. The only
limitation posed by Emaxml is that non-leaf \logus\ (such as
\luattname\ or \lupitarget) cannot be yanked other than in an
appropriate compound \logu.

Since both logical and illogical units are complete subtrees, there is
no difference in yanking them.

Let us consider a unit $p$ stored by Emaxml in the kill ring.

If $p$ is a leaf \logu, then yanking transparency does not
hold\footnote{See Appendix~\ref{app:yankee} for a discussion of this.}, so
$p$ must be explicitly yanked as a peer or as a child. If yanked as a
peer it is inserted before the current leaf, if yanked as a child it
is inserted as its last child.

The Emaxml policies for yanking $p$ into some \logu\ \ut\ are
described in Fig.~\ref{fig:yankingcriteria}.

\begin{tblenv}

  \begin{tabular}{l|l|p{5cm}}
  
    \hline
    
	{\bf $p$ is a} & {\bf \ut\ is a} & {\bf Yanking $p$ into \ut} \\

	\hline

	string & elementary \logu & $p$ can be yanked anywhere in \ut
	\\

	leaf \logu & \luheader & $p$ can be yanked either as child or
	as peer of the element which \ut\ belongs to\\

	leaf \logu & leaf \logu & $p$ yanked as a peer of \ut \\

	\luattlist\ or \luatt & \luheader & $p$ inserted at end of
	\ut's \luattlist \\

	\hline
      
  \end{tabular}

  \caption{Yanking criteria for $p$ not
    illogical.}\label{fig:yankingcriteria}

\end{tblenv}


%================================================================
\section{Emaxml specialized editing operations}
%================================================================

Appendix~\ref{app:newcmd} lists the minimum set of new commands
specific to Emaxml to be implemented as part of this project.
They are described in the following sections.

%------------------------------------------------------------
\subsection{\C Creating a new instance of a logical unit} \label{sec:create}
%------------------------------------------------------------

Creating a new instance inserts a {\bf blank \logu}. A blank \logu\ is
an instance of a \logu\ with the strings referring to the user space
empty. For example, an \luatt\ is of the form:

$$
\overbrace{\mbox{\tt \underline{name}}}^\mathcal{U}
\underbrace{\mbox{\tt :}\Box}_{\mathcal{A}}
\overbrace{\mbox{\tt value}}^{\mathcal{U}}
$$

where `$\Box$' indicates a space, `$\mathcal{U}$' indicates user space
and `$\mathcal{A}$' indicates automatic/ubiquitous space. A blank
\luatt\ is therefore displayed as a colon followed by a space (which
cannot be seen) and represented internally\footnote{See section ``Data
structures'' for details on internal representation.} as

\begin{code}
\begin{verbatim}
(attribute (attName "") (attValue ""))
\end{verbatim}
\end{code}

In the example, the user is able to move with the cursor over the
colon and insert characters which are taken as part of the attribute
name, or to place the cursor after the space to insert characters of
the attribute value.

Specifically, the blank \logus\ are described in terms of both display
and internal representation in Fig.~\ref{tbl:blanklogus}.

\begin{tblenv}
\begin{tabular}{|p{3cm}|p{4cm}|p{8cm}|}

  \hline

      {\bf \logu} & {\bf Display} & {\bf Internal representation} \\

      \hline
      \hline

      \luattname & & {\tt (attName "")} \\

      \hline

      \luattvalue & & {\tt (attValue "")} \\

      \hline

      \luatt & {\tt :$\Box$} & {\tt (attribute (attName "") (attValue
      ""))} \\

      \hline

      \luattlist & {\tt :$\Box$} & {\tt (attList (attribute (attName "") (attValue
      "")))} \\

      \hline

      \luelename & $\Box$ & {\tt (eleName "")} \\

      \hline
      
      \luheader & \underline{$\Box$}\underlines & {\tt (header
      (eleName ""))} \\

      \hline

      \luele & \underline{$\Box$}\underlines & {\tt (element (header (eleName
      "")))} \\

      \hline

      \luchardata & Three blank lines (one as top ``margin'', one for
      the text and one as bottom ``margin'') & {\tt (charData "")} \\

      \hline
      
      \lupitarget & & {\tt (PITarget "")} \\

      \hline

      \lupibody & & {\tt (PIBody "")} \\

      \hline

      \lupi & \underline{\tt $<$?$\Box$:$\Box$}\underlines & {\tt (PI
      (PITarget "") (PIBody ""))} \\

      \hline
   
      \lucomment & \underline{\tt $<$!--}$\Box$ & {\tt (comment "")}
      \\

      \hline

      \luintdtd & \underline{\tt $\Box$[}$\Box$ & {\tt (intDTD "")}
      \\

      \hline

      \luentref & \underline{\tt $\Box$\&$\Box\Box$;$\Box$}
      & {\tt (entRef "")} \\

      \hline

\end{tabular}
\caption{Blank \logus. `$\Box$' indicates a space, underlined text
  indicates a colored background.}
\label{tbl:blanklogus}
\end{tblenv}
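
The internal representations in the table of blank \logus\ follow a
regular pattern, which the following sketch makes explicit. The
function name {\tt sketch-blank} is hypothetical; only the class
symbols and the resulting lists are taken from the table.

\begin{code}
\begin{verbatim}
(defun sketch-blank (class)
  "Return a blank object of CLASS (a symbol such as `attribute')."
  (pcase class
    ;; Compound units are built from blank components.
    ('attribute (list 'attribute
                      (sketch-blank 'attName)
                      (sketch-blank 'attValue)))
    ('attList   (list 'attList (sketch-blank 'attribute)))
    ('header    (list 'header (sketch-blank 'eleName)))
    ('element   (list 'element (sketch-blank 'header)))
    ('PI        (list 'PI
                      (sketch-blank 'PITarget)
                      (sketch-blank 'PIBody)))
    ;; Every other unit holds a single, empty string.
    (_          (list class ""))))

;; (sketch-blank 'attribute)
;; => (attribute (attName "") (attValue ""))
\end{verbatim}
\end{code}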

When a new \logu\ is created, point is placed at its first ubiquitous
location, since there are no available user locations in a blank
\logu.

Inserting a new \logu\ does not always make sense, for example
creating a new \luattname\ alone, or creating a \lupi\ when the cursor
is in the middle of a \luchardata. The meaning of such commands must
be interpreted by guessing what the user may want when s/he issues
them. That depends on where point is and what type of \logu\ the user
wants to create. The following is the set of rules that governs the
insertion of new \logus.

\begin{itemize}

%A new child is inserted as the first child of the element, if point is
%on the \luheader\ of the element, while a
%sibling is inserted in place of the leaf component.


%\item Creating a leaf \logu\ results in a new instance of such a
%  \logu\ being inserted as a sibling of the current leaf, immediately
%  before that, unless the current leaf is the \luseed, in which case
%  the new \logu\ is inserted after the seed (if such a \logu\ is
%  allowed, see Fig.\ref{tbl:XD:classes}).

\item A leaf \logu\ can be created as a child of an element or as a sibling
of the current leaf \logu. Different commands are provided for these
two operations. 

\item Creating a \luattlist, a \luattname\ or a \luattvalue\ is
equivalent to creating a \luatt.

\item Creating a \lupitarget\ or a \lupibody\ is equivalent to
  creating a \lupi.

\end{itemize}

\todo{maybe to be made better?}

%------------------------------------------------------------
\subsection{Adjusting the displaying of the tree structure}
%------------------------------------------------------------

The following statements define the manipulation of the visual
tree:

\begin{itemize}

	\item The tree structure is by default displayed entirely
	outline-expanded when the document is initially visited.

	\item Making an outline subtree inline makes all its children
	temporarily inline, and does not change its or its children's
	expansion mode.

	\item Expanding a collapsed subtree brings it back to the
	display status it had before being collapsed (i.e. all its
	elements return to their previous display mode).

	\item The root element and the seed element cannot be made
	inline or collapsed.

	\item The \luheader\ of a collapsed or inline element has no
	user space.

\end{itemize}

An optional further development may be that the entire document
display status be saved along with the file (e.g. encoded somehow
inside the document) and restored the next time the document is
visited in Emaxml mode.

%================================================================
\section{\C Whitespace handling}
%================================================================

A piece of whitespace is a sequence of one or more spaces, tab
characters, carriage return characters or linefeed characters.

Whitespace can be in an XML file for two reasons: 

\begin{itemize}

	\item Because it is part of the character data (e.g. spaces
	between words, or for indentation such as in code or
	poetry). I will call this {\bf blackspace}. Blackspace read in
	from an XML file must be preserved.

	\item Because it is used to make the markup more
	human-readable (e.g. blank lines, or tabs used for
	indentation). I will call this {\bf metaspace}. Metaspace does
	not affect the semantics of an XML file.  

\end{itemize}

The interpretation of a piece of whitespace as either blackspace
or metaspace is strictly application-dependent.

Emaxml presently treats all whitespace as blackspace, but can be
expanded to process whitespace according to precise needs.
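
Should the policy be extended, the definition of whitespace given
above can be expressed as a simple predicate. This is only a sketch
with a hypothetical function name; it does not decide whether a piece
of whitespace is blackspace or metaspace, which remains
application-dependent.

\begin{code}
\begin{verbatim}
(defun sketch-whitespace-only-p (string)
  "Return t if STRING consists of whitespace characters only.
Whitespace means one or more spaces, tabs, carriage returns
or linefeeds."
  (and (string-match-p "\\`[ \t\r\n]+\\'" string) t))

;; (sketch-whitespace-only-p " \n\t")   ; => t
;; (sketch-whitespace-only-p " text ")  ; => nil
\end{verbatim}
\end{code}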

\jbw{Could this policy be changed to something like: charData
objects which contain whitespace only are discarded as metaspace,
the rest is kept as blackspace?}

%------------------------------------------------------------
\subsection{\C Whitespace in parsing}
%------------------------------------------------------------

%------------------------------------------------------------
\subsection{\C Whitespace in writing}
%------------------------------------------------------------

%------------------------------------------------------------
\subsection{\C Whitespace in displaying}
%------------------------------------------------------------


%================================================================
\section{Control issues} \label{sec:ctrl}
%================================================================

Emaxml, at the stage of development I set as target for my
project, will not enforce control over the tree structure created
by the user or read from an XML file in terms of the DTD.

Emaxml will see the document ``simply'' as a syntactic structure
that must comply with the grammar set out by the BNF rules in
\cite{w3c}. Note that the internal DTD is {\em not} parsed. The
user will be able to manipulate the tree as s/he wants without
being bothered with indentation. The ``less-thans'', quotes, and
other syntactic sugar of XML will be hidden. This should hopefully
let the user concentrate more on the contents, but, on the other
hand, the user will also be able to create documents that are not
well-formed, or invalid.

However, the actual code implementing the editing mode must be
designed so that adding semantic awareness and control is
easy.

Examples of events that may trigger a contents check or other
semantics-related operations are:

\begin{itemize}

	\item the contents of a non-leaf elementary \logu\
	(i.e. \luelename, \luattname, \luattvalue) are changed at the
	character level;

	\item point leaves an elementary \logu;

	\item the tree is changed; this includes creation of instances
	of any \logu, killing and yanking at the logical and illogical
	levels;

	\item the contents of the seed element are changed;

	\item the contents of the \luintdtd\ are changed.

\end{itemize}

These and other such events must be easily recognizable and
exploitable from the programmer's point of view.

%------------------------------------------------------------
\subsection{Low-level control}
%------------------------------------------------------------

Some \logus, by their nature, have limitations on what can be inserted
in them. For instance, no spaces may be inserted in a \lupitarget,
since the target of a processing instruction can only be one word.

Emaxml enforces low-level control on the text inserted by the user, by
matching the contents of the current elementary \logu\ with a suitable
regular expression.

By {\em low-level control} I mean that only the wellformedness is
checked, as opposed to checking the validity as well, which would
involve ensuring that what the user inserts is also consistent with
the DTD declarations (for instance, that a referenced entity has been
declared in the DTD).
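
As a sketch of such a check, the following predicate tests whether a
string is acceptable as a \lupitarget. The regular expression is a
deliberately simplified approximation of the XML {\tt Name} production
and is an assumption made for illustration; the constant and function
names are hypothetical.

\begin{code}
\begin{verbatim}
(defconst sketch-name-regexp
  "\\`[[:alpha:]_:][[:alnum:]_:.-]*\\'"
  "Simplified approximation of an XML name (an assumption).")

(defun sketch-pitarget-ok-p (string)
  "Return t if STRING is a single word usable as a PI target."
  (and (string-match-p sketch-name-regexp string) t))

;; (sketch-pitarget-ok-p "aTarget")    ; => t
;; (sketch-pitarget-ok-p "two words")  ; => nil
\end{verbatim}
\end{code}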

Low-level control can be achieved in two ways:

\begin{description}

\item[by constant monitoring:] the check is triggered by a change in
  the \logu, by means of hooking a function to the appropriate text
  property or overlay.

\end{description}

Low-level control follows a set of rules called {\bf L-rules}, detailed in
Appendix~\ref{app:lrules}. When a document is read in from a file,
these rules are implicitly applied by the Parser.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{\C Implementation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The project is to be implemented as an extension to Emacs, i.e. as an
Emacs mode. Thus, it is to be coded in Emacs Lisp.

The following priorities have to be kept in mind during coding:

\begin{itemize}

\item \underline{Consistency} with the XML specification given in
  \cite{w3c}, in particular with the BNF definitions numbered in
  square brackets ({\bf BNF-defs} for short).

\item \underline{Modularity}, so that functions and constants can be
  re-used in different contexts and the set of actual parsing
  functions can be extended easily.

\item \underline{Readability} of the code, to facilitate future
  possible improvement.

\end{itemize}

%================================================================
\section{Data structures}
%================================================================

The object of the editing, an XML file, will be seen by Emaxml in
two ways at the same time: as an Emacs buffer (the {\bf \ebuf}),
used to display the visual representation of the tree, and as a
list (the {\bf \etree}), structured so as to allow an abstract,
logical view of the document and appropriate manipulation.

An interface to manipulate the \etree\ and its components is
provided as the {\em XD data model} described below.

%------------------------------------------------------------
\subsection{\C The XD data model}\label{sec:theXDDataModel}
%------------------------------------------------------------

The {\bf XML Document data model} ({\bf XD}) is composed of:

\begin{itemize}

	\item a hierarchy of classes ({\bf XD-classes}) that reflect
	Emaxml \logus;

	\item a set of functions ({\bf XD-functions}) to manipulate
	the objects in the data model ({\bf XD-objects});

	\item a set of constants ({\bf XDRE Toolkit}) that reflect the
	BNF-defs.

\end{itemize}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\subsubsection{\C The XD-classes}
% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

An XD-object belongs to one of the XD-classes listed in
Fig.~\ref{tbl:XD:classes}, and represents an instance of a \logu\
in the display.

In concrete terms an XD-object $p$ is a list $(C\; s_1\; [s_2
\cdots])$ whose first element $C$ is a symbol denoting the
XD-class of $p$ and whose other element(s) $s_i$ may be branches
of the tree generating from $p$ (that is, $p$'s {\bf children},
which are XD-objects themselves) or a string which refers to the
part of the user space connected with $p$.

An example of an XD-object of class `attribute' may be:

\begin{code}
\begin{verbatim}
(attribute (attName "length")
           (attValue "25.52cm"))
\end{verbatim}
\end{code}
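
Because an XD-object is a plain list, its class, children and string
contents can be reached with ordinary list operations. The following
accessors are a minimal sketch with hypothetical names; they are not
the XD-functions of the data model.

\begin{code}
\begin{verbatim}
(require 'seq)

(defun sketch-xd-class (obj)
  "Return the symbol naming the XD-class of OBJ."
  (car obj))

(defun sketch-xd-child (obj class)
  "Return the first child of OBJ whose XD-class is CLASS, or nil."
  ;; Children are the list-valued elements; ASSQ finds the first
  ;; one whose head is CLASS.
  (assq class (seq-filter #'consp (cdr obj))))

(defun sketch-xd-string (obj)
  "Return the first string held directly by OBJ, or nil."
  (seq-find #'stringp (cdr obj)))

(let ((att '(attribute (attName "length")
                       (attValue "25.52cm"))))
  (sketch-xd-string (sketch-xd-child att 'attValue)))
;; => "25.52cm"
\end{verbatim}
\end{code}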

The XD-classes are listed in Fig.~\ref{tbl:XD:classes} together
with the children they may have.  The notation used is:

\begin{itemize}

	\item[] $\to$: ``has as children'';

	\item[] *: ``zero or more'';

	\item[] ?: ``zero or one'';

	\item[] +: ``one or more'';

	\item[] $\mid$: indicates alternative children.

\end{itemize}

\todo{put table and notation in an appendix}

\begin{tblenv}
\begin{tabular}{|lll|}

\hline

seed & $\to$ & header intDTD? (comment $\mid$ PI)* element
(comment $\mid$ PI)* \\

intDTD & $\to$ & {\em string} \\

element & $\to$ & header (element $\mid$ comment $\mid$ PI $\mid$
entRef $\mid$ charRef $\mid$ charData)*\\

header & $\to$ & eleName attList* \\

eleName & $\to$ & {\em string} \\

attList & $\to$ & attribute+ \\

attribute & $\to$ & attName attValue \\

attName & $\to$ & {\em string} \\

attValue & $\to$ & ({\em string} $\mid$ entRef $\mid$ charRef)* \\

comment & $\to$ & {\em string} \\

PI & $\to$ & PITarget PIBody \\

PITarget & $\to$ & {\em string} \\

PIBody & $\to$ & {\em string} \\

charRef & $\to$ & {\em string} \\

entRef & $\to$ & {\em string} \\

charData & $\to$ & {\em string} \\

\hline

\end{tabular}
\caption{Objects in the XD data model. The notation used is
explained in section \ref{sec:theXDDataModel}}
\label{tbl:XD:classes}
\end{tblenv}

A `seed' object is a special kind of `element' object: it may have
only four attributes in its header (namely ``version'',
``encoding'', ``standalone'' and ``external DTD''), does not have
an element name and its children are limited as defined in
Fig.~\ref{tbl:XD:classes}.

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\subsubsection{\C The XD-functions}
% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The XD data model provides functions for the manipulation of its
objects. Their names start with {\tt XD-} and follow these naming
conventions:

\begin{itemize}

	\item {\tt XD-\lt...\gt} refers to a function that performs an
	operation on a child, for example {\tt (XD-\lt get\gt\ etree
	'header)} returns the entire object of class {\em header} of
	the object {\tt etree};

	\item {\tt XD-\gt...\lt} refers to a function that performs an
	operation on the contents of a child, for example {\tt (XD-\gt
	get\lt\ etree 'seed 'header 'eleName)} returns the string
	associated with the element name in the header of the seed of
	the object {\tt etree};

	\item {\tt XD-\{...\}} refers to a function that returns a
	list of objects, for example {\tt (XD-\{getall\} elt 'PI
	'comment)} returns a list of all the comments and processing
	instructions contained in the element {\tt elt};

	\item {\tt XD-...-p} indicates a predicate function, as per the
	standard Lisp convention, i.e. a function that checks some
	condition and returns \lispnil\ or \lispt;

	\item \todo{...and many more}

\end{itemize}

The functions of the XD data model are described in
Fig.~\ref{tbl:XD:functions}.

\todo{put table in an appendix}

\begin{tblenv}
\begin{tabular}{|lll|}
\hline
... & ... & ... \\
\hline
\end{tabular}
\caption{Functions in the XD data model.} \label{tbl:XD:functions}
\end{tblenv}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\subsubsection{\C The XDRE toolkit}\label{sec:theXDREToolkit}
% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The {\bf XDRE} toolkit is a set of string constants which are
regular expressions that match some of the basic building blocks
of XML, defined by the BNF-defs\footnote{Not all the BNF-defs can
be translated into regular expressions, mostly because there is no
trivial way of translating a BNF difference construct, such as in
`(Char - ']')*', which indicates a sequence of zero or more
instances of the BNF production `Char' which are not `]'.}. Their
purpose is to be used in the parsing functions instead of literal
regexps, for readability.

Each constant's name is of the form {\tt XDRE-component}, where
{\tt component} reflects the name of a rule in the BNF-defs.

In constructing the regexp, the symbols {\tt `\lt\lt', `\gt\gt',
`||', `**', `++', `--'} are used in place of {\tt `\back \back (',
`\back \back )', `\back \back |', `*', `+', `?'} respectively.

The table in Appendix \ref{app:XDRE_constants} describes what each
XDRE represents.
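
One way to realise the readable operator symbols is to translate them
into Emacs regexp syntax when a constant is defined. The sketch below
assumes this approach, which may differ from the actual XDRE
definitions; the constant and function names are hypothetical.

\begin{code}
\begin{verbatim}
(defconst sketch-xdre-operators
  '(("<<" . "\\(") (">>" . "\\)") ("||" . "\\|")
    ("**" . "*") ("++" . "+") ("--" . "?"))
  "Readable operators and the regexp syntax they stand for.")

(defun sketch-xdre-compile (readable)
  "Translate READABLE, written with doubled operators, to a regexp."
  (replace-regexp-in-string
   ;; Match any one of the readable operators ...
   (regexp-opt (mapcar #'car sketch-xdre-operators))
   ;; ... and substitute its Emacs regexp counterpart, literally.
   (lambda (match) (cdr (assoc match sketch-xdre-operators)))
   readable t t))

;; (sketch-xdre-compile "<<a||b>>++")
;; => "\\(a\\|b\\)+"
\end{verbatim}
\end{code}

The sketch does not address literal text that itself contains a
doubled operator, such as the {\tt --} of a comment delimiter; such
text would have to be protected from the translation.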

%------------------------------------------------------------
\subsection{\C The \etree}
%------------------------------------------------------------

The \etree\ is the structural representation of the file being
edited. It is maintained and manipulated through the facilities
provided by the XD data model.

The \etree\ is practically a `seed' XD-object, that is, a list of
objects which are lists themselves. A simple example of an \etree\
object may be:

\begin{code}
\begin{verbatim}
(seed (header (eleName "simple.xml")
              (attList (attribute (attName "version")
                                  (attValue "1.0"))
                       (attribute (attName "encoding")
                                  (attValue "UTF-8"))
                       (attribute (attName "standalone")
                                  (attValue "no"))
                       (attribute (attName "extDTD")
                                  (attValue "SYSTEM \"dtdfile.dtd\""))))
      (comment "Simple document")
      (element (header (eleName "root")
                       (attList (attribute (attName "att1")
                                           (attValue "val1"))))
               (charData "This is some character data")
               (element (header (eleName "child")))
               (PI (PITarget "aTarget")
                   (PIBody "aBody"))))
\end{verbatim}
\end{code}

\todo{change the example etree according to whitespace policy}

This may be extracted from an XML document that looks like:

\begin{code}
\begin{verbatim}
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE root SYSTEM "dtdfile.dtd">

<!-- Simple document-->

<root att1="val1">
This is some character data
   <child/>
   <?aTarget aBody?>
</root>
\end{verbatim}
\end{code}


%------------------------------------------------------------
\subsection{The \ebuf}
%------------------------------------------------------------


%================================================================
\section{Software components}
%================================================================




%------------------------------------------------------------
\subsection{\C The Parser}
%------------------------------------------------------------

The {\bf XMLDoc Parser} ({\bf XDP} for short) takes as input an
Emacs buffer in Fundamental mode (i.e. not in SGML mode, hence
with no meta-information about syntax highlighting) containing an
XML document, and extracts the corresponding \etree.  The Parser
checks the syntax of the document and, if it is not correct, gives
an indication of the error.

XDP has already been implemented and informally tested, so the
following is a description rather than a design specification. The
idea behind XDP is to provide an independent, reusable tool for
parsing XML files according to the XD data model. In some respects
XDP and the XD data model are quite limited, since they do not
cover every aspect of an XML document in full detail, but they can
be useful for any application that, like Emaxml, needs to
represent and manipulate the skeleton of an XML document for
practical purposes.

Emaxml parses an XML document by moving point in the buffer that
contains the XML file. At the current position of point, the
parser expects to find a sequence of characters that corresponds
to one of a set of possible XD-objects, according to the
BNF-defs. If such a sequence is found, the corresponding XD-object
is built ({\bf object extraction}) and point is advanced;
otherwise parsing fails.

The parsing process is recursive, so parsing a whole document
amounts to placing point at the beginning of the buffer and trying
to extract a `seed' object.

XDP is coded in the {\tt \home /tesi/XDP/XDP.el} file and consists
of:

\begin{itemize}

	\item a set of auxiliary functions (the {\bf XDP Toolkit}),
	that carry out general operations related to parsing;

	\item a set of extracting functions (the {\bf
	XDC-functions}), each of which is concerned with parsing an
	XML component.

\end{itemize}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\subsubsection{\C XDP Toolkit}
% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Parsing, and object extraction in particular, involves some
elementary operations, provided by the XDP toolkit, that fall into
one of the following categories:

\begin{itemize}

	\item {\bf Matching and skipping}

		The parser often needs to check whether the text starting
		at point matches a particular regexp, and then either
		retrieve it or ignore it. Functions like {\tt
		XDP-match-minus} or {\tt XDP-skip} provide such
		operations.


	\item {\bf BNF construct handling}

		The objects to be extracted derive from the BNF-defs,
		which are composed of {\em conjunctions} (sequences), {\em
		disjunctions} (selections, `$\mid$') and {\em repetitions}
		(`*', `+', `?'). Functions in this category (such as {\tt
		XDP-and} or {\tt XDP-*}) provide these features.

	\item {\bf Object manipulation}

		Functions in this category provide operations that are
		object-specific such as translating a standard entity
		reference to the corresponding character, or extracting
		information from the prolog of the XML document, or
		building an object from its components.

\end{itemize}
		

Generally speaking, XDP functions try to match the contents of the
buffer at point against something (a regular expression, the
result of one or more other XDP functions, \ldots) and return what
matched.

A return value of \lispt\ means that the requested match was not
found but the function succeeded anyway. For example, when trying
to match `0 or more instances of something', a non-match is still
a success.

All XDP functions are expected to leave point at the end of what
they matched, or where it was if nothing was matched.
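
These conventions can be illustrated with a minimal Emacs Lisp
sketch. The names {\tt sketch-match}, {\tt sketch-01} and {\tt
sketch-and} are hypothetical: they are not the XDP functions, and
the actual implementation may differ. In particular, returning
\lispt\ from a sequence in which nothing was collected is an
assumption made here for the sake of the example.

\begin{code}
\begin{verbatim}
;; Illustrative sketch only: these are NOT the XDP functions.

(defun sketch-match (re)
  "Return the text matching RE at point and move point past it.
Return nil, leaving point where it was, if RE does not match."
  (when (looking-at re)
    (goto-char (match-end 0))
    (match-string 0)))

(defmacro sketch-01 (form)
  "Zero or one occurrence: the value of FORM, or t for a non-match."
  (list 'or form t))

(defun sketch-and-1 (thunks)
  "Call THUNKS in order and collect their results, dropping t values.
If any thunk returns nil, restore point and return nil."
  (let ((start (point)) (results nil) (ok t))
    (while (and ok thunks)
      (let ((r (funcall (car thunks))))
        (cond ((null r) (setq ok nil))   ; failure: stop here
              ((eq r t) nil)             ; successful non-match
              (t (setq results (cons r results)))))
      (setq thunks (cdr thunks)))
    (cond (ok (or (nreverse results) t))
          (t (goto-char start) nil))))

(defmacro sketch-and (&rest forms)
  "Sequence: wrap each of FORMS in a lambda and try them in order."
  (list 'sketch-and-1
        (cons 'list
              (mapcar (lambda (f) (list 'lambda nil f)) forms))))

(with-temp-buffer
  (insert "<!DOCTYPE root>")
  (goto-char (point-min))
  (sketch-and (sketch-match "<!DOCTYPE")
              (sketch-01 (sketch-match "[ \t\r\n]+"))
              (sketch-match "[A-Za-z]+")
              (sketch-01 (sketch-match "[ \t\r\n]+"))
              (sketch-match ">")))
;; => ("<!DOCTYPE" " " "root" ">")
\end{verbatim}
\end{code}

The second optional white space is absent, so that form returns
\lispt\ and contributes nothing to the result. The sequence is
written as a macro so that a later form is not evaluated, and does
not move point, once an earlier form has failed.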

See Appendix \ref{app:XDP_functions} for the list and details of
the XDP functions.

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\subsubsection{\C XDC parsing functions}
% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

{\C Every XD-class has a corresponding XDC-function that parses
the text at point and returns an object of that class if one was
there, or \lispnil.}

{\C Moreover, several XDC-functions refer to BNF-defs rather than
to XD-classes. The object returned by such a function is not in
the XD data model, but has the same structure as an XD-object. For
example, {\tt XDC-prolog} parses the prolog of an XML document as
defined by BNF-def number 22.}

{\C A return value of \lispnil\ means that the object was not
recognized at point.}

\todo{all subsequent description is not up to date and must be
rewritten}

Most XDC functions are constructed straightforwardly by
reproducing the BNF-def with a combination of XDP functions, XDRE
regexps and other XDC functions, e.g.:


\begin{code}
\begin{verbatim}
01 ;; [28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S?
02 ;;                      ('[' (markupdecl | DeclSep)* ']' S?)? '>'
03 ;; doctypedecl -> Name ExternalID? InternalID?
04
05 (defun XDC-doctypedecl ()                                  ;
06   (XDP-build 'doctypedecl                                  ;
07              (XDP-skip "<!DOCTYPE" XDRE-S)                 ; '<!DOCTYPE' S
08              (XDC-Name)                                    ; Name
09              (XDP-01 (XDP-and (XDP-skip XDRE-S)            ; (S
10                               (XDC-ExternalID)))           ; ExternalID)?
11              (XDP-01 (XDP-skip XDRE-S))                    ; S?
12              (XDP-01 (XDP-and (XDP-skip "\\[")             ; ('['
13                               (XDC-InternalID)             ; (markupdecl | DeclSep)*
14                               (XDP-skip "\\]")             ; ']'
15                               (XDP-01 (XDP-skip XDRE-S)))) ; S?)?
16              (XDP-skip ">")))                              ; '>'
\end{verbatim}
\end{code}

Lines 1--2 contain the BNF-def as given in \cite{w3c}.

Line 3 describes what the object is composed of, i.e. a Name
object, possibly an ExternalID object and possibly an InternalID
object.

Line 6 invokes the XDP-build function to build a `doctypedecl'
object as described by lines 7--16.

Line 7 skips over `\lt!DOCTYPE' and white space.

Line 8 extracts a Name object.

Lines 9 and 10 deal with an optional pair made of white space
followed by an ExternalID object, and extract the latter.

Line 11 skips over some optional white space.

\ldots and so on.

\

However, some XDC functions have peculiar features:

\begin{itemize}

	\item {\bf XDC-elementDesc}

	The elementDesc class does not correspond to a BNF-def; it has
	been devised for convenience, since it is directly related to
	the contents of an element header.

	\item {\bf XDC-CharData}

	Whitespace alone is not considered CharData, so if XDP-build
	returns an object like {\tt ((CharData "}{\em
	whitespace}{\tt "))}, XDC-CharData returns \lispnil.

	\item {\bf XDC-PInstruction}

	The PInstruction class does not correspond to a BNF-def; it
	has been devised for convenience.

	\item {\bf XDC-InternalID}

	The InternalID class does not correspond to a BNF-def; it has
	been devised for convenience. It extracts an internal DTD
	without parsing it. If internal DTDs are to be parsed in the
	future, both this function and XDC-doctypedecl will have to be
	changed.

\end{itemize}


% - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\subsubsection{Running the Parser}
% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

To run the parser on a document:

\begin{enumerate}

	\item Visit and evaluate (or load) the files {\tt
	\home ceepd1/tesi/prg/XD/XD.el} and {\tt
	\home ceepd1/tesi/prg/XDP/XDP.el}.

	\item Use {\em M-x test-parser RET} and provide the file name.

\end{enumerate}

This will parse the document and output either \lispnil\ (if the
file could not be parsed) or the \etree\ of the document in the
{\em *scratch*} buffer.
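
What such a command might do can be sketched as follows. This is
not the actual {\tt test-parser}: {\tt sketch-run-parser} is a
hypothetical name, and the function that extracts the seed object
is passed in as the argument {\tt parse-fn} rather than assumed.

\begin{code}
\begin{verbatim}
;; Illustrative sketch only: NOT the actual test-parser command.
;; PARSE-FN stands for whatever function extracts the seed object.

(defun sketch-run-parser (file parse-fn)
  "Parse FILE with PARSE-FN and append the result to *scratch*.
PARSE-FN is called with point at the start of a temporary buffer
holding the contents of FILE.  Return what PARSE-FN returned."
  (let ((result (with-temp-buffer       ; Fundamental mode
                  (insert-file-contents file)
                  (goto-char (point-min))
                  (funcall parse-fn))))
    (with-current-buffer (get-buffer-create "*scratch*")
      (goto-char (point-max))
      (insert (format "\n%S\n" result)))
    result))
\end{verbatim}
\end{code}

A temporary buffer is used so that the file is parsed in
Fundamental mode, whatever mode Emacs would normally select when
visiting it.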

%------------------------------------------------------------
\subsection{\C The Writer}
%------------------------------------------------------------

The Writer performs the inverse operation of the Parser: it takes
an \etree\ and produces an Emacs buffer. It assumes the \etree\ to
be correct.

\todo{\C Reminder: syntactic check of the \luintdtd\ (see
\ref{sec:savingAFile}).}

font-lock-keywords
font-lock-syntax-table

%------------------------------------------------------------
\subsection{The Emaxml mode}
%------------------------------------------------------------

Emacs is extended with the new mode. This includes the code
for the interactive management of the \ebuf\ and the \etree, from
basic user operations such as hitting a key to the more elaborate
tree manipulation commands.

Some problems to be solved in order to implement Emaxml as an
Emacs mode are:

\begin{itemize}

	\item How colors are managed by Emacs and how to control the
	display and the cursor. How to keep the display up to date
	with the abstract representation held in the \etree.
			
	\item What is the data model for the \ebuf, i.e. its data
	structures and manipulation functions.

	\item How to represent a location in the \ebuf\ (e.g. what
	will point look like? What data structure will a marker be?)
	and how to map it to the corresponding item in the \etree.

\end{itemize}			

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Testing}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%================================================================
\section{Testing the Parser and the Writer}
%================================================================

%------------------------------------------------------------
\subsection{XML-equivalence of two documents}
%------------------------------------------------------------

\todo{rewrite}

Let us consider a buffer $b_0$ containing an XML document. Suppose
the Parser processes $b_0$, producing the \etree\ $e$, and the
Writer then processes $e$, producing buffer $b_1$. The two buffers
$b_0$ and $b_1$ need not be identical at the byte level, but the
two XML documents in them must be {\bf equivalent} in XML terms.

%------------------------------------------------------------
\subsection{Criteria}
%------------------------------------------------------------

The following criteria can be used to test the Parser and the
Writer.

\begin{itemize}

	\item[(i)] The Parser will be tested on a fully-featured XML
		document, say $d_0$, predicting what the resulting \etree\
		$e_0$ should be. A good example of such a document is
		``bookcase.xml'', in \cite{nut}.

	\item[(ii)] The Writer can be tested on $e_0$. The document
		$d_1$ in the resulting buffer should be XML-equivalent to
		$d_0$.

	\item[(iii)] To prove that the Parser and the Writer
		effectively carry out inverse functions, $d_1$ is fed to
		the Parser again. The resulting \etree\ $e_1$ should be
		{\em identical} to $e_0$\footnote{Actually, if the Parser
		is proved correct, test (iii) proves XML-equivalence
		between $d_0$ and $d_1$ too.}. This process can be
		represented diagrammatically as

$$ \framebox{$d_0$} \stackrel{\mbox{Parser}}{\longmapsto} (e_0)
\stackrel{\mbox{Writer}}{\longmapsto} \framebox{$d_1$}
\stackrel{\mbox{Parser}}{\longmapsto} (e_1) $$
		

	\item[(iv)] The Parser will also be tested on one or more {\em
		incorrect} XML documents. Expected results will be
		compared with the actual ones.

		Note that the Parser is not required to detect an end-tag
		matching a start-tag with a different tag-name as an
		error, since that is not specified in the BNF rules.

\end{itemize}

All tests (i)--(iv) can be automated by setting up a testing
framework.
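
As an example of what such a framework might contain, test (iii)
can be sketched in Emacs Lisp. The arguments {\tt parse-fn} and
{\tt write-fn} are hypothetical stand-ins for the Parser and the
Writer. The Writer is assumed here to insert the document into the
current buffer, which may differ from its actual interface.

\begin{code}
\begin{verbatim}
;; Illustrative sketch only.  PARSE-FN and WRITE-FN stand for the
;; Parser and the Writer; their real interfaces may differ.

(defun sketch-round-trip-p (file parse-fn write-fn)
  "Return non-nil if FILE survives a parse, write, parse cycle.
PARSE-FN is called with point at the start of the buffer to be
parsed.  WRITE-FN is called with a tree and is assumed to insert
the corresponding document into the current buffer."
  (let* ((e0 (with-temp-buffer
               (insert-file-contents file)
               (goto-char (point-min))
               (funcall parse-fn)))
         (e1 (and e0
                  (with-temp-buffer
                    (funcall write-fn e0)
                    (goto-char (point-min))
                    (funcall parse-fn)))))
    ;; A file that cannot be parsed at all does not pass.
    (and e0 (equal e0 e1))))
\end{verbatim}
\end{code}

The trees are compared with {\tt equal}, which compares lists and
strings by content, so that $e_1$ is required to be identical to
$e_0$ in structure and contents.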

%================================================================
\section{Testing Emaxml interactively}
%================================================================

Once the Parser and the Writer have been shown to be correct, the
major mode can be tested interactively.

A sequence of steps, expressed as keystrokes, tests a particular
feature. A set of such sequences tests the major mode. The set
must be devised to cover all the features of Emaxml.

A sequence of keystrokes will be tested against an Emaxml buffer
in a known state. The resulting buffer will first be checked
visually (this part can hardly be automated), and then logically,
by examining one or more of the resulting \etree, \ebuf\ and the
file written by saving the buffer (this could in principle be
automated, but may prove expensive to set up).

The features to test are those described in the Emaxml
specification, in particular the categories listed in
section~\ref{sec:edop} and the response to the editing commands
listed in Appendix~\ref{app:stdcmd} and Appendix~\ref{app:newcmd}.

As soon as a working prototype of Emaxml is available, existing
Emacs users will be asked to use it and assess how well its goals
have been achieved. Their feedback will allow the problems they
identify to be fixed.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Documentation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Following the Emacs philosophy and spirit, Emaxml is meant to be
an extensible system.

Useful and usable documentation is a key requirement. If only part
of the target functionality is implemented, but it is well
documented, the project will still be a partial success.

In particular, two types of documents are required: code
documentation and Emacs on-line help.

%================================================================
\section{Code documentation}
%================================================================

%================================================================
\section{Emacs on-line documentation}
%================================================================

%================================================================
\section{The Plan} \label{sec:Plan}
%================================================================

%================================================================
\section{README files}
%================================================================


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
	
\clearpage

\addcontentsline{toc}{chapter}{Bibliography}

\begin{thebibliography}{99}

	\bibitem{w3c} \book{W3C XML Core Working Group}{Extensible
		Markup Language (XML) 1.0 (Second
		Edition)}{http://www.w3.org/TR/2000/REC-xml-20001006}{2001}
	
		This is the official specification of XML. Includes the
		BNF grammar describing the syntax of XML.

	\bibitem{nut} \book{E. Rusty Harold and W. Scott Means}{XML in
		a Nutshell}{O'Reilly}{2001}

		Good discursive explanation of the basics of the various
		aspects of XML, plus a comprehensive coverage of all
		related topics and applications. I found it useful for
		initial documentation, and also as a quick reference.

	\bibitem{info} \book{Free Software Foundation}{Emacs Info
		Manual}{Free Software Foundation}{1999}

		Major source of information about the usage of Emacs. It
		is more than on-line help; it can be searched in many
		ways and, in my experience, always answers one's
		questions. Moreover, it does not pop up uninvited to tell
		you that you are writing a letter.

	\bibitem{elisp} \book{B. Lewis, D. LaLiberte, R. Stallman
		and the GNU Manual Group}{Emacs Lisp
		Manual}{http://www.gnu.org/manual/elisp-manual-20-2.5/elisp.html}{1993}

		A book on Lisp, Emacs Lisp, Emacs internals, Emacs Lisp
		libraries. As readable as a novel, as useful as a quick
		reference. Available in a variety of formats including
		Info, which makes it embedded in Emacs.

	\bibitem{extend} \book{B. Glickstein}{Writing GNU Emacs
		Extensions}{O'Reilly}{1997}

		Covers the customization of Emacs from the very basics of
		Lisp to a full major mode implementation. Rich in
		practical examples paired with Lisp theory.

\end{thebibliography}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\appendix

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Standard Emacs commands recoded for Emaxml}
\label{app:stdcmd}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

{\bf \underline{Notes}}:

\begin{itemize}

	\item Editing operations are referred to here by their Emacs
	Lisp names, for lack of space. For an exact definition,
	where these names are not self-explanatory, use
	`F1~f~$<$command$>$' in Emacs.

	\item The {\bf Cat} column refers to the editing categories
	listed in section \ref{sec:edop}.

\end{itemize}


\begin{tblenv}
\begin{tabular}{|l|l|l|p{8cm}|}

	\hline \hline

	{\bf Cat} & {\bf Shortcut} & {\bf Emacs Lisp name} & {\bf
	Action} \\

	\hline \hline

	ID & C-d & delete-char & Delete the following character. No
	action at end of an elementary \logu. \\

	ID & ESC k & kill-sentence & Performs kill-element. \\

	ID & C-y & yank & Described in \ref{sec:yank}. Perform
	``yanking as a child''.\\

	ID & S-\key{insert} & yank & Described in
	\ref{sec:yank}. Perform ``yanking as a sibling''.\\
 
	ID & \key{BACKSPACE} & delete-backward-char & In the middle of
	an elementary \logu, behave as usual. At beginning, behave as
	C-b.\\

	ID & \key{RET} & newline & Multiline \logu: add a newline at
		point.\\

	ID & \key{insert} & overwrite-mode & Usual behavior. \\


	GE & C-l & recenter & Usual behavior. \\

	GE & C-\_, C-/ & undo & Usual behavior. \\

	KY & C-k & kill-line & Monoline: kill the rest of the current
	\logl. Multiline: also, if there are no non-blanks there,
	kill through the newline.\\

	MK & C-@, C-\key{SPC} & set-mark-command & Usual behavior.\\

	MK & C-x C-x & exchange-point-and-mark & Usual behavior. \\

	PM & C-a & beginning-of-line & Move point to beginning of
	current \logl.\\

	PM & C-b, \kl & backward-char & Slide backward.\\

	PM & C-\kd & forward-paragraph & Traverse down. \\

	PM & C-e & end-of-line & Move point to end of current \logl.\\

	PM & C-f, \kr & forward-char & Slide forward.\\

	PM & C-\kl & backward-word & Usual behavior, through user
	space. \\

	PM & C-n, \kd & next-line & Slide to next \logl, with
	\traveller.\\

	PM & C-p, \ku & previous-line & Slide to previous \logl, with
	\traveller.\\

	PM & C-\kr & forward-word & Usual behavior, through user
	space.\\

	PM & C-\ku & backward-paragraph & Traverse up. \\

	PM & \key{end} & end-of-buffer & Usual behavior. \\

	PM & \key{home} & beginning-of-buffer & Usual behavior. \\

	RE & C-x h & mark-whole-buffer & Usual behavior. \\

	RE & ESC @ & mark-word & Usual behavior. \\

	RE & double-mouse-1 & mouse-set-point & Usual behavior, but
	highlighting the region as described in section
	\ref{sec:high}.\\

	RE & drag-mouse-1 & mouse-set-region & Usual behavior. Region
	as described in section \ref{sec:high}.\\

	RE & mouse-1 & mouse-set-point & Usual behavior, but
	highlighting the region as described in section
	\ref{sec:high}.\\

	RE & mouse-2 & mouse-yank-at-click & Usual behavior, but
	yanking done a la Emaxml (see section \ref{sec:yank}).\\

	RE & triple-mouse-1 & mouse-set-point & Usual behavior, but
	region set to the whole \logu, and highlighting as described
	in section \ref{sec:high}.\\

	\hline

\end{tabular}
\caption{New meanings for old commands}
\end{tblenv}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Emaxml commands} \label{app:newcmd}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{tblenv}
\begin{tabular}{l|p{8cm}}

	{\bf Lisp-like non-definitive name} & {\bf Action} \\

	\hline

	create-element-child & Create a blank \luele\ as a child of
	the current smallest element. Described in
	section~\ref{sec:create}.\\

	create-element-sibling & Create a blank \luele\ as a sibling
	of the current smallest element. Described in
	section~\ref{sec:create}.\\

	create-{\em logun} & Create a blank instance of a \logu\ of
	type {\em logun}.\\

	create-leaf & Create a sibling of the current leaf just
	before the current leaf. \\

	traverse-right & Traverse right. Described in
	section~\ref{sec:move}. \\

	traverse-left & Traverse left. Described in
	section~\ref{sec:move}. \\

	mark-more & Described in section~\ref{sec:high}. \\

	mark-less & Described in section~\ref{sec:high}. \\

\end{tabular}
\caption{New commands to be implemented}
\end{tblenv}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Notes on yanking transparency} \label{app:yankee}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Establishing yanking transparency for a leaf \logu\ requires
finding a suitable definition for:

\begin{itemize}

	\item[{(}a{)}] where point is left after killing a leaf \logu\
	$p$;

	\item[{(}b{)}] how a leaf \logu\ is inserted when point is at
	(a).

\end{itemize}

I have not been able to find a reasonable pair (a,b) for which
yanking transparency holds.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=18cm]{deliv1/fig-yank_proof-18x10.eps}
\end{center}
\caption{Applying the definitions}
\label{fig:proof}
\end{figure}

% - - - - - - - - - - - - - - - - - - - - - - - - - - - -
\subsubsection{Example}
% - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The following is an example.

A possible reasonable definition for (a) is:

\begin{itemize}

\item[{(}a1{)}] If $p$ has a peer $q$ below it, at the first
character of $q$'s user space; otherwise if $p$ has a peer $q$
above it, at the first character of $q$'s user space; otherwise at
the first character of $p$'s parent's user space.
 
\end{itemize}

In all cases point will be in a \luheader\ after killing. Possible
reasonable definitions for (b) are as follows. When point is in
the \luheader\ of an element $r$, a leaf \logu\ $p$ is inserted

\begin{itemize}

\item[{(}b1{)}] as a peer of $r$ above $r$;

\item[{(}b2{)}] as a peer of $r$ below $r$;

\item[{(}b3{)}] as the first child of $r$;

\item[{(}b4{)}] as the last child of $r$.

\end{itemize}

Consider the subtrees in figure~\ref{fig:proof}.

Starting from tree (A), pairs of rules are applied to one of {\em
root}'s children. The trees other than (A) show where the pairs of
rules fail with respect to yanking transparency.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Chronicle of a murder} \label{app:killre}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


As described in section~\ref{sec:kill}, the three steps to obtain
the resulting tree after killing $p$ are:

	\begin{itemize}

		\item[(i)] all leaf \logus\ entirely included in $p$ are
		pruned;

		\item[(ii)] all non-leaf \logus\ entirely included in $p$
		are blanked;

		\item[(iii)] all characters in $p$ are eliminated.

	\end{itemize}

Figure~\ref{fig:ata} shows what a highlighted region looks like,
as described in section~\ref{sec:high}, and what the effects of
each step in the algorithm are. The bottom right picture is the
final result.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=12cm]{deliv1/fig-kill_re-24x20.eps}
\end{center}
\caption{Killing algorithm steps}
\label{fig:ata}
\end{figure}

Step (i) prunes the comment and the {\em meta} element, which are
leaf \logus\ entirely included in $p$.

Step (ii) blanks the \luheader\ of the {\em properties} element,
and the \luelename\ and the \luattname\ of the second {\em meta}
element, which are non-leaf \logus\ entirely included in $p$.

Step (iii) eliminates ``c2.gif'' and ``co'', which are all the
characters in $p$.

Figure~\ref{fig:azzuolo} is a visual representation of the tree
stored by killing $p$.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=8cm]{deliv1/fig-stored_tree-8x4.eps}
\end{center}
\caption{The stored tree}
\label{fig:azzuolo}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Details of XDRE constants}\label{app:XDRE_constants}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{tblenv}
\begin{tabular}{l|l|p{8cm}}

	{\bf BNF-def} & {\bf XDRE constant} & {\bf Explanation} \\

	\hline

	[2] Char & XDRE-Char & Unicode character range \\

	[3] S & XDRE-S & White Space \\

	[87] CombiningChar & XDRE-CombiningChar & Among others, this
	class contains most diacritics \\

	[89] Extender & XDRE-Extender & Extenders \\

	[85] BaseChar & XDRE-BaseChar & Among others, this class
	contains the Unicode alphabetic characters of the Latin
	alphabet \\

	[86] Ideographic & XDRE-Ideographic & Unicode ideographic
	characters \\

	[84] Letter & XDRE-Letter & BaseChar's + ideographic
	characters \\

	[88] Digit & XDRE-Digit & Unicode digits \\

	[4] NameChar & XDRE-NameChar & Characters allowed in Names \\

	[5] Name & XDRE-Name & Matches a legal Name \\

	[25] Eq & XDRE-Eq & Equality sign \\

	[68] EntityRef & XDRE-EntityRef & Matches an entity reference
	(e.g. `\&cright;') \\

	[66] CharRef & XDRE-CharRef & Matches a character reference
	(e.g. `\&\#x040B;') \\

	[19] CDStart & XDRE-CDStart & Matches `\lt![CDATA[' \\

	[21] CDEnd & XDRE-CDEnd & Matches `]]\gt', the CDATA section
	terminator \\

	[69] PEReference & XDRE-PEReference & Matches a parameter
	entity reference (e.g. `\%abc;') \\

	[26] VersionNum & XDRE-VersionNum & Matches the version number
	in an XML declaration \\

	[81] EncName & XDRE-EncName & Matches the encoding name in an
	Encoding~Declaration \\

	[13] PubidChar & XDRE-PubidChar & Characters allowed in a
	PubidLiteral \\

\end{tabular}
\caption{The XDRE toolkit}
\end{tblenv}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Details of XDP functions}\label{app:XDP_functions}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{tblenv}
\begin{tabular}{|l|p{8cm}|}

	\hline

	{\bf XDP} & XDP-* \\

	{\bf Functionality performed} & Checks if point is at 0 or
	more occurrences of FORM. \\

	{\bf BNF construct handled} & Char* \\

	{\bf Parameters} & FORM - a Lisp form composed of XDP and XDC
	constructs. \\

	{\bf Return} & \lispt\ if 0 occurrences found. A list of the
	occurrences found otherwise.\\

	\hline

\end{tabular}
\end{tblenv}

\begin{tblenv}
\begin{tabular}{|l|p{8cm}|}

	\hline

	{\bf XDP} & XDP-01 \\

	{\bf Functionality performed} & Checks if point is at 0 or 1
	occurrences of FORM. \\

	{\bf BNF construct handled} & S? \\

	{\bf Parameters} & FORM - a Lisp form composed of XDP and XDC
	constructs. \\

	{\bf Return} & \lispt\ if 0 occurrences found. The occurrence
	found otherwise.\\

	\hline

\end{tabular}
\end{tblenv}

\begin{tblenv}
\begin{tabular}{|l|p{8cm}|}

	\hline

	{\bf XDP} & XDP-match \\

	{\bf Functionality performed} & Checks if point is looking-at
	RE. \\

	{\bf BNF construct handled} & [\^{}\lt\&'] \\

	{\bf Parameters} & RE - a regular expression. \\

	{\bf Return} & The match-string if matched. \lispnil\
	otherwise.\\

	\hline

\end{tabular}
\end{tblenv}

\begin{tblenv}
\begin{tabular}{|l|p{8cm}|}

	\hline

	{\bf XDP} & XDP-match-minus \\

	{\bf Functionality performed} & Checks if point is looking-at
	the difference regexp (RE1 - RE2). \\

	{\bf BNF construct handled} & Name - (('X' $\mid$ 'x') ('M'
	$\mid$ 'm') ('L' $\mid$ 'l')) \\

	{\bf Parameters} & RE1 and RE2 - two regexps. \\

	{\bf Return} & The match-string if matched. \lispnil\
	otherwise. \\

	\hline

\end{tabular}
\end{tblenv}
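
The difference construct can be illustrated with a minimal Emacs
Lisp sketch. This is not the actual {\tt XDP-match-minus}: the
name {\tt sketch-match-minus} is hypothetical, and the sketch
assumes that `RE1 - RE2' means text that matches RE1 at point and
that does not, taken as a whole, match RE2.

\begin{code}
\begin{verbatim}
;; Illustrative sketch only: NOT the actual XDP-match-minus.
;; Assumption: "RE1 - RE2" means text that matches RE1 at point
;; and that does not, taken as a whole, match RE2.

(defun sketch-match-minus (re1 re2)
  "Return the text matching RE1 at point unless all of it matches RE2.
On success move point past the text; otherwise return nil and
leave point where it was."
  (when (looking-at re1)
    (let ((text (match-string 0))
          (end (match-end 0)))
      ;; Anchor RE2 at both ends so that it must cover all of TEXT.
      (unless (string-match (concat "\\`\\(?:" re2 "\\)\\'") text)
        (goto-char end)
        text))))

(with-temp-buffer
  (insert "xml-stylesheet")
  (goto-char (point-min))
  (sketch-match-minus "[A-Za-z-]+" "[Xx][Mm][Ll]"))
;; => "xml-stylesheet"
\end{verbatim}
\end{code}

With a buffer containing only `xml', the same call returns
\lispnil, since the whole of the matched text is then excluded by
the second regexp.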

\begin{tblenv}
\begin{tabular}{|l|p{8cm}|}

	\hline

	{\bf XDP} & XDP-match-until \\

	{\bf Functionality performed} & If looking-at `RE1*TERMINATOR',
	return what matches `RE1*' and set point at the end of it. \\

	{\bf BNF construct handled} & ((Char - '-') $\mid$ ('-' (Char -
	'-')))* \\

	{\bf Parameters} & RE1 - a regexp.  TERMINATOR - a string.\\

	{\bf Return} & What matches `RE1*' if matched. \lispnil\
	otherwise.\\

	\hline

\end{tabular}
\end{tblenv}

\begin{tblenv}
\begin{tabular}{|l|p{8cm}|}

	\hline

	{\bf XDP} & XDP-skip \\

	{\bf Functionality performed} & Skips over a regular
	expression. Used for portions of buffer that don't represent
	any object. \\

	{\bf BNF construct handled} & "'" \\

	{\bf Parameters} & RE - a list of one or more regexps.\\

	{\bf Return} & \lispt\ if RE is matched. \lispnil\
	otherwise.\\

	\hline

\end{tabular}
\end{tblenv}

\begin{tblenv}
\begin{tabular}{|l|p{8cm}|}

	\hline

	{\bf XDP} & XDP-build\\

	{\bf Functionality performed} & Constructs an object by
	putting together the results of the forms in ITEMS. It is
	based on a call to XDP-and whose result is put in a
	one-element list.\\

	{\bf BNF construct handled} & [32] SDDecl ::= ...\\

	{\bf Parameters} & ITEMS - a list of Lisp forms.\\

	{\bf Return} & An object, if all of ITEMS matched. \lispnil\
	otherwise.\\

	\hline

\end{tabular}
\end{tblenv}

\begin{tblenv}
\begin{tabular}{|l|p{8cm}|}

	\hline

	{\bf XDP} & XDP-and \\

	{\bf Functionality performed} & Handles sequences. It also
	deals with forms that return \lispt\ to mean a successful
	non-match, by not appending the \lispt\ to the list
	returned.\\

	{\bf BNF construct handled} & STag content ETag\\

	{\bf Parameters} & FORMS - a list of Lisp forms.\\

	{\bf Return} & A list with the results of evaluating FORMS if
	all non-\lispnil. \lispnil\ otherwise.\\

	\hline

\end{tabular}
\end{tblenv}

\begin{tblenv}
\begin{tabular}{|l|p{8cm}|}

	\hline

	{\bf XDP} & XDP-or \\

	{\bf Functionality performed} & Handles selections.\\

	{\bf BNF construct handled} & Comment $\mid$ PI $\mid$ S\\

	{\bf Parameters} & FORMS - a list of Lisp forms.\\

	{\bf Return} & The first form in FORMS that matches what point
	is at, if any. \lispnil\ otherwise.\\

	\hline

\end{tabular}
\end{tblenv}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Files and directories}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\todo{map of the files involved in the project}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{The L-rules}\label{app:lrules}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The following are the rules applied in controlling the insertion of
text by the user into \logus:

\begin{description}

\item[Single word] Only one word is allowed, i.e. no spaces.

\end{description}

\begin{tblenv}
\begin{tabular}{|l|l|p{8cm}|}

	\hline \hline

	{\bf \Logu} & {\bf Rule} & {\bf Description} \\

	\hline \hline

	\luattname & Single word & \\

	\hline

\end{tabular}
\caption{The L-rules}
\end{tblenv}


\chapter{TODO}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Unify typographic styles, e.g. for code, class names, logus etc.

\end{document}
