
\documentclass[a4paper,11pt]{report}
\addtolength{\topmargin}{-1in}
\setlength{\textwidth}{6.0in}
\setlength{\textheight}{9.5in}
\addtolength{\oddsidemargin}{-0.7in}


\newenvironment{code}{\begin{verbatim}}{\end{verbatim}}

\newenvironment{tblenv}
	{\begin{figure}[htbp] \begin{center} \small}
	{\normalsize \end{center} \end{figure}}	

\newcommand{\book}[4]{#1. {\em #2}. #3, #4.}

\newcommand{\todo}[1]{{\Large {\bf TODO}: \normalsize[#1]}}
\newcommand{\todoweb}[1]{{\Large {\bf TODOWEB}: \normalsize[#1]}}


\newcommand{\lt}{$<$}
\newcommand{\gt}{$>$}
\newcommand{\lu}[1]{{\sffamily #1}}

\newcommand{\lz}{$l_0$}
\newcommand{\lo}{$l_1$}
\newcommand{\uz}{$u_0$}
\newcommand{\uo}{$u_1$}
\newcommand{\ut}{$u_2$}
\newcommand{\Tz}{$\mathsf{T_0}$}
\newcommand{\To}{$\mathsf{T_1}$}
\newcommand{\Tt}{$\mathsf{T_2}$}

\newcommand{\jbw}[1]{{\bf [JBW: #1]}}

%								leaf
\newcommand{\luele}	{\lu{Element}}%				x
\newcommand{\luheader}	{\lu{Header}}%			
\newcommand{\luelename}	{\lu{Element name}}%
\newcommand{\luattlist}	{\lu{Attribute list}}%
\newcommand{\luatt}	{\lu{Attribute}}%
\newcommand{\luattname}	{\lu{Attribute name}}%
\newcommand{\luattvalue}{\lu{Attribute value}}%
\newcommand{\luchardata}{\lu{CharData}}%			x
\newcommand{\lupi}	{\lu{Processing Instruction}}%		x
\newcommand{\lupitarget}{\lu{Processing Instruction Target}}%
\newcommand{\lupibody}	{\lu{Processing Instruction Body}}%
\newcommand{\lucomment}	{\lu{Comment}}%				x
\newcommand{\luintdtd}	{\lu{Internal DTD}}%			x
\newcommand{\lucdata}	{\lu{CDATA section}}%			x

\newcommand{\logu}{logical unit}
\newcommand{\logus}{logical units}
\newcommand{\Logu}{Logical unit}
\newcommand{\Logus}{Logical units}
\newcommand{\logl}{logical line}

\newcommand{\ebuf}{Ebuffer}
\newcommand{\etree}{Etree}

\newcommand{\key}[1]{\framebox{#1}}
\newcommand{\kl}{\key{$\leftarrow$}}
\newcommand{\kr}{\key{$\rightarrow$}}
\newcommand{\ku}{\key{$\uparrow$}}
\newcommand{\kd}{\key{$\downarrow$}}

\newcommand{\traveller}{climbing transparency}
\newcommand{\da}{$\doublearrow$}
\newcommand{\dad}{$\rightarrow$}

\usepackage[english]{babel}
\usepackage{graphicx}

\renewcommand{\baselinestretch}{1.5}

\title{Emaxml {\Large }\\ \textbf{\large An Emacs mode for editing
XML}\large }


\author{Paolo Debetto\\ Supervisor: Dr. Joe Wells}

\date{CS4 Dissertation\\ Deliverable one}

\begin{document}

\maketitle
\newpage
\tableofcontents


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Introduction}

\begin{quote}
Emaxml is an extension of Emacs, written in Emacs Lisp, to edit XML
documents. Major Emacs modes for editing SGML and XML already
exist; this is different in that it allows viewing the document as
a tree structure, both visually and logically.

\end{quote}

%----------------------------
\section{Emacs}
%----------------------------

\subsection{Editing modes}

Editing a particular type of document very often requires a
specialized editor that lets the user perform processing specific to
that type, or that automatically changes the document layout, or that
displays the text differently from how it is actually stored.

For instance, editing C code and editing a text document are two very
different activities. Paragraph structure is not important when
editing code; indenting each line according to its syntax is not
important when writing a letter.

Emacs is not simply an editor: it is a code editor, a text editor, a
\LaTeX\ editor, a structured outline editor, a directory editor, a tar
file editor, an email editor, and a hundred others, not least an SGML
(and XML) editor. Emacs deals with each type of document by being in
the appropriate editing {\bf mode}.

The basic Emacs core, written in C, consists of a set of capabilities
such as managing buffers, windows, files, the cursor, etc., plus a Lisp
interpreter. Emacs modes are extensions to the core, and are written
in Lisp, or, rather, in Emacs Lisp, the Emacs dialect of Lisp.

\subsection{Emacs concepts}

I assume the reader of this document to be reasonably familiar with
the functioning of Emacs from a user's point of view. A few
characteristic Emacs features that are particularly relevant to my
project are briefly defined here. For detailed information refer to
the Emacs Info manual by pressing 'F1 i' in Emacs; the following
definitions are mainly summarized from there.

\begin{itemize}

	\item {\bf Buffer}: the basic editing unit; one buffer
     	corresponds to one text being edited. There may be several
     	buffers, but at any time only one is being edited, the
     	`selected' buffer.

	\item {\bf Frame}: an X Window System window in which Emacs is
	running. (The following definition for an {\em Emacs window}
	refers to subdivisions of one frame.)

	\item {\bf Window}: Emacs can split a frame into two or many
	windows.  Multiple windows can display parts of different
	buffers, or different parts of one buffer.

	\item {\bf Point}: the location at which editing commands will
	take effect. In the current buffer, the cursor shows where
	point is.

	If several files are being edited in Emacs, each in its own
	buffer, each buffer has its own point location. 

	A buffer that is not currently displayed remembers where point
	is in case it is displayed again later.  If the same buffer
	appears in more than one window, each window has its own
	position for point in that buffer.

	The point is also one end of the {\em region} (see below).

	\item {\bf Cursor}: The cursor is the rectangle on the
     	selected buffer that indicates the position of the point. The
     	cursor is on the character that follows point.  Often people
     	speak of `the cursor' when, strictly speaking, they mean
     	`point'.

	\item {\bf Mark}: an abstract pointer to a position in the
     	text. The user can set it to specify one end of the region
     	(see below), point being the other end. Each buffer has its
     	own mark.

	\item {\bf Marker}: a specialized Emacs internal data
	structure that defines a location in a buffer in terms of a
	pair $(\mathit{buffer},\mathit{location})$. It is worth noting
	that a marker follows the text as editing changes are
	made. Specifically, if text is deleted or inserted before the
	marker, the marker's position (an offset from the beginning of
	the buffer) is adjusted.

	\item {\bf Mark Ring}: used to hold several recent previous
     	locations of the mark, just in case the user wants to move
     	back to them.  Each buffer has its own mark ring; in addition,
     	there is a single global mark ring.

	\item {\bf Region}: The region is the text between point and
     	the mark.  Many commands operate on the text of the region. If
     	a portion of text is highlighted with the mouse, that becomes
     	the region and point and the mark are updated accordingly.

\end{itemize}
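
The way a marker follows the text can be illustrated with a small
model. The following is only a sketch, written in Python rather than
Emacs Lisp for brevity; the class \texttt{Marker} and its two methods
are hypothetical names of mine and are not part of Emacs, where the
editor itself relocates markers. I assume here that an insertion made
exactly at the marker does not move it.

\begin{verbatim}
```python
class Marker:
    """Toy model of a marker: a position that follows edits."""

    def __init__(self, position):
        # Offset from the beginning of the buffer.
        self.position = position

    def after_insert(self, at, length):
        # Text inserted before the marker pushes it forward.
        if at < self.position:
            self.position += length

    def after_delete(self, start, end):
        # Text deleted before the marker pulls it back; a marker
        # inside the deleted range collapses to the range start.
        if end <= self.position:
            self.position -= end - start
        elif start < self.position:
            self.position = start


marker = Marker(10)
marker.after_insert(at=3, length=4)    # before the marker
marker.after_delete(start=0, end=2)    # before the marker
marker.after_insert(at=20, length=5)   # after the marker: no effect
```
\end{verbatim}

In this model the marker moves from offset 10 to 14 and then to 12,
and the last insertion leaves it untouched.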

All these concepts, plus many others, are common to all Emacs
applications and an Emacs user will expect Emacs to behave
consistently in a new mode. Thus, the features of Emaxml must be
designed to meet what a typical Emacs user would instinctively try to
do in order to accomplish a task.

%------------
\section{XML}
%------------

I assume the reader to have a basic knowledge of XML. The purpose of
this section is not to give an exhaustive description of XML, but to
point out a few XML features that are important in the discussion of
the details of Emaxml. For further information refer to \cite{w3c},
\cite{nut}.

A good web page for quick basic information is at {\em
http://www.w3.org/XML/1999/XML-in-10-points}.
\begin{itemize}

	\item XML is a {\em syntax} which uses tags to allow tree
	structures to be written as a sequence of characters.

	\item An XML document is basically a {\em tree} structure,
	composed of a {\em root element} and (possibly) its {\em
	children}.

	\item The information in an XML document is meant to be
	processed by a {\em client application}, which may be very
	specific to a particular task. For example, a library managing
	application may keep the data base for books, users, etc. in
	XML format.

	\item The set of XML tags is not finite. A set of tags and
	their dependencies can be defined for each class of
	documents.

	\item Text consists of intermingled character data and markup. 

	{\bf Markup} takes the form of start-tags, end-tags,
	empty-element tags, entity references, character references,
	comments, CDATA section delimiters, document type
	declarations, processing instructions, XML declarations, text
	declarations, and any white space that is at the top level of
	the document entity (that is, outside the document element and
	not inside any other markup). 

	All text that is not markup constitutes the {\bf character
	data} ({\bf CharData}) of the document.

	\item Some characters such as the '$<$' sign are not allowed
	in an XML document other than for markup and in comments and
	CDATA sections. Where these characters are needed, special
	sequences called {\bf entity references} or {\bf character
	references} are used. For example, '\&lt;' is the predefined
	entity reference for '$<$', and '\&\#60;' is the corresponding
	character reference.

\end{itemize}
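
The points above can be made concrete with a short sketch. It uses the
XML parser of the Python standard library purely as an illustration;
the sample document (a tiny library data base) and the helper
\texttt{outline} are assumptions of mine and have nothing to do with
how Emaxml will parse documents.

\begin{verbatim}
```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample document: a root element with two children.
source = ('<library><book id="b1">Emacs &lt; 21</book>'
          '<user/></library>')
root = ET.fromstring(source)


def outline(element, depth=0):
    """Return one indented line per element, parent first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(root)
# The parser resolves the sequence &lt; into the character <.
book_text = root.find("book").text
# Writing character data back requires escaping it again.
escaped = escape("a < b")
```
\end{verbatim}

The sequence of characters is thus only a serialization: what the
client application sees is the tree and the character data with the
special sequences already resolved.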

XML originated as, and is mainly used as, a format for storing
structured information, although its flexibility also makes it a
powerful format for professional text editing.

%--------------------------------------------
\section{Existing Emacs modes for editing XML}
%--------------------------------------------

In Emacs 20.7.1, XML documents are edited under {\em SGML mode},
SGML being a predecessor of XML.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=12cm]{fig-sgml_mode-12x8.eps}
\end{center}
\caption{Screenshot of SGML mode}
\label{fig:SGML}
\end{figure}

In Fig.~\ref{fig:SGML} an XML file is being edited. The approach used by SGML
mode is that of syntax highlighting. Tag names, attribute names,
attribute values and the actual information (i.e. the {\em CharData})
are in different colors.

The tree structure of the document is not taken into account by Emacs,
and the indenting is left to the user. For instance, the TAB key
pressed in the middle of a line does not perform automatic indentation
as is typical in other Emacs modes, and pressed at the beginning of a
line it indents that line to match the previous one.

Many facilities are provided for manipulating elements.

Another Emacs mode suitable for editing XML documents is PSGML. It has
additional features such as support for indentation which corresponds
to the logical structure of an XML document.

PSGML is not part of the standard distribution of Emacs 20.7.1, but
must be obtained separately.

%\begin{itemize}

%	\item blinking matching of ``\gt'' with the previous ``\lt'';

%	\item ``electrification'' of keys ``\lt'', ``\&'', ``SPC''
%	within tag names, and of ``~{\texttt "}~'' and ``~'~'' anywhere else;

%	\item optional automatic capitalization of tag names;

%	\item validation of the document with an SGML parser;

%	\item a help facility that displays a description of a tag;

%	\item deletion of a tag (the start tag and the end tag are
%	removed, while everything in between is added to the
%	containing element);

%	\item keystrokes for moving forward and backward between
%	elements;

%	\item interactive insertion of attributes, where the possible
%	attributes for a particular tag can be customized;

%	\item interactive customizable insertion of a tag;
	
%\end{itemize}

%--------------------------------
\section{Similar systems}
%--------------------------------

Emaxml (as it should be when developed further) falls in the
software category of non-commercial XML editors.

In particular, other similar existing systems may or may not offer a
graphical view of the tree, and may or may not check that the user is
building a tree consistent with the DTD.

Creating Emaxml is, in my opinion, justified by the fact that it would
give Emacs a visual XML mode, which means that Emaxml is not just
another XML editor, but an integrated part of the most powerful editor
around. The quantity and quality of the documentation available about
programming Emacs also mean that Emaxml could be improved by anyone
so inclined.

I have examined three similar programs, all from the open source
community.

\subsection{Conglomerate editor}
%- - - - - - - - - - - - - - - -

Conglomerate\footnote{The Conglomerate home page is at
www.conglomerate.org.} is not a simple XML editor. In fact, XML is
not even mentioned in its web pages, from which the following
description is extracted:

\begin{quote}

	``Conglomerate is a complete system for working with
	documents. It lets the user create, revise, archive, search,
	convert and publish information in several media, using a
	single source document.''

\end{quote}

This project has ambitious goals:

\begin{quote}

	``To [reach out to] a wide audience, from would-be Word users to
	techies.

	Simplify simultaneous publishing of information in a range of
	output formats (print and online) from a single source.

	Replace the WYSIWYG document processing paradigm with a
	separated structure/appearance approach, even for simple
	tasks.''

\end{quote}

From Conglomerate I have taken the idea for the expanded view in
Emaxml, as can be seen from figure~\ref{fig:conglo}.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=4cm]{fig-conglo-10x4.eps}
\end{center}
\caption{Conglomerate frontend}
\label{fig:conglo}
\end{figure}


\subsection{GETOX}
%-----------------

\begin{quote}
	``This software aims at giving users the ability to write XML
	files without having advanced knowledge of XML concepts. It
	should also allow users to produce valid documents at any
	time.''
\end{quote}

At the present stage of development, GETOX\footnote{The GETOX home page is
at http://idx-getox.idealx.org/index.html} allows the user to add and
remove tags according to the DTD, edit text in PCDATA, and perform other
editing operations. It does not yet support cutting and pasting parts of
the XML tree or editing attributes.


\subsection{XED}
%----------------

\begin{quote}
	``XED\footnote{XED home page is at
	http://www.ltg.ed.ac.uk/~ht/xed} is a text editor for XML
	document instances. It is designed to support hand-authoring
	of small-to-medium size XML documents, and is optimised for
	keyboard input. It works very hard to ensure that you cannot
	produce a non-well-formed document. Although it does not
	validate, the results of offline validation can be accessed,
	and it does read DTDs and keep track of your document
	structure, and provides context-based accelerators to make
	element and attribute entry fast and easy.''
\end{quote}

XED offers facilities for editing the raw XML code.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Emaxml specification}

The main goal of Emaxml is to provide the Emacs user with a view of
an XML document as a tree, and with a set of facilities for manipulating
it handily.

This can be achieved by creating the Emaxml mode, which should hide
the XML syntax by displaying the document in a pseudo-graphical,
customizable, hierarchical fashion and automate the most common or most
tedious actions involved in the editing of an XML document. 

A great many useful editing functions could be devised for Emaxml
mode. Moreover, an XML editor can enforce a certain degree of control
over the semantics of the tree: it may or may not be aware of the DTD
for the document being edited, and, if it is, it can help the user in
a number of ways by exploiting this knowledge.

In sections \ref{sec:cmds}\ and \ref{sec:ctrl} I arbitrarily set the
initial target for my project in terms of functionalities and in terms
of the level of control implemented, respectively. However, it is a
strong requirement that the core of the mode be coded and documented
in a way that facilitates later extension in both directions.

The definitions given in {\bf bold} in this chapter do not refer to
standard terminology, but are specific to Emaxml.


%---------------------------
\section{The look of Emaxml}
%--------------------------- 

Figure~\ref{fig:Emaxml} shows what an XML document should look like when
it is being edited with Emaxml\footnote{For comparison, the portion
of document displayed is the same as in Fig.~\ref{fig:SGML}.}. It also
shows the terminology for the logical units, which are the subject of
section \ref{sec:lu}.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=16cm]{fig-Emaxml_lu-16x13.eps}
\end{center}
\caption{Screenshot of Emaxml, with the indication of the Logical Units}
\label{fig:Emaxml}
\end{figure}

The visual effect of a tree structure is obtained by using one color
for each level of depth. Children are enclosed inside their parent,
and attribute names and values are put together with the element name
in the header lines for that element.

The topmost element is called the {\bf seed element}\footnote{Because
it comes before the root element...}, and does not correspond to an
actual XML element; it contains information about the document,
such as that contained in the XML declaration. From the user's point
of view, the header of the seed element is treated as a normal header
whose element name is the name of the file, and whose attributes are
these pieces of information.  None of the fields contained in this
header are compulsory\footnote{In fact, the BNF production that
defines the prolog is
\begin{center}
[22] $prolog$ ::= $XMLDecl$? $Misc$* ($doctypedecl$\ $Misc$*)?
\end{center}
}, although they are very common. They will therefore appear by default
when visiting a new document, but, if left blank, they will not be
written to the file.

%---------------------------------
\subsection{Logical display units}\label{sec:lu}
%---------------------------------

To describe some of the features of Emaxml, the concept of an Emaxml
logical unit is introduced here.

A {\bf logical unit} is a portion of an Emaxml buffer that corresponds
to some XML syntactic component.  At any moment during editing, point
will be in one or more logical units. Also, a particular location in
the text, such as that defined by the mark, can be said to be in a
particular logical unit.

The response of Emaxml to a user's input will vary depending on what
type of logical unit the cursor is in at any moment, and the editing
operations will have different meaning depending on whether they are
performed on a region which is completely included in an instance of a
logical unit or not.

The following are the possible logical units\footnote{Throughout this
document, logical unit names are in \lu{Sans Serif} font.}:

\begin{itemize}

	\item \luele : a whole element, including its header and
	its children;

	\item \luheader : an element's name and its attributes;

	\item \luelename : the first word of a \luheader;

	\item \luattlist: the set of attributes of an
	element and their values;

	\item \luatt: an attribute's name and its value;

	\item \luattname: an attribute's name;

	\item \luattvalue: an attribute's value;

	\item \luchardata: the plain text;

	\item \lupi: a processing instruction;

	\item \lupitarget: the target of a processing instruction,
	i.e. the first word of a \lupi;

	\item \lupibody: the command of a processing instruction;

	\item \lucomment: a comment text;

	\item \luintdtd: the text relative to the definition of the
	internal DTD;

	\item \lucdata: the text of a CDATA section.

\end{itemize}

There may only be one instance of a \luintdtd\ and it must be a child
of the seed element. In the text of the \luintdtd\ the user can
actually write whatever s/he wants, since it is not parsed.

\luele, \luheader, \luattlist, \luatt\ and \lupi\ are {\bf compound}
\logus, that is, they include smaller \logus\ (e.g. an \luattlist\ is
composed of one or more \luatt s, which in turn are composed of
\luattname s and \luattvalue s). The remaining \logus\ are
self-contained and are called {\bf elementary} \logus.

Not all locations in the display belong to a logical unit. For
instance, the colored spaces at the left of an \luelename\ or the
colon following an \luattname\ are not part of any logical unit. All
such locations and their contents are said to be {\bf automatic},
because they are managed by Emaxml and cannot be reached by the
user. Non-automatic locations of the buffer form the {\bf user space}
of some \logu.

\luintdtd, \lucomment, \lupi\ and \lucdata\ are {\bf special} logical
units: their headers are automatic.

Special \logu s and \luchardata\ are {\bf multiline} \logus: they may
contain {\em newline} characters and are displayed accordingly. The
other elementary \logus\ are called {\bf monoline}.

A {\bf logical line} is either one line (up to a newline) of a
multiline \logu\ or an entire monoline \logu.

The {\bf previous} logical line with respect to a \logl\ is the
first logical line that is encountered by going left and up.  The {\bf
following} logical line with respect to a \logl\ is the first logical
line that is encountered by going right and down.

\Logus\ that correspond to those XML components which may be leaves of
the tree (i.e. \luele, \luchardata, \lupi, \lucomment, \lucdata) are
called {\bf leaf \logus}.
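
The classification above can be summarized as a handful of sets. The
following sketch is in Python only for compactness; the names
\texttt{LU}, \texttt{COMPOUND}, \texttt{ELEMENTARY} and so on are
hypothetical and do not prescribe how Emaxml will represent \logus\
internally.

\begin{verbatim}
```python
from enum import Enum


class LU(Enum):
    ELEMENT = "Element"
    HEADER = "Header"
    ELEMENT_NAME = "Element name"
    ATTRIBUTE_LIST = "Attribute list"
    ATTRIBUTE = "Attribute"
    ATTRIBUTE_NAME = "Attribute name"
    ATTRIBUTE_VALUE = "Attribute value"
    CHARDATA = "CharData"
    PI = "Processing Instruction"
    PI_TARGET = "Processing Instruction Target"
    PI_BODY = "Processing Instruction Body"
    COMMENT = "Comment"
    INTERNAL_DTD = "Internal DTD"
    CDATA = "CDATA section"


# Compound units include smaller units; the rest are elementary.
COMPOUND = frozenset({LU.ELEMENT, LU.HEADER, LU.ATTRIBUTE_LIST,
                      LU.ATTRIBUTE, LU.PI})
ELEMENTARY = frozenset(LU) - COMPOUND
# Special units have automatic headers.
SPECIAL = frozenset({LU.INTERNAL_DTD, LU.COMMENT, LU.PI, LU.CDATA})
# Multiline units may contain newlines; the other elementary
# units are monoline.
MULTILINE = SPECIAL | {LU.CHARDATA}
MONOLINE = ELEMENTARY - MULTILINE
# Units that may be leaves of the XML tree.
LEAF = frozenset({LU.ELEMENT, LU.CHARDATA, LU.PI, LU.COMMENT,
                  LU.CDATA})
```
\end{verbatim}

Note that, with these definitions, a \lupi\ is at the same time a
compound, a special and a multiline \logu.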

Entity references will appear in Emaxml as part of ordinary character
data, since they are considered only as syntactic constructs.

A portion of an Emaxml buffer delimited by two locations \lz\ and \lo\
both in user space is called a {\bf unit}\footnote{The term {\em unit}
is used instead of {\em region} for consistency with the term {\em
logical unit}.} and can be:

\begin{itemize}

	\item a {\bf string}, if \lz\ and \lo\ are both inside the
	same elementary \logu;

	\item a logical unit, as said, if \lz\ and \lo\ are the first
	and the last location respectively of the same \logu;

	\item an {\bf illogical unit}\footnote{The adjective {\em
	illogical} is here chosen on purpose, to emphasize the
	apparent weirdness of a type of structure that instead may
	sometimes come in quite handy.}, if \lz\ and \lo\ belong to two
	different \logus.

\end{itemize}
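
As a sketch of how the three cases can be told apart, the following
Python fragment classifies a unit from its two end locations. It is a
simplification under assumptions of my own: only elementary \logus\
are considered, a location is modelled as a pair (unit identifier,
offset), and the names \texttt{Location} and \texttt{classify} are
hypothetical.

\begin{verbatim}
```python
from collections import namedtuple

# A location in user space: the elementary logical unit it is in
# and the offset inside that unit's text.
Location = namedtuple("Location", ["unit", "offset"])


def classify(l0, l1, unit_lengths):
    """Classify the unit delimited by locations l0 and l1.

    unit_lengths maps a unit identifier to its text length.
    """
    if l0.unit != l1.unit:
        return "illogical unit"
    if l0.offset == 0 and l1.offset == unit_lengths[l0.unit]:
        return "logical unit"
    return "string"


lengths = {"name-1": 5, "value-1": 12}
whole = classify(Location("value-1", 0),
                 Location("value-1", 12), lengths)
part = classify(Location("value-1", 2),
                Location("value-1", 7), lengths)
mixed = classify(Location("name-1", 3),
                 Location("value-1", 4), lengths)
```
\end{verbatim}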


%-------------------------------------------------------------
\section{The behavior of Emaxml from the user's point of view} \label{sec:cmds}
%-------------------------------------------------------------

This section specifies Emaxml requirements. It is subdivided into two
main parts: the description of what the common editing operations mean
under Emaxml mode and the definition of the new editing operations
specific to XML manipulation.

\subsubsection{Editing levels}
%- - - - - - - - - - - - - - - 

In editing an XML document, a basic operation (such as insertion or
killing) can be performed at three different levels:

\begin{itemize}

	\item At the {\bf character level} the operation involves an
	Emaxml string.

	\item At the {\bf logical level} the operation is
	performed over an entire logical unit, for example the
	deletion of an element.

	\item At the {\bf illogical level} the operation is
	performed on an illogical unit.

\end{itemize}


\subsection{Standard Emacs editing operations {\em \`{a} la} Emaxml} 
\label{sec:edop}
%- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

The typical Emacs user will expect a number of standard editing
facilities to be available in Emaxml mode, and their behavior to
parallel that of other modes.

As said, my project targets the redefinition of only a finite set
of operations. This is a subset of the list obtained by pressing
'F1 b' in a {\em Fundamental} buffer in Emacs. It is less important
to implement a large number of functionalities immediately than it is to
design the code in a modular fashion, document it properly, and
choose at least a few operations from each of the following
categories:

\begin{itemize}

	\item {\bf PM}: point movement;

	\item {\bf ID}: insertion and deletion;

	\item {\bf MK}: mark operations;

	\item {\bf RE}: region setting;

	\item {\bf SR}: search \& replacement;

	\item {\bf GE}: general common operations.

\end{itemize}

Appendix~\ref{app:stdcmd} lists the minimum set of commands to be
implemented for Emaxml as part of this project.

\subsubsection{Visiting a file}
%- - - - - - - - - - - - - -

Emaxml mode will be activated on visiting a file with extension {\em
.xml} and when a buffer containing a file with an extension other than
{\em .xml} is saved with extension {\em .xml}\footnote{In Emacs, if a
buffer is saved with 'C-x C-w' ('Save Buffer As...' in the Files menu)
with a different extension, its major mode changes accordingly.}. In
the latter case, the buffer also needs to be redisplayed.  If the
file does not already exist, the buffer will contain a blank seed
element, as described in section \ref{sec:lu}.

\subsubsection{Highlighting the region} \label{sec:high}
%- - - - -- - - - -    - - -  - -  -   -  -  -  -  -  -  - 

When using the mouse to highlight the region, or in Transient Mark
mode\footnote{In Transient Mark mode, when the mark is active, the region
is highlighted.}, only the locations of user space included in the
region will be highlighted.

In figure \ref{fig:ata}-(0) the region highlighted is {\em illogical},
i.e. it starts in an elementary \logu\ and terminates in a different
one.

Whenever a leaf \logu\ is completely included in the highlighted
region, all of its physical space will be colored, including the
non-user space; otherwise only the included user space will be
colored. In figure~\ref{fig:ata}-(0), the comment and the first
meta element are examples of the first case, while the image,
properties and last meta elements are examples of the second case.

Two useful commands will be implemented in Emaxml: mark-more and
mark-less.  The first expands the region to the next larger \logu\
containing the region (or containing point if the mark is undefined). The
second does the opposite. For example, if the region is currently a
string in an \luattvalue, mark-more sets the region to the containing
\luatt. Further calls to mark-more set the region to the containing
\luattlist, then to the \luheader, then to the \luele, and so on up to
the entire buffer. When the current \logu\ contains more than one
child, mark-less always selects the first.
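
The intended behavior of the two commands can be sketched on a toy
tree. This is an illustration in Python, not the Emacs Lisp
implementation; the class \texttt{Unit} and the functions
\texttt{mark\_more} and \texttt{mark\_less} are hypothetical, and the
whole buffer is modelled as the root of the tree.

\begin{verbatim}
```python
class Unit:
    """A logical unit that knows its parent and its children."""

    def __init__(self, kind, parent=None):
        self.kind = kind
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def mark_more(region):
    """Return the next larger unit containing the region."""
    return region.parent if region.parent is not None else region


def mark_less(region):
    """Return the first child of the region, if it has any."""
    return region.children[0] if region.children else region


whole_buffer = Unit("Buffer")
element = Unit("Element", whole_buffer)
header = Unit("Header", element)
element_name = Unit("Element name", header)
att_list = Unit("Attribute list", header)
att = Unit("Attribute", att_list)
att_name = Unit("Attribute name", att)
att_value = Unit("Attribute value", att)

# Start from the Attribute value holding the string and expand.
region = att_value
steps = []
while region.parent is not None:
    region = mark_more(region)
    steps.append(region.kind)
```
\end{verbatim}

Starting from a string in an \luattvalue, the successive regions are
the \luatt, the \luattlist, the \luheader, the \luele\ and finally
the whole buffer.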

\subsubsection{Text search and replacement}
%-  -  -   -   -  -   -  -  -    -  -   -  -

Text search and replacement in Emaxml will operate on the user space
only.  Both string and regular expression search and replacement will
be implemented.
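
A minimal sketch of what restricting a search to user space means is
given below. It is written in Python and rests on an assumption of
mine about the representation: the displayed buffer is modelled as a
list of segments, each flagged as user space or automatic. The
function name \texttt{search\_user\_space} is hypothetical.

\begin{verbatim}
```python
import re


def search_user_space(segments, pattern):
    """Return (segment index, offset) for each match found in a
    user-space segment; automatic segments are skipped."""
    hits = []
    for index, (text, is_user_space) in enumerate(segments):
        if is_user_space:
            for match in re.finditer(pattern, text):
                hits.append((index, match.start()))
    return hits


# Automatic indentation, an attribute name, the automatic colon
# that follows it, and the attribute value.
segments = [("  ", False), ("title", True),
            (": ", False), ("A title", True)]
title_hits = search_user_space(segments, "title")
colon_hits = search_user_space(segments, ":")
```
\end{verbatim}

The colon that follows an \luattname\ is automatic, so a search for
it finds nothing, while the word in user space is found twice.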

\subsubsection{Moving point} \label{sec:move}
%-  -  -  -  -  -  -  -  -  -  -  -  -  -  - 

Movement in Emaxml acquires different meanings (and names) depending
on the level at which it is performed:

\begin{itemize}
	
	\item At character level: {\bf sliding}.

	\begin{itemize}

		\item {\bf Horizontal sliding}: moving by one character. 

		Point can slide inside the current \logl\ one
		character backward or forward.

		Sliding point backward when at the beginning of a \logl\
		moves it to the end of the previous \logl.

		Sliding point forward when at the end of a \logl\ moves
		it to the beginning of the following \logl.

		\item {\bf Vertical sliding}: moving by \logl s.

		When performing a vertical sliding movement
		(e.g. 'next-line'), point is expected to behave as
		closely as possible to the usual behavior.  A movement
		that starts at the $l$th location of a \logl, and ends
		at a different \logl, will end at the $l$th location
		of the arrival \logl\ if that has at least $l$
		characters, or at its last location otherwise.
		However, $l$ is remembered for successive vertical
		movements.  I will call this standard Emacs property
		{\bf \traveller}: it is demonstrated for an editor by
		showing that, after a vertical movement in one
		direction followed immediately by a vertical movement
		in the opposite direction, point always returns to
		the initial location.

		Point can slide one \logl\ up or down with \traveller.

	\end{itemize}

	\item At the logical level: {\bf traversing}.

	Traversing (the XML tree) refers to moving from one XML
	component to another, hence the \logus\ involved are:
	\luheader, \luchardata, \lupi, \lucomment, \luintdtd, \lucdata.

	Traversing will be implemented with no \traveller; point is
	always moved to the first character of the arrival \logu.

	\begin{itemize}

		\item {\bf Horizontal traversing}\footnote{The choice
		of adjectives {\em horizontal} and {\em vertical} for
		traversing is due to the fact that in an Emaxml buffer
		the parent of a \logu\ can always be thought of as
		being ``on the left'', while a peer is always up or
		down.}: moving hierarchically.

		Point can traverse left from one of the said \logus\
		to its XML parent.

		Traversing right can only be done from a \luheader\ to
		the first child of the current element, if any.

		\item {\bf Vertical traversing}: moving to peers.

		Point can traverse from one of the said \logus\
		(except the seed element, which has no peers) to the
		next or previous instance of the same \logu\ found in
		the buffer, regardless of whether they are siblings.

	\end{itemize}

\end{itemize}
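
The vertical sliding rule and \traveller\ can be sketched as follows.
The fragment is in Python and is only a model: a buffer is reduced to
the list of the lengths of its \logl s, and the class
\texttt{Slider} with its attribute \texttt{goal} (the remembered
$l$) is a hypothetical name of mine.

\begin{verbatim}
```python
class Slider:
    """Point sliding over logical lines of given lengths."""

    def __init__(self, line_lengths, line=0, column=0):
        self.line_lengths = line_lengths
        self.line = line
        self.column = column
        self.goal = column  # the remembered l

    def slide_vertically(self, delta):
        target = self.line + delta
        if 0 <= target < len(self.line_lengths):
            self.line = target
            # Land on the goal location, or on the last location
            # if the arrival line is too short.
            self.column = min(self.goal,
                              self.line_lengths[target])

    def slide_horizontally_to(self, column):
        # A horizontal movement resets the remembered l.
        self.column = column
        self.goal = column


point = Slider([12, 3, 9], line=0, column=8)
trace = []
for delta in (1, 1, -1, -1):
    point.slide_vertically(delta)
    trace.append((point.line, point.column))
```
\end{verbatim}

Going down twice and then up twice, point passes through the short
\logl\ at its last location but returns to location 8 of the initial
\logl, because the goal is remembered across vertical movements.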

\subsubsection{Mark and the mark ring}
%-    -     -     -      -     -   -

The mark ring will be implemented in Emaxml, consistently with its
standard definition and functionality. Obviously, the internals of
this implementation will be specific to Emaxml; markers
have to be objects of a different data structure, since they must
describe a location in terms of the tree.

\subsubsection{Killing and yanking in Emacs} 
%- - - - -- - - - -    - - -  - -  -   -  -  -  -  -  -  - 

{\em Killing} and {\em yanking} in Emacs jargon mean {\em cutting} and
{\em pasting} respectively.

In standard Emacs operation when a portion of the buffer is killed it
is deleted from the buffer and remembered by Emacs for later use.  One
property (I shall call it {\bf yanking transparency}) of an Emacs
buffer is that if some text is killed and immediately yanked, the
buffer does not change. Point may be in a different location, though.

Yanking is an insertion operation; the portion of buffer being yanked
is inserted before point.

\subsubsection{Killing in Emaxml} \label{sec:kill}
%-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - - -  - -   

Many Emacs commands exist for killing. The ones considered here are:
kill-line, kill-region, kill-word, kill-buffer. Their names are
self-explanatory.  They are all implemented in Emaxml, the only one
that changes its meaning being kill-line, which kills a \logl\ instead
of a normal line. One further killing command is implemented in
Emaxml, kill-element.

Since the object of the editing in Emaxml is a tree, the killing and
yanking operations must be redefined in terms of \logus. For instance,
what happens if the user tries to yank an \luattlist\ into a \luchardata?

Let us consider a portion $p$ of an Emaxml buffer being killed that
starts from location \lz\ in leaf \logu\ \uz\ and ends at location
\lo\ in leaf \logu\ \uo.

Emaxml will store killed \logus\ and il\logus\ as {\em trees} in the kill
ring. For $p$ a \logu, the stored tree will be the tree rooted at \uz\
and will include all its children. For $p$ an il\logu, the stored tree
will be rooted at the smallest \logu\ $s$ that includes all the \logus\
totally or partially included in $p$, minus the children of $s$ not in $p$
and minus the portions of user space of $s$ not in $p$. This implies that the
stored tree may have some blank \logus\ in it.

Killing $p$ has the following effects:
\begin{itemize}

	\item[(1)] If $p$ is a string, point is left at \lz.

	\item[(2)] If $p$ is a leaf \logu, then point is left:

	\begin{itemize}

		\item if $p$ has a peer $q$ below it, at the first
		character of $q$'s user space;

		\item otherwise if $p$ has a peer $q$ above it, at the
		first character of $q$'s user space;

		\item otherwise at the first character of $p$'s
		parent's user space.

	\end{itemize}

	\item[(3)] If $p$ is a non-leaf \logu, then it is one of the
	elementary \logus\ in an \luele; a blank \logu\ of the same
	type as \uz\ is inserted in place of $p$ with point at its
	first location.

	\item[(4)] If $p$ is an il\logu, then point is left as in
	policies (1), (2), (3) but substituting ``the portion of \uz\
	in $p$'' for $p$.

	When killing an il\logu, Emaxml rebuilds the tree as follows,
	in order\footnote{For an example, see Appendix~\ref{app:killre}.}:

	\begin{itemize}

		\item[(i)] all leaf \logus\ entirely included in $p$
		are pruned;

		\item[(ii)] all non-leaf \logus\ entirely included in
		$p$ are blanked;

		\item[(iii)] all characters in $p$ are eliminated.

	\end{itemize} 
	
\end{itemize}
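
The three rebuilding steps above can be sketched in a few lines. The
following Python fragment is only an illustration under stated
assumptions, not the Emacs Lisp code of Emaxml: the class \verb|Unit|,
its fields \verb|whole| and \verb|cut|, and the function
\verb|kill_rebuild| are hypothetical names, and a blanked unit is
modelled simply as a unit with empty text.

\begin{verbatim}
class Unit:
    """A logical unit: leaf flag, text (user space) and child units."""

    def __init__(self, kind, text="", leaf=False, children=()):
        self.kind = kind
        self.text = text
        self.leaf = leaf
        self.children = list(children)
        self.whole = False  # True if the unit lies entirely inside p
        self.cut = None     # (start, end) of the text inside p, if partial


def walk(unit):
    """Yield a unit and all its descendants, parents first."""
    yield unit
    for child in unit.children:
        yield from walk(child)


def kill_rebuild(root):
    """Apply steps (i), (ii), (iii) in order to the tree under root."""
    # (i) prune every leaf unit entirely included in p
    for unit in list(walk(root)):
        unit.children = [c for c in unit.children
                         if not (c.leaf and c.whole)]
    # (ii) blank every non-leaf unit entirely included in p
    for unit in walk(root):
        if unit.whole and not unit.leaf:
            unit.text = ""
    # (iii) eliminate the remaining characters that belong to p
    for unit in walk(root):
        if unit.cut is not None:
            start, end = unit.cut
            unit.text = unit.text[:start] + unit.text[end:]


# Hypothetical example: the region p covers the whole Element name,
# the whole Comment, and part of two other units.
name = Unit("Element name", "meta")
value = Unit("Attribute value", "c2.gif")
comment = Unit("Comment", "old note", leaf=True)
chars = Unit("CharData", "hello world", leaf=True)
root = Unit("Element", leaf=True,
            children=[name, value, comment, chars])
name.whole = comment.whole = True
value.cut = (0, 2)   # "c2" is inside p
chars.cut = (0, 6)   # "hello " is inside p
kill_rebuild(root)
\end{verbatim}

In this example the Comment disappears from the tree, the Element name
survives as a blank unit, and the two partially selected units keep
only the characters outside $p$.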

\subsubsection{Yanking in Emaxml} \label{sec:yank}
%-  -  -  -  -  -  - - -  -  -  -   -  -    -   -  

In Emaxml, yanking will leave point after the last location of the user space
of the yanked unit, i.e. the cursor will be under the first character of
the user space after the yanked unit.

Yanking a non-string unit involves some tree manipulation. The only
limitation posed by Emaxml is that non-leaf \logus\ cannot be yanked
other than in a \luheader. Since both logical and illogical units are
complete subtrees, there is no difference in yanking them.

Let us consider a unit $p$ stored by Emaxml in the kill ring.

If $p$ is a leaf \logu, then yanking transparency does not
hold\footnote{See Appendix~\ref{app:yankee} for a discussion of this.}, so
$p$ must be explicitly yanked as a peer or as a child. If yanked as a
peer, it is inserted before the current leaf; if yanked as a child, it
is inserted as its last child.

Yanking $p$ into some \logu\ \ut\ works as follows:

\begin{tblenv}
\begin{tabular}{l|l|p{5cm}}
\hline

	{\bf $p$ is a} &
	{\bf \ut\ is a} &
	{\bf Yanking $p$ into \ut} \\

\hline

	string &
	elementary \logu &
	$p$ can be yanked anywhere in \ut \\

	leaf \logu &
	\luheader &
	$p$ can be yanked either as child or
	as peer of the element which \ut\ belongs to\\

	leaf \logu &
	leaf \logu &
	$p$ yanked as a peer of \ut \\

	\luattlist\ or \luatt &
	\luheader &
	$p$ inserted at end of \ut's \luattlist \\
	
	
\hline
\end{tabular}
\caption{Yanking criteria for $p$ not illogical.}
\end{tblenv}
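
The table above can be read as a lookup from a pair (kind of $p$, kind
of \ut) to an action. The Python sketch below shows one possible
encoding; the category names, the action strings and the function
\verb|yank_action| are assumptions made for illustration only, and any
pair not listed in the table is simply reported as not allowed.

\begin{verbatim}
# One entry per row of the table; keys are (kind of p, kind of target).
YANK_RULES = {
    ("string", "elementary"): "insert anywhere in target",
    ("leaf", "header"): "insert as child or peer of target's element",
    ("leaf", "leaf"): "insert as peer of target",
    ("attribute list", "header"): "append to target's attribute list",
    ("attribute", "header"): "append to target's attribute list",
}


def yank_action(p_kind, target_kind):
    """Return the action for yanking p into the target, or None."""
    return YANK_RULES.get((p_kind, target_kind))


allowed = yank_action("leaf", "leaf")
refused = yank_action("attribute", "leaf")   # not in the table
\end{verbatim}

Keeping the criteria in a table rather than in nested conditionals
makes it easy to check the code against the specification row by row.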


%- - - - - - - - - - -  - - - - - - - -  - - - - -
\subsection{Emaxml specialized editing operations}
%- - - - - - - - - - - - - - - - - - - - - - - - -

Appendix~\ref{app:newcmd} lists the minimum set of new commands
specific to Emaxml to be implemented as part of this project.
They are described in the following sections.

\subsubsection{Creating a new instance of a logical unit} \label{sec:create}
%- - - - -- - - - -    - - -  - -  -   -  -  -  -  -  -  - 

Only new instances of a leaf \logu\ or of an \luatt\ can be created.

Creating a new instance inserts a {\bf blank} \logu, described as follows:

\begin{itemize}

	\item a blank \luattname\ is represented by an automatic colon;

	\item a blank \luattvalue\ is an empty string;

	\item a blank \luatt\ is a blank \luattname\ and a blank \luattvalue;

	\item a blank \luattlist\ is a blank \luatt;

	\item a blank \luelename\ is represented by an automatic space;

	\item a blank \luheader\ is a blank \luelename\ and a blank \luattlist;

	\item a blank \luele\ is a blank \luheader;

	\item a blank \luchardata\ is three lines with white background;

	\item a blank instance of the remaining leaf \logus\ is the
	appropriate lateral heading and a line with white background.

\end{itemize}
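
The definitions above are recursive: a blank composite unit is made of
blank instances of its parts. The following Python sketch shows this
structure only; the visual representation (automatic colon, automatic
space, white lines) is left out, and the names \verb|BLANK_PARTS| and
\verb|blank| are hypothetical.

\begin{verbatim}
# Parts that make up a blank instance of each composite unit.
BLANK_PARTS = {
    "Attribute": ["Attribute name", "Attribute value"],
    "Attribute list": ["Attribute"],
    "Header": ["Element name", "Attribute list"],
    "Element": ["Header"],
}


def blank(kind):
    """Return a blank unit as a (kind, parts) pair, built recursively."""
    return (kind, [blank(part) for part in BLANK_PARTS.get(kind, [])])


blank_element = blank("Element")
\end{verbatim}

A blank \luele\ therefore always contains exactly one blank \luatt,
reached through its \luheader\ and \luattlist.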

An \luatt\ can only be created when point is on the \luheader\ of a
non-blank \luele. If point is on another \luatt, this and the other
ones are shifted down.

A leaf \logu\ can be created as a child of an element or as a sibling
of a leaf \logu. A new child is inserted as the last child of the
element, while a sibling is inserted in place of the leaf component.

\subsubsection{Adjusting the display of the tree structure}
%- - - - - - - - - - - - - - - - - - - - - - - - - - - - -

An element (and the subtree rooted at it) can be displayed
{\bf outline} or {\bf inline} (that is, vertically or horizontally)
and {\bf expanded} or {\bf collapsed} (that is, completely visible or
displayed as the element name only).

These characteristics are independent, so there are four ways of
displaying a subtree (see Fig.~\ref{fig:ilol}), called {\bf display
modes}: outline-expanded, outline-collapsed, inline-expanded,
inline-collapsed.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=18cm]{fig-tree_views-24x12.eps}
\end{center}
\caption{Display modes}
\label{fig:ilol}
\end{figure}

The {\bf display state} of a subtree is defined by the display modes of
all its elements.

The following statements define the manipulation of the visual tree:

\begin{itemize}

	\item The tree structure is by default displayed entirely
	outline-expanded when the document is initially visited.

	\item Making an outline subtree inline makes all its children
	temporarily inline, and does not change its or its children's
	expansion mode.

	\item Expanding a collapsed subtree brings it back to the
	display state it was in before being collapsed (i.e. all its
	elements return to their previous display mode).

	\item The root element and the seed element cannot be made
	inline or collapsed.

	\item The \luheader\ of a collapsed or inline element is not
	user space.

\end{itemize}
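
One way to satisfy the rules above is to store on each element only
its own two flags and to derive what is actually shown from its
ancestors, so that collapsing or inlining a subtree never overwrites
the flags of its descendants. The Python sketch below illustrates this
idea under that assumption; the class \verb|Element| and the function
\verb|display_state| are hypothetical, element names are assumed
unique, and the restriction on the root and seed elements is not
modelled.

\begin{verbatim}
class Element:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.outline = True    # own layout flag: outline or inline
        self.expanded = True   # own expansion flag


def display_state(element, inherited_inline=False, state=None):
    """Map each visible element name to its effective display mode."""
    if state is None:
        state = {}
    inline = inherited_inline or not element.outline
    layout = "inline" if inline else "outline"
    expansion = "expanded" if element.expanded else "collapsed"
    state[element.name] = layout + "-" + expansion
    if element.expanded:       # a collapsed element hides its subtree
        for child in element.children:
            display_state(child, inline, state)
    return state


para = Element("para")
section = Element("section", [para])
root = Element("root", [section])

para.outline = False           # para is displayed inline
before = display_state(root)
section.expanded = False       # collapse the section subtree
collapsed = display_state(root)
section.expanded = True        # expand it again
after = display_state(root)
\end{verbatim}

Because the flags of {\em para} are never touched while its parent is
collapsed, expanding the parent restores the previous display state
without any extra bookkeeping.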

An optional further development may be that the display state of the
entire document is saved along with the file (e.g. encoded somehow
inside the document) and restored the next time the document is
visited in Emaxml mode.

%-----------------------
\section{Control issues} \label{sec:ctrl}
%-----------------------

Emaxml, at the stage of development I set as target for my project,
will not enforce control over the tree structure created by the user
or read from an XML file in terms of the DTD. 

Emaxml will see the document ``simply'' as a syntactic structure that
must comply with the grammar set out by the BNF rules in
\cite{w3c}. Note that the internal DTD is {\em not} parsed. The user
will be able to manipulate the tree as s/he wants without being
bothered with indentation. The ``less-thans'', quotes, and other
syntactic sugar of XML will be hidden. This should hopefully let the
user concentrate more on the contents, but, on the other hand, the user
will also be able to create documents that are not well-formed, or
invalid.  The type of control enforced by Emaxml is syntactic only,
but it is of a very high degree: the goal of the project is that it
should be impossible to produce a document which does not comply with
the BNF rules in \cite{w3c}.

However, the actual code implementing the editing mode must be
designed so that adding semantic awareness and control is
easy. 

Examples of events that may trigger a contents check or other
semantics-related operations are:

\begin{itemize}
	
	\item the contents of a non-leaf elementary \logu\
	(i.e. \luelename, \luattname, \luattvalue) are changed
	at the character level;
		
	\item point leaves an elementary \logu;

	\item the tree is changed; this includes creation of instances
	of any \logu, killing and yanking at the logical and illogical
	levels;

	\item the contents of the seed element are changed;

	\item the contents of the \luintdtd\ are changed.

\end{itemize}

These and other such events must be easily recognizable and
exploitable from the programmer's point of view.
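
A simple way to make such events exploitable is a registry of
functions to be called when an event occurs, in the spirit of Emacs
hooks. The Python sketch below shows the idea only; the class
\verb|EventHooks|, the event names and the example check are
hypothetical and do not describe existing Emaxml code.

\begin{verbatim}
from collections import defaultdict


class EventHooks:
    """Registry of functions to call when an editing event occurs."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def add(self, event, function):
        self._hooks[event].append(function)

    def run(self, event, *args):
        """Call every function registered for event; collect results."""
        return [function(*args) for function in self._hooks[event]]


def check_element_name(kind, text):
    """Example semantic check: an Element name must not be empty."""
    if kind == "Element name" and not text:
        return "empty element name"
    return None


hooks = EventHooks()
hooks.add("point-leaves-unit", check_element_name)
problems = hooks.run("point-leaves-unit", "Element name", "")
no_hooks = hooks.run("tree-changed")   # nothing registered yet
\end{verbatim}

With this arrangement the editing code only announces events, and
semantic checks can be added later without changing it.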



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Development plan}

\section{Main components}

The project is to be implemented as an extension to Emacs, i.e. as an
Emacs mode. Thus, it is to be coded in Emacs Lisp.

The object of the editing, an XML file, will be seen by Emaxml in two
ways at the same time: as an Emacs buffer (the {\bf \ebuf}), used to
display the visual representation of the tree, and as a list (the {\bf
\etree}), structured as to allow an abstract, logical view of the
document and appropriate manipulation.

The system has been divided into three major components, to be developed
in sequence:

\begin{itemize}

	\item[1] {\bf Parser}.

		The Parser takes as input an Emacs buffer in
		Fundamental mode (i.e. not in SGML mode, hence with no
		meta-information about syntax highlighting) containing
		an XML document and extracts the corresponding \etree.  The
		Parser checks the syntax of the document and reports
		the error if the document is not correct.

		Some details of the implementation of the Parser are
		given in section~\ref{sec:done}.

	\item[2] {\bf Writer}.

		The Writer carries out the opposite. It takes an
		\etree\ and produces an Emacs buffer. It assumes the
		\etree\ to be correct.

	\item[3] {\bf Emaxml mode}.

		Emacs is extended with the new mode. This includes the
		code relative to the interactive management of the
		\ebuf\ and the \etree, from the basic user
		functionalities such as hitting a key to the more
		elaborate tree manipulation commands.

		Some problems to be solved to implement Emaxml as
		an Emacs mode are:

		\begin{itemize}

			\item How colors are managed by Emacs and how
			to control the display and the cursor. How to
			maintain the display up to date with the
			abstract representation held in the \etree.
			
			\item What is the data model for the \ebuf,
			i.e. its data structures and manipulation
			functions.

			\item How to represent a location on the
			\ebuf\ (e.g. what will point look like? What
			data structure will a marker be?) and how to
			map it to its corresponding item in the
			\etree.

		\end{itemize}			

\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Performance assessment}

\section{XML-equivalence of two documents}
%-----------------------------------------

Let us consider a buffer $b_0$ containing an XML document. Suppose the
Parser processes $b_0$, producing the \etree\ $e$, and then the Writer
processes $e$, obtaining buffer $b_1$. The two buffers $b_0$ and $b_1$
need not be identical at the byte level, but the two XML documents in
them must be {\bf equivalent} in XML terms.
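
This document does not fix a precise definition of XML-equivalence.
One possible reading, taken here purely as an assumption, is that two
documents are equivalent when their canonical forms coincide. The
Python sketch below uses the standard library function
\verb|xml.etree.ElementTree.canonicalize| (available from Python 3.8);
the function name \verb|xml_equivalent| and the sample documents are
hypothetical.

\begin{verbatim}
from xml.etree.ElementTree import canonicalize


def xml_equivalent(doc_a, doc_b):
    """Compare two XML documents through their canonical forms."""
    return canonicalize(doc_a) == canonicalize(doc_b)


# Same content, different quoting and attribute order.
b0 = "<book id='1' lang='en'><title>XML</title></book>"
b1 = '<book lang="en" id="1"><title>XML</title></book>'
# Different attribute value.
b2 = '<book lang="en" id="2"><title>XML</title></book>'

same = xml_equivalent(b0, b1)
different = xml_equivalent(b0, b2)
\end{verbatim}

With default options the canonical form drops comments and keeps
white space in text, so a stricter or looser notion of equivalence
would need different options or a different comparison.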

\section{Testing the Parser and the Writer}
%------------------------------------------

The following criteria can be used to test Emaxml performance.

\begin{itemize}

	\item[(i)] The Parser will be tested on a fully-featured XML
		document, say $d_0$, predicting what the resulting
		\etree\ $e_0$ should be. A good example of such a
		document is ``bookcase.xml'', in \cite{nut}.

	\item[(ii)] The Writer can be tested on $e_0$. The document
		$d_1$ in the resulting buffer should be XML-equivalent
		to $d_0$. 

	\item[(iii)] To prove that the Parser and the Writer
		effectively carry out inverse functions, $d_1$ is fed
		to the Parser again. The resulting \etree\ $e_1$
		should be {\em identical} to $e_0$\footnote{Actually,
		if the Parser is proved correct, test (iii) proves
		XML-equivalence between $d_0$ and $d_1$ too.}. This
		process can be represented diagrammatically as

$$ \framebox{$d_0$} \stackrel{\mbox{Parser}}{\longmapsto} (e_0)
\stackrel{\mbox{Writer}}{\longmapsto} \framebox{$d_1$}
\stackrel{\mbox{Parser}}{\longmapsto} (e_1) $$
		

	\item[(iv)] The Parser will also be tested on one or more {\em
		incorrect} XML documents. Expected results will be 
		compared with the actual ones.

		Note that the Parser is not required to detect an
		end-tag matching a start-tag with a different tag-name
		as an error, since that is not specified in the BNF
		rules. 
\end{itemize}

All tests (i)-(iv) can be automated by setting up a testing framework.
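
A minimal sketch of such a framework is given below in Python. It is
not the Emaxml test suite: the functions \verb|parse| and
\verb|write| are stand-ins built on the standard library module
\verb|xml.etree.ElementTree|, the nested-list layout of the tree is an
assumption, and the sample documents are invented. It only shows the
shape of tests (ii), (iii) and (iv).

\begin{verbatim}
import xml.etree.ElementTree as ET


def parse(document):
    """Stand-in Parser: XML text to a nested list."""
    def to_list(el):
        return [el.tag, sorted(el.attrib.items()), el.text or "",
                [to_list(child) for child in el], el.tail or ""]
    return to_list(ET.fromstring(document))


def write(tree):
    """Stand-in Writer: nested list back to XML text."""
    def to_element(node):
        tag, attributes, text, children, tail = node
        el = ET.Element(tag, dict(attributes))
        el.text, el.tail = text, tail
        el.extend(to_element(child) for child in children)
        return el
    return ET.tostring(to_element(tree), encoding="unicode")


def is_rejected(document):
    """Test (iv): True if the stand-in Parser reports a syntax error."""
    try:
        parse(document)
    except ET.ParseError:
        return True
    return False


d0 = "<shelf><book id='1'>XML</book><book id='2'/></shelf>"
e0 = parse(d0)        # test (i): compare e0 with the predicted tree
d1 = write(e0)        # test (ii): d1 should be XML-equivalent to d0
e1 = parse(d1)        # test (iii): e1 should be identical to e0
round_trip_ok = (e1 == e0)
bad_is_rejected = is_rejected("<shelf><book id=1/></shelf>")
\end{verbatim}

Here $d_1$ differs from $d_0$ at the byte level (the quoting of
attribute values changes), while $e_1$ and $e_0$ are identical, which
is the situation described by the diagram above.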

\section{Testing Emaxml interactively}

Once the Parser and the Writer have been proved correct, the major
mode can be tested interactively.

A sequence of steps in terms of keystrokes tests a particular
feature. A set of such sequences tests the major mode. Such a set must
be devised to cover all the features of Emaxml.

A sequence of keystrokes will be tested against an Emaxml buffer in a known
state, and the resulting buffer checked first visually (this part
can be automated very little) and then logically, by examining
one or more of the resulting \etree, \ebuf\ and the file written
by saving the buffer (this could in principle be automated, but may
prove expensive to set up).

The features to test are those described in the Emaxml specification,
in particular the categories listed in section~\ref{sec:edop} and the
response to the editing commands listed in Appendix~\ref{app:stdcmd}
and Appendix~\ref{app:newcmd}.

As soon as a working prototype of Emaxml is available, prior Emacs
users will be asked to use it and assess how well its goals have been
achieved. Their feedback will allow some problems identified to be
fixed.

\section{Documentation requirements}

Following the Emacs philosophy and spirit, Emaxml is meant to be an
extensible system.

Useful and usable documentation is a key requirement. If only part of
the target is implemented, but it is well documented, it will still be
a partial success.

In particular, two types of documents are required: code documentation
and Emacs on-line help.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Timetable}

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=15cm]{fig-gantt-24x12.eps}
\end{center}
\caption{Tasks, deliverables and their dependencies}
\label{fig:gantt}
\end{figure}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Initial achievements} \label{sec:done}

The work done so far has been along three major lines:

\begin{itemize}

	\item \underline{Definition of the problem and initial study}

	I have been reading on the three main subjects involved in the
	project, namely Lisp, Emacs internals and XML. The
	Bibliography covers the books and electronic documentation I
	have been through.

	Moreover, Emaxml has been defined in terms of the steps
	required for its development and the details of its
	features. Writing this document has clarified a great deal of
	such details.

	\item \underline{The Parser and the XMLDoc data model}

	The Parser has been fully developed, but still needs
	testing. It is formed by a set of regular expressions matching
	the basic building blocks of XML, on top of which a set of
	functions is built that recognize and extract XML components.

	The output (\etree) is a list\footnote{As everything else in
	Lisp...} that reflects the tree structure of the input
	document, in terms of objects of a data model that I have
	called XMLDoc (or XD for short). This data model has been
	derived from the BNF rules in~\cite{w3c} and has almost a
	direct parallel in the \logus\ model of the display.

	\item \underline{The documentation framework}

	The documentation has been developed as a growing ``Plan of
	work'', a dynamic document that follows the development,
	describing the goals before they get implemented, and changing
	after the implementation to describe what has really been
	produced. This somewhat empirical approach to development has
	been taken due to my lack of experience with Lisp, Emacs and
	XML, and it has proven fairly successful.

	All my work is kept in directory {\tt \~{}ceepd1/tesi} on the
	Department network. This is periodically updated with the work
	I do at home. The subdirectories contain README files that are
	updated with the help of a function in Emacs Lisp that detects
	the changes in terms of files deleted and added, and asks for
	the appropriate file description. The latest version of the
	Plan can be found at {\tt \~{}ceepd1/tesi/plan/Plan.html}.

\end{itemize}
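
The layering described above for the Parser, with regular expressions
for the basic building blocks and extraction functions on top of them,
can be illustrated as follows. The sketch is in Python rather than
Emacs Lisp and is not the actual Parser: the names \verb|NAME|,
\verb|ATT_VALUE|, \verb|ATTRIBUTE|, \verb|START_TAG| and
\verb|parse_start_tag| are hypothetical, names are restricted to ASCII
and references inside attribute values are ignored, and the shape of
the resulting list is an assumption.

\begin{verbatim}
import re

# ASCII-only simplification of the Name production of the grammar.
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"
# AttValue without entity or character references (a simplification).
ATT_VALUE = r'"[^<&"]*"' + "|" + r"'[^<&']*'"
# Attribute ::= Name Eq AttValue
ATTRIBUTE = re.compile(
    rf"(?P<name>{NAME})\s*=\s*(?P<value>{ATT_VALUE})")
# STag ::= '<' Name (S Attribute)* S? '>'
START_TAG = re.compile(
    rf"<(?P<tag>{NAME})"
    rf"(?P<attrs>(?:\s+{NAME}\s*=\s*(?:{ATT_VALUE}))*)\s*>")


def parse_start_tag(text):
    """Return ['element', name, [[attribute, value], ...]]."""
    match = START_TAG.fullmatch(text)
    if match is None:
        raise ValueError("not a start-tag: " + repr(text))
    attributes = [[a.group("name"), a.group("value")[1:-1]]
                  for a in ATTRIBUTE.finditer(match.group("attrs"))]
    return ["element", match.group("tag"), attributes]


tree = parse_start_tag("<img src=\"c2.gif\" alt='logo'>")
\end{verbatim}

The result is a nested list holding the element name and its
attributes with the quotes removed; a start-tag with an unquoted
attribute value is rejected with an error.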



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\clearpage

\addcontentsline{toc}{chapter}{Bibliography}

\begin{thebibliography}{99}

	\bibitem{w3c} \book{W3C XML Core Working Group}{Extensible
		Markup Language (XML) 1.0 (Second
		Edition)}{http://www.w3.org/TR/2000/REC-xml-20001006}{2000}
	
		This is the official specification of XML. Includes
		the BNF grammar describing the syntax of XML.

	\bibitem{nut} \book{E. Rusty Harold and W. Scott Means}{XML in a
		Nutshell}{O'Reilly}{2001}

		Good discursive explanation of the basics of the
		various aspects of XML, plus a comprehensive coverage of all
		related topics and applications. I found it useful for
		initial documentation, and also as a quick reference.

	\bibitem{info} \book{Free Software Foundation}{Emacs Info
		Manual}{Free Software Foundation}{1999}

		Major source of information about the usage of
		Emacs. It is more than on-line help; it can be
		searched in many ways and, as far as my experience is
		concerned, always answers one's questions. Moreover,
		it does not pop up unwanted saying that you are
		writing a letter.

	\bibitem{elisp} \book{B. Lewis, D. LaLiberte and R. Stallman
		and the GNU Manual Group}{Emacs Lisp
		Manual}{http://www.gnu.org/manual/elisp-manual-20-2.5/elisp.html}{1993}

		A book on Lisp, Emacs Lisp, Emacs internals, Emacs
		Lisp libraries. As readable as a novel, as useful as a
		quick reference. Available in a variety of formats
		including Info, which makes it embedded in Emacs.

	\bibitem{extend} \book{B. Glickstein}{Writing GNU Emacs
		Extensions}{O'Reilly}{1997}

		Covers the customization of Emacs from the very basics
		of Lisp to a full major mode implementation. Very rich
		in practical examples paired with Lisp theory.

\end{thebibliography}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\appendix

\chapter{Standard Emacs commands recoded for Emaxml} \label{app:stdcmd}

{\bf \underline{Notes}}:

\begin{itemize}

	\item Editing operations are here referred to by their Emacs
	Lisp names, due to lack of space. For an exact definition,
	where these names are not self-explanatory, use
	'F1~f~$<$command$>$' in Emacs.

	\item The {\bf Cat} column refers to the editing categories
	listed in section \ref{sec:edop}.

\end{itemize}


\begin{tblenv}
\begin{tabular}{|l|l|l|p{8cm}|}

	\hline
	\hline

	{\bf Cat} &
	{\bf Shortcut} &
	{\bf Emacs Lisp name} &
	{\bf Action} \\

	\hline
	\hline

	ID & C-d & delete-char & Delete the following character. No
	action at end of an elementary \logu. \\

	ID & ESC k & kill-sentence & Performs kill-element. \\

	ID & C-y & yank & Described in \ref{sec:yank}. Perform
	``yanking as a child''.\\

	ID & S-\key{insert} & yank & Described in
	\ref{sec:yank}. Perform ``yanking as a sibling''.\\
 
	ID & \key{BACKSPACE} & delete-backward-char & In the middle of an
	elementary \logu, behave as usual. At beginning, behave as
	C-b.\\

	ID & \key{RET} & newline & Multiline \logu: add a newline at
		point.\\

	ID & \key{insert} & overwrite-mode & Usual behavior. \\


	GE & C-l & recenter & Usual behavior \\

	GE & C-\_, C-/ & undo & Usual behavior \\

	KY & C-k & kill-line & Monoline: kill the rest of the current
	\logl. Multiline: also, if there are no non-blanks there, kill
		through the newline.\\

	MK & C-@, C-\key{SPC} & set-mark-command & Usual behavior.\\

	MK & C-x C-x & exchange-point-and-mark & Usual behavior. \\

	PM & C-a & beginning-of-line & Move point to beginning of
	current \logl.\\

	PM & C-b, \kl & backward-char & Slide backward.\\

	PM & C-\kd & forward-paragraph & Traverse down. \\

	PM & C-e & end-of-line & Move point to end of current \logl.\\

	PM & C-f, \kr & forward-char & Slide forward.\\

	PM & C-\kl & backward-word & Usual behavior, through user
	space. \\

	PM & C-n, \kd & next-line & Slide to next \logl, with
	\traveller.\\

	PM & C-p, \ku & previous-line & Slide to previous \logl, with
	\traveller.\\

	PM & C-\kr & forward-word & Usual behavior, through user
	space.\\

	PM & C-\ku & backward-paragraph & Traverse up. \\

	PM & \key{end} & end-of-buffer & Usual behavior. \\

	PM & \key{home} & beginning-of-buffer & Usual behavior. \\

	RE & C-x h & mark-whole-buffer & Usual behavior. \\

	RE & ESC @ & mark-word & Usual behavior. \\

	RE & double-mouse-1 & mouse-set-point & Usual behavior, but
	highlighting the region as described in section
	\ref{sec:high}.\\

	RE & drag-mouse-1 & mouse-set-region & Usual behavior. Region
	as described in section \ref{sec:high}.\\

	RE & mouse-1 & mouse-set-point & Usual behavior, but
	highlighting the region as described in section
	\ref{sec:high}.\\

	RE & mouse-2 & mouse-yank-at-click & Usual behavior, but
	yanking done \`a la Emaxml (see section \ref{sec:yank}).\\

	RE & triple-mouse-1 & mouse-set-point & Usual behavior, but
	region set to the whole \logu, and highlighting as described
	in section \ref{sec:high}.\\

	\hline

\end{tabular}
\caption{New meanings for old commands}
\end{tblenv}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Emaxml commands} \label{app:newcmd}

\begin{tblenv}
\begin{tabular}{l|p{8cm}}

	{\bf Lisp-like non-definitive name} &
	{\bf Action} \\

	\hline

	create-element-child & Create a blank \luele\ as a child of
	the current smallest element. Described in
	section~\ref{sec:create}.\\

	create-element-sibling & Create a blank \luele\ as a sibling
	of the current smallest element. Described in
	section~\ref{sec:create}.\\

	create-{\em logun} & Create a blank instance of a \logu\ of
	type {\em logun}.\\

	traverse-right & Traverse right. \\

	traverse-left & Traverse left. \\

	mark-more & Described in section \ref{sec:high} \\

	mark-less & Described in section \ref{sec:high} \\

\end{tabular}
\caption{New commands to be implemented}
\end{tblenv}





%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter {Notes on yanking transparency} \label{app:yankee}

Establishing yanking transparency for a leaf \logu\ requires finding a
suitable definition for:

\begin{itemize}

	\item[{(}a{)}] where point is left after killing a leaf \logu\ $p$;

	\item[{(}b{)}] how a leaf \logu\ is inserted when point is at (a).

\end{itemize}

I have not been able to find a reasonable pair (a,b) for which
yanking transparency holds.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=18cm]{fig-yank_proof-18x10.eps}
\end{center}
\caption{Applying the definitions}
\label{fig:proof}
\end{figure}

\subsubsection{Example}

The following is an example.

A possible reasonable definition for (a) is:

\begin{itemize}

\item[{(}a1{)}] If $p$ has a peer $q$ below it, at the first character
of $q$'s user space; otherwise if $p$ has a peer $q$ above it, at the
first character of $q$'s user space; otherwise at the first character
of $p$'s parent's user space.
 
\end{itemize}

In all cases point will be in a \luheader\ after killing. Possible
reasonable definitions for (b) are: When point is in the \luheader\ of
an element $r$, a leaf \logu\ $p$ is inserted

\begin{itemize}

\item[{(}b1{)}] as a peer of $r$ above $r$;

\item[{(}b2{)}] as a peer of $r$ below $r$;

\item[{(}b3{)}] as the first child of $r$;

\item[{(}b4{)}] as the last child of $r$.

\end{itemize}

Consider the subtrees in Figure~\ref{fig:proof}.

Starting from tree (A), pairs of rules are applied to one of
{\em root}'s children. The trees other than (A) show where the pairs of
rules fail with respect to yanking transparency.
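
The failure of each pair can also be checked mechanically. The Python
sketch below is an illustration built on a simplified model, not a
transcription of the figure: the tree is reduced to named elements,
\verb|kill| implements rule (a1), \verb|yank| implements rules (b1)
to (b4), and transparency is taken to mean that killing a unit and
yanking it straight back gives the original tree. All names and the
three test cases are assumptions.

\begin{verbatim}
class Node:
    def __init__(self, name, *kids):
        self.name, self.kids, self.parent = name, list(kids), None
        for kid in self.kids:
            kid.parent = self


def shape(node):
    """Nested (name, children) view used to compare trees."""
    return (node.name, [shape(kid) for kid in node.kids])


def kill(p):
    """Rule (a1): remove p; return the element where point is left."""
    parent = p.parent
    i = parent.kids.index(p)
    parent.kids.pop(i)
    p.parent = None
    if i < len(parent.kids):      # a peer below p exists
        return parent.kids[i]
    if i > 0:                     # otherwise a peer above p
        return parent.kids[i - 1]
    return parent                 # otherwise p's parent


def yank(p, r, rule):
    """Rules (b1)-(b4): insert p relative to the element r."""
    if rule in ("b1", "b2"):      # as a peer of r, above or below
        host = r.parent
        if host is None:          # r is the root: it has no peers
            return
        i = host.kids.index(r) + (1 if rule == "b2" else 0)
    else:                         # as first or last child of r
        host = r
        i = 0 if rule == "b3" else len(r.kids)
    host.kids.insert(i, p)
    p.parent = host


def transparent(build, index, rule):
    tree = build()
    before = shape(tree)
    p = tree.kids[index]
    yank(p, kill(p), rule)
    return shape(tree) == before


def three():
    return Node("root", Node("a"), Node("b"), Node("c"))


def single():
    return Node("root", Node("a"))


# Kill the first of three, the last of three, or an only child.
CASES = [(three, 0), (three, 2), (single, 0)]
results = {rule: [transparent(build, i, rule) for build, i in CASES]
           for rule in ("b1", "b2", "b3", "b4")}
\end{verbatim}

In this model every rule restores the tree in at least one case and
fails in at least one other, so none of the pairs (a1,b1) to (a1,b4)
gives yanking transparency.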


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Chronicle of a murder} \label{app:killre}

As described in section~\ref{sec:kill}, the three steps to obtain the
resulting tree after killing $p$ are:

	\begin{itemize}

		\item[(i)] all leaf \logus\ entirely included in $p$
		are pruned;

		\item[(ii)] all non-leaf \logus\ entirely included in
		$p$ are blanked;

		\item[(iii)] all characters in $p$ are eliminated.

	\end{itemize} 

Figure~\ref{fig:ata} shows what a highlighted region looks like, as
described in section~\ref{sec:high}, and what the effects are of each
step in the algorithm. The bottom right picture is the final result.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=12cm]{fig-kill_re-24x20.eps}
\end{center}
\caption{Killing algorithm steps}
\label{fig:ata}
\end{figure}

Step (i) prunes the comment and the {\em meta} element, which are leaf
\logus\ entirely included in $p$.

Step (ii) blanks the \luheader\ of the {\em properties} element, and
the \luelename\ and the \luattname\ of the second {\em meta} element,
which are non-leaf \logus\ entirely included in $p$.

Step (iii) eliminates ``c2.gif'' and ``co'', which are all the
characters in $p$.

Figure~\ref{fig:azzuolo} is a visual representation of the tree stored
by killing $p$.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=8cm]{fig-stored_tree-8x4.eps}
\end{center}
\caption{The stored tree}
\label{fig:azzuolo}
\end{figure}



\end{document}




