Converting HTML to LaTeX

My procedure for converting HTML to LaTeX is as follows.

Prerequisites

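The steps below rely on at least the following being installed and configured: Emacs with the W3 package (which provides w3-find-file), the w3-convert-to-xml command, a Java runtime with James Clark's XT (the class com.jclark.xsl.sax.Driver) on the class path, and the stylesheet xhtml-to-latex.xsl.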

Procedure

The following procedure will convert HTML to LaTeX:

  1. If the HTML is in file.html, type these Emacs keystrokes to parse the HTML and display the formatted result in an Emacs buffer:

      M-x w3-find-file RET file.html RET
        

    The formatted result is not used in this procedure; it is merely a byproduct.

  2. Then type these Emacs keystrokes to convert the internal parse tree stored in the Emacs variable w3-last-parse-tree into XML:

      M-x w3-convert-to-xml RET
        
  3. Switch to the W3 XML Conversion buffer using these Emacs keystrokes:

      C-x b W3 SPC XML SPC Conversion RET
        
  4. Save the XML into the file file.xml using these Emacs keystrokes:

      C-x C-w file.xml RET
        
  5. Let path/xhtml-to-latex.xsl be the full path to the stylesheet xhtml-to-latex.xsl. With the XT classes on the Java class path, run the following command to convert the XML into LaTeX:

      java com.jclark.xsl.sax.Driver file.xml path/xhtml-to-latex.xsl > file.tex
        
  6. The resulting file file.tex can then be processed like any other LaTeX source file.
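
Step 5 is easy to get wrong when the XML file or the stylesheet is missing, because the shell redirection creates file.tex before java runs. The following is a minimal sketch of a wrapper around that command, not part of the original procedure. The function names tex_name and xml_to_latex are hypothetical, and the sketch assumes a POSIX shell and that the XT classes are already on the Java class path.

```shell
#!/bin/sh
# Hypothetical helper: derive the LaTeX file name from the XML file name
# by replacing a trailing ".xml" with ".tex" (file.xml -> file.tex).
tex_name() {
    printf '%s\n' "${1%.xml}.tex"
}

# Hypothetical wrapper around the command from step 5.
#   $1 = XML file saved in step 4
#   $2 = full path to xhtml-to-latex.xsl
xml_to_latex() {
    xml=$1
    xsl=$2

    # Check both inputs first, so no empty .tex file is created by mistake.
    if [ ! -r "$xml" ]; then
        echo "cannot read XML file: $xml" >&2
        return 1
    fi
    if [ ! -r "$xsl" ]; then
        echo "cannot read stylesheet: $xsl" >&2
        return 1
    fi

    out=$(tex_name "$xml")

    # Same command as in step 5. On failure, remove the partial output.
    if ! java com.jclark.xsl.sax.Driver "$xml" "$xsl" > "$out"; then
        rm -f "$out"
        return 1
    fi

    # Report the name of the file that was written.
    printf '%s\n' "$out"
}
```

After loading these functions into the shell, running xml_to_latex file.xml path/xhtml-to-latex.xsl would write file.tex next to file.xml, as in step 5.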