DexH -- Document Extractor, HTML output

Version 3

Primary contributors: Brad Eckert brad1NO@SPAMtinyboot.com

Abstract

DexH is a simple literate programming tool inspired by MPE's DOCGEN. DexH can also be used to write articles about Forth featuring a mixture of documentation and source code. DexH is a standalone program that processes a Forth source file. The following command does the conversion:
DEX input_filename

Commands

Commands are embedded within comments. Either of the following formats can be used; the comment must start in the first column.

( ?? ... ) where ?? is the command, or
\ ?? ...

You can append HTML to an existing output file by running DEX on any number of source files, but a *Z command is needed to complete the HTML.

Command	Effect
**	continuation of G, E or P
*!	create and select a new output file
*>	select an existing file to add text to
*T	Title
*Q	Quotation or abstract
*S	Section
*N	Sub-section
*P	Paragraph
*E	Paragraph which is a code example
*B	Bullet entry
*G	Glossary entry for the previous line
*R	raw LaTeX
*W	raw HTML
*Z	End output
*+	Include source code as document text
*-	Turn off source code inclusion
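As an illustration of how the commands in the table might appear in a source file, here is a minimal sketch. The file name demo, the word square, and the exact argument form taken by *! are assumptions made for the example, not taken from the DexH source.

```forth
\ *! demo
\ *T Demo of DexH markup
\ *S Arithmetic helpers
( *P Words in this section operate on single-cell integers. )

: square  ( n -- n*n )  DUP * ;
\ *G Multiply n by itself.
\ ** Overflow is not checked.

\ *B The word square is hypothetical and exists only for this example.
\ *Z
```

The glossary command follows the definition it documents, since *G applies to the previous line, and ** continues the glossary text.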

DexH is ANS Forth except that it needs BOUNDS, SCAN, SKIP and LCOUNT. These words are commonly available, but definitions are shown here (commented out) for completeness.

\ : BOUNDS OVER + SWAP ;
\ : SCAN            ( addr len char -- addr' len' )
\    >R BEGIN DUP WHILE OVER C@ R@ <>
\    WHILE 1 /STRING REPEAT THEN R> DROP ;
\ : SKIP            ( addr len char -- addr' len' )
\    >R BEGIN DUP WHILE OVER C@ R@ =
\    WHILE 1 /STRING REPEAT THEN R> DROP ;
\ : LCOUNT          ( addr -- addr' len ) DUP CELL+ SWAP @ ;
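To show what these four words do, the following sketch activates the definitions and exercises each one. The demo-* words and the lstr buffer are hypothetical and not part of DexH; the sketch assumes a system where one character occupies one address unit.

```forth
\ Active copies of the helper words (shown commented out in the listing).
: BOUNDS  ( addr len -- addr+len addr )  OVER + SWAP ;
: SCAN    ( addr len char -- addr' len' )
   >R BEGIN DUP WHILE OVER C@ R@ <>
   WHILE 1 /STRING REPEAT THEN R> DROP ;
: SKIP    ( addr len char -- addr' len' )
   >R BEGIN DUP WHILE OVER C@ R@ =
   WHILE 1 /STRING REPEAT THEN R> DROP ;
: LCOUNT  ( addr -- addr' len )  DUP CELL+ SWAP @ ;

\ Hypothetical demo words, not part of DexH.
: demo-skip   ( -- )  S"    *P text" BL SKIP TYPE ;        \ drops the leading blanks
: demo-scan   ( -- )  S" *P text" BL SCAN TYPE ;           \ keeps from the first blank on
: demo-bounds ( -- )  S" abc" BOUNDS ?DO I C@ EMIT LOOP ;  \ visits each character in turn

\ A string with a one-cell count, as LCOUNT expects.
CREATE lstr  3 ,  CHAR a C,  CHAR b C,  CHAR c C,
: demo-lcount ( -- )  lstr LCOUNT TYPE ;
```

SKIP and SCAN are complementary: SKIP steps over leading occurrences of the character, and SCAN steps forward until the character is found.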

Some files use very long lines, which is desirable for long sections of documentation. You can allocate buffers for lines longer than 2000 chars by changing the following line:

2000 CHARS CONSTANT max$

HTML needs some canned boilerplate. This is created by ,| since HTML doesn't use | characters.

: (,$)  ( a len -- )  DUP C, 0 ?DO COUNT C, LOOP DROP ;
: ,|    ( <text> -- ) [CHAR] | WORD COUNT -TRAILING (,$) ;

CREATE DexHTMLheader
   ,| <?xml version="1.0"?>                                                    |
   ,| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"                 |
   ,|     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">                 |
   ,| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">      |
   ,| <head>                                                                   |
   ,| <meta http-equiv="Content-Type" content="text/xml; charset=iso-8859-1" />|
   ,| <meta name="GENERATOR" content="DexH v03" />                             |
   ,| <style type="text/css">                                                  |
   ,| </style>                                                                 |
   ,| <title>                                                                  |
   0 C,
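The table built by ,| is a sequence of counted strings ending in a zero count byte. A minimal sketch of how such a table could be walked is shown here; demo-lines and .lines are hypothetical names, and the code DexH actually uses to emit the header is not shown in this section.

```forth
: (,$)  ( a len -- )  DUP C, 0 ?DO COUNT C, LOOP DROP ;
: ,|    ( <text> -- ) [CHAR] | WORD COUNT -TRAILING (,$) ;

CREATE demo-lines                     \ hypothetical two-line table
   ,| first line   |
   ,| second line  |
   0 C,                               \ zero count marks the end

: .lines  ( addr -- )                 \ hypothetical: print every line in a table
   BEGIN COUNT DUP WHILE              \ fetch count, stop at the zero terminator
      2DUP TYPE CR +                  \ print, then step to the next string
   REPEAT 2DROP ;

demo-lines .lines
```

Because -TRAILING removes the padding before the closing |, the source lines can be aligned in columns without storing the extra blanks.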

All output is via OUT and OUTLN, which can be sent to the screen for debugging purposes.

0 VALUE testing                                 \ screen is for testing
: werr     ( n -- )      ABORT" Error writing file" ;
: out      ( a len -- )  testing IF TYPE    ELSE outfile WRITE-FILE werr THEN ;
: outln    ( a len -- )  testing IF TYPE CR ELSE outfile WRITE-LINE werr THEN ;
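A short sketch of the debugging path follows: with testing set, output goes to the screen and no file is touched. The definition of outfile as a VALUE holding a file id is an assumption, since it is defined elsewhere in the source.

```forth
0 VALUE outfile                                 \ assumed: file id of the output file
0 VALUE testing                                 \ nonzero sends output to the screen
: werr     ( n -- )      ABORT" Error writing file" ;
: out      ( a len -- )  testing IF TYPE    ELSE outfile WRITE-FILE werr THEN ;
: outln    ( a len -- )  testing IF TYPE CR ELSE outfile WRITE-LINE werr THEN ;

TRUE TO testing                                 \ route output to the screen
S" <p>" out  S" Hello" out  S" </p>" outln      \ no file needs to be open
```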

Characters that HTML treats as markup are replaced by character entities so they can't be interpreted as tags. Runs of blanks also need special treatment, since HTML collapses consecutive spaces. The following escape sequences are supported:

seq	Escape command
\i	Italics
\b	Bold
\t	Typewriter
\^	Superscript (e.g. ax\^2\d+bx+c=0)
\_	Subscript
\d	Default font (ends italic, superscript, etc.)
\n	Line break
\r	Horizontal rule
\p	Page break
\\	Literal backslash (\)

Sample usage: "ax\^2\d + bx + w\_0\d = 0" displays ax² + bx + w₀ = 0

"Try \bbold, \iitalic \dand \ttypewriter.\d" displays "Try bold, italic and typewriter."

: new-font  ( n -- )                            \ switch to a new font
   thisfont @ SWAP thisfont !
   CASE [CHAR] i OF S" </i>" out                ENDOF
        [CHAR] b OF S" </b>" out                ENDOF
        [CHAR] t OF S" </code>" out             ENDOF
        [CHAR] ^ OF S" </sup>" out              ENDOF
        [CHAR] _ OF S" </sub>" out              ENDOF
   ENDCASE ;

: outh    ( addr len -- )                       \ HTMLized text output
   999 bltally !
   BOUNDS ?DO I C@ escape @ IF
     CASE
        [CHAR] \ OF S" \"         out           ENDOF
        [CHAR] n OF S" <br />"    out           ENDOF
        [CHAR] r OF hr                          ENDOF
        [CHAR] i OF I C@ new-font S" <i>" out       ENDOF
        [CHAR] b OF I C@ new-font S" <b>" out       ENDOF
        [CHAR] t OF I C@ new-font S" <code>" out    ENDOF
        [CHAR] ^ OF I C@ new-font S" <sup>" out     ENDOF
        [CHAR] _ OF I C@ new-font S" <sub>" out     ENDOF
        [CHAR] d OF 0    new-font                   ENDOF
        no-escape I 1 out
     ENDCASE 0 escape !
   ELSE
     CASE
        [CHAR] \ OF captive @
              IF no-escape ELSE 1 escape ! THEN ENDOF
        [CHAR] & OF S" &amp;"     out           ENDOF
        [CHAR] < OF S" &lt;"      out           ENDOF
        [CHAR] > OF S" &gt;"      out           ENDOF
        [CHAR] " OF S" &quot;"    out           ENDOF
        [CHAR] © OF S" &copy;"    out           ENDOF
        BL       OF bltally @ IF S" &nbsp;" ELSE S"  " THEN out
                 1 bltally +!                   ENDOF
        I 1 out  0 bltally !
     ENDCASE
   THEN LOOP
   escape @ IF no-escape THEN                   \ trailing \
   S" " outln ;
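The entity replacement and blank handling in outh can be tried in isolation with a simplified sketch. It writes to the screen with TYPE, leaves out the backslash escapes and font tracking, and uses the hypothetical names blanks, html-emit, html-type and demo; it is not the DexH implementation.

```forth
: BOUNDS  ( addr len -- addr+len addr )  OVER + SWAP ;
VARIABLE blanks                                 \ hypothetical stand-in for bltally

: html-emit  ( char -- )
   DUP BL = IF                                  \ first blank of a run stays a blank,
      DROP  blanks @ IF S" &nbsp;" ELSE S"  " THEN TYPE
      1 blanks +!  EXIT                         \ later ones become &nbsp;
   THEN
   0 blanks !                                   \ any other character ends the run
   CASE
      [CHAR] & OF S" &amp;"  TYPE ENDOF
      [CHAR] < OF S" &lt;"   TYPE ENDOF
      [CHAR] > OF S" &gt;"   TYPE ENDOF
      [CHAR] " OF S" &quot;" TYPE ENDOF
      DUP EMIT                                  \ ordinary character
   ENDCASE ;

: html-type  ( addr len -- )
   1 blanks !                                   \ leading blanks count as a run
   BOUNDS ?DO I C@ html-emit LOOP ;

: demo  ( -- )  S" a  < b & c" html-type ;     \ displays a &nbsp;&lt; b &amp; c
```

Starting the blank counter at a nonzero value means leading blanks are all emitted as &nbsp;, which preserves indentation; outh does the same by setting bltally to 999.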

The fields in a table row are separated by | (vertical bar), and each row ends with |.

: gl-open       ( -- )

Open the glossary file.

: gl-close      ( -- )

Close the glossary file.

: gl-ancor      ( -- )

Write the anchor number.

\ cr ." gl-create-entry: " prevline LCOUNT type

Create a glossary entry.

Glossary

:noname  ( <filename> -- )

Convert a file or files to HTML. Output filenames are specified within the source file, using the *! and *> commands.

: q  ( <string> -- )

Test a single line of text, outputting to the screen.

This file generated by DexH