DexH -- Document Extractor, HTML output


Version 3

Primary contributors: Brad Eckert  brad1NO@SPAMtinyboot.com

Abstract

DexH is a simple literate programming tool inspired by MPE's DOCGEN. DexH can also be used to write articles about Forth featuring a mixture of documentation and source code. DexH is a standalone program that processes a Forth source file. The following command does the conversion:
DEX input_filename

Commands

Commands are embedded within comments. You can use the following formats, with either starting at the first column.

You can append HTML to files already created by DEXing any number of source files, but you should finish with a *Z command to complete the HTML.

Command  Effect
**       continuation of G, E or P
*!       create and select a new output file
*>       select an existing file to add text to
*T       Title
*Q       Quotation or abstract
*S       Section
*N       Sub-section
*P       Paragraph
*E       Paragraph which is a code example
*B       Bullet entry
*G       Glossary entry for the previous line
*R       raw LaTeX
*W       raw HTML
*Z       End output
*+       Include source code as document text
*-       Turn off source code inclusion
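
As an illustration of how these commands might appear in a source file, here is a minimal sketch. It assumes that each command follows a \ comment marker in the first column and that *! takes a file name as its text; neither detail is confirmed by the table above. The file name square.html and the word SQUARED are hypothetical.

```forth
\ *! square.html
\ *T Squaring a Number
\ *Q A hypothetical one-word example marked up for DexH.
\ *S Definitions
\ *P SQUARED multiplies a number by itself. It works on any
\ ** single-cell integer, within the limits of overflow.

: SQUARED  ( n -- n*n )  DUP * ;
\ *G Multiply n by itself.

\ *P A typical use:
\ *E 7 SQUARED .
\ *Z
```

Because every command sits inside a comment, the file still loads as ordinary Forth; only DexH pays attention to the markup.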

DexH is ANS Forth except that it needs BOUNDS, SCAN, SKIP and LCOUNT. These are commonly used words, but their definitions are given here for completeness.

\ : BOUNDS          ( addr len -- addr+len addr ) OVER + SWAP ;
\ : SCAN            ( addr len char -- addr' len' )
\    >R BEGIN DUP WHILE OVER C@ R@ <>
\    WHILE 1 /STRING REPEAT THEN R> DROP ;
\ : SKIP            ( addr len char -- addr' len' )
\    >R BEGIN DUP WHILE OVER C@ R@ =
\    WHILE 1 /STRING REPEAT THEN R> DROP ;
\ : LCOUNT          ( addr -- addr' len ) DUP CELL+ SWAP @ ;
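
To show how SKIP and SCAN work together, the following sketch isolates the first blank-delimited token of a string. The definitions of SCAN and SKIP are repeated from above so the example is self-contained; FIRST-WORD is a hypothetical helper and is not part of DexH.

```forth
: SCAN            ( addr len char -- addr' len' )
   >R BEGIN DUP WHILE OVER C@ R@ <>
   WHILE 1 /STRING REPEAT THEN R> DROP ;
: SKIP            ( addr len char -- addr' len' )
   >R BEGIN DUP WHILE OVER C@ R@ =
   WHILE 1 /STRING REPEAT THEN R> DROP ;

\ FIRST-WORD is a hypothetical helper, not part of DexH.
: FIRST-WORD      ( addr len -- addr' len' )
   BL SKIP                 \ step over leading blanks
   2DUP BL SCAN            \ find the blank that ends the first token
   NIP - ;                 \ keep only the characters before that blank

S"    *P some text" FIRST-WORD TYPE
```

SKIP advances while the character matches and SCAN advances while it does not, so the pair brackets one token. The last line should type the token *P.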

Some files use very long lines, which is desirable for long sections of documentation. The line buffers are sized by max$, which is 2000 chars here; to handle longer lines, change the following line:

2000 CHARS CONSTANT max$

HTML needs some canned boilerplate. It is compiled as a series of counted strings by ,| which uses | as its delimiter, since HTML doesn't use | characters.

: (,$)  ( a len -- )  DUP C, 0 ?DO COUNT C, LOOP DROP ;
: ,|    ( <text> -- ) [CHAR] | WORD COUNT -TRAILING (,$) ;

CREATE DexHTMLheader
   ,| <?xml version="1.0"?>                                                    |
   ,| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"                 |
   ,|     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">                 |
   ,| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">      |
   ,| <head>                                                                   |
   ,| <meta http-equiv="Content-Type" content="text/xml; charset=iso-8859-1" />|
   ,| <meta name="GENERATOR" content="DexH v03" />                             |
   ,| <style type="text/css">                                                  |
   ,| </style>                                                                 |
   ,| <title>                                                                  |
   0 C,
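
A table built this way is a run of counted strings ending in a zero count, so it can be walked with COUNT. The sketch below shows one way to do that; it is an illustration only, not necessarily how DexH itself emits the header. The names demo-lines and type-lines are hypothetical, and the code assumes one char occupies one address unit.

```forth
: (,$)  ( a len -- )  DUP C, 0 ?DO COUNT C, LOOP DROP ;
: ,|    ( <text> -- ) [CHAR] | WORD COUNT -TRAILING (,$) ;

CREATE demo-lines          \ hypothetical table, not part of DexH
   ,| <head>   |
   ,| </head>  |
   0 C,                    \ a zero count marks the end of the table

: type-lines  ( addr -- )  \ hypothetical helper
   BEGIN COUNT DUP WHILE   \ fetch the next counted string
      2DUP TYPE CR +       \ print it, then step past its text
   REPEAT 2DROP ;

demo-lines type-lines
```

The trailing blanks before each | are removed by -TRAILING, which is why the closing bars can be lined up in a column without affecting the output.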

All output goes through OUT and OUTLN, which write to the screen instead of the output file when TESTING is nonzero. This is useful for debugging.

0 VALUE testing                                 \ screen is for testing
: werr     ( n -- )      ABORT" Error writing file" ;
: out      ( a len -- )  testing IF TYPE    ELSE outfile WRITE-FILE werr THEN ;
: outln    ( a len -- )  testing IF TYPE CR ELSE outfile WRITE-LINE werr THEN ;

Some characters are replaced by special strings so they can't be interpreted as tags. Also, runs of blanks need special treatment. Some escape sequences are supported:

seq  Escape command
\i   Italics
\b   Bold
\t   Typewriter
\^   Superscript (e.g. ax\^2\d+bx+c=0)
\_   Subscript
\d   Default font (ends italic, superscript, etc.)
\n   Line break
\r   Horizontal rule
\p   Page break
\\   Literal backslash (\)

Sample usage: "ax\^2\d + bx + w\_0\d = 0" displays ax² + bx + w₀ = 0

"Try \bbold, \iitalic \dand \ttypewriter.\d" displays "Try bold, italic and typewriter."

: new-font  ( n -- )                            \ switch to a new font
   thisfont @ SWAP thisfont !
   CASE [CHAR] i OF S" </i>" out                ENDOF
        [CHAR] b OF S" </b>" out                ENDOF
        [CHAR] t OF S" </code>" out             ENDOF
        [CHAR] ^ OF S" </sup>" out              ENDOF
        [CHAR] _ OF S" </sub>" out              ENDOF
   ENDCASE ;

: outh    ( addr len -- )                       \ HTMLized text output
   999 bltally !
   BOUNDS ?DO I C@ escape @ IF
     CASE
        [CHAR] \ OF S" \"         out           ENDOF
        [CHAR] n OF S" <br />"    out           ENDOF
        [CHAR] r OF hr                          ENDOF
        [CHAR] i OF I C@ new-font S" <i>" out       ENDOF
        [CHAR] b OF I C@ new-font S" <b>" out       ENDOF
        [CHAR] t OF I C@ new-font S" <code>" out    ENDOF
        [CHAR] ^ OF I C@ new-font S" <sup>" out     ENDOF
        [CHAR] _ OF I C@ new-font S" <sub>" out     ENDOF
        [CHAR] d OF 0    new-font                   ENDOF
        no-escape I 1 out
     ENDCASE 0 escape !
   ELSE
     CASE
        [CHAR] \ OF captive @
              IF no-escape ELSE 1 escape ! THEN ENDOF
        [CHAR] & OF S" &amp;"     out           ENDOF
        [CHAR] < OF S" &lt;"      out           ENDOF
        [CHAR] > OF S" &gt;"      out           ENDOF
        [CHAR] " OF S" &quot;"    out           ENDOF
        [CHAR] © OF S" &copy;"    out           ENDOF
        BL       OF bltally @ IF S" &nbsp;" ELSE S"  " THEN out
                 1 bltally +!                   ENDOF
        I 1 out  0 bltally !
     ENDCASE
   THEN LOOP
   escape @ IF no-escape THEN                   \ trailing \
   S" " outln ;
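
The character replacement and blank handling in OUTH can be hard to follow alongside the escape logic. The following stripped-down sketch shows only those two ideas, typing to the screen. The names blanks, html-emit and html-type are hypothetical, and unlike OUTH the sketch starts each string with a zero blank count, so a leading blank is typed as an ordinary space.

```forth
VARIABLE blanks            \ consecutive blanks seen so far (hypothetical)

: html-emit  ( char -- )   \ hypothetical helper, not part of DexH
   DUP BL = IF
      DROP
      blanks @ IF S" &nbsp;" TYPE ELSE SPACE THEN
      1 blanks +!  EXIT    \ first blank is plain, later ones are &nbsp;
   THEN
   0 blanks !              \ any other character ends the run of blanks
   CASE
      [CHAR] & OF S" &amp;"  TYPE ENDOF
      [CHAR] < OF S" &lt;"   TYPE ENDOF
      [CHAR] > OF S" &gt;"   TYPE ENDOF
      [CHAR] " OF S" &quot;" TYPE ENDOF
      DUP EMIT             \ everything else passes through unchanged
   ENDCASE ;

: html-type  ( addr len -- )
   0 blanks !
   OVER + SWAP ?DO I C@ html-emit LOOP ;

S" if a<b  then" html-type
```

The last line should type if a&lt;b &nbsp;then, with the second blank of the pair turned into a non-breaking space so a browser does not collapse the run.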

The fields in a table row are separated by | (vertical bar), and the row ends with a final |.

: gl-open       ( -- )

Open the glossary file.

: gl-close      ( -- )

Close the glossary file.

: gl-ancor      ( -- )

Write the anchor number.

\ cr ." gl-create-entry: " prevline LCOUNT type

Create a glossary entry.


Glossary

:noname  ( <filename> -- )

Convert a file or files to HTML. The output filenames are given by commands within the source file.

: q  ( <string> -- )

Test a single line of text, outputting to the screen.


This file generated by DexH