


html2pdbtxt(1)		  User Commands		   html2pdbtxt(1)



NAME
     html2pdbtxt - HTML	to Doc Text converter for 3Com PalmPilots

SYNOPSIS
     html2pdbtxt [ -bchars ] [ -ttitle ] [ -uURL  ]  file.html	[
     file.txt ]
     html2pdbtxt -v

DESCRIPTION
     html2pdbtxt converts HTML to text suitable	for conversion to
     a	Doc(4)	file  via  txt2pdbdoc(1).  If no text filename is
     given, the	generated text is sent to standard output.

  HTML Tags
     The following HTML	tags (and corresponding	ending tags)  are
     recognized:   ADDRESS,  A NAME, BLOCKQUOTE, BR, CENTER, DIV,
     DL, DT, H1, H2, H3, H4, H5, H6, OL, OPTION, PRE, P,  SELECT,
     SCRIPT,  STYLE,  TABLE,  TITLE,  UL.  In all cases, the most
     ``reasonable'' thing is done given the constraints of the
     Doc(4) format, which is essentially plain text.  ALT
     attributes (typically found in IMG tags) have their text
     extracted and placed between brackets [like this].  All
     other HTML tags are stripped.
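
     The ALT extraction and tag stripping described above can be
     pictured with the following minimal Python sketch.  It is
     not the converter's actual code: the names ALT_TAG, ANY_TAG
     and strip_tags are hypothetical, attribute values are
     assumed to be double-quoted, and the special handling of the
     recognized tags listed above is left out.

```python
import re

# Hypothetical patterns for illustration; not taken from html2pdbtxt.
# A tag that carries a double-quoted ALT attribute (assumption: values
# are always double-quoted).
ALT_TAG = re.compile(r'<[^>]*?\bALT\s*=\s*"([^"]*)"[^>]*>', re.IGNORECASE)
# Any remaining tag.
ANY_TAG = re.compile(r"<[^>]+>")


def strip_tags(html_text):
    """Put ALT text between brackets, then drop every other tag."""
    text = ALT_TAG.sub(lambda m: "[" + m.group(1) + "]", html_text)
    return ANY_TAG.sub("", text)


print(strip_tags('<B>Tea</B> with <IMG SRC="hat.gif" ALT="the Hatter">'))
# prints: Tea with [the Hatter]
```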

  Character Entities
     Both HTML character and numeric  (decimal	and  hexadecimal)
     entity  references	are converted to their byte value accord-
     ing to the	ISO 8859-1 (Latin 1) character set so they appear
     properly  on the Pilot.  For example, ``r&eacute;sum&#233;''
     becomes ``résumé'', with both letters e accented.
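
     As a rough illustration of this conversion, the Python
     sketch below decodes named, decimal and hexadecimal
     references and emits ISO 8859-1 bytes.  The function name
     entities_to_latin1 is hypothetical, and leaving references
     with no Latin 1 equivalent untouched is an assumption, not
     documented behavior of html2pdbtxt.

```python
import html
import re

# Named, decimal, or hexadecimal reference terminated by a semicolon.
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def entities_to_latin1(text):
    """Return text as ISO 8859-1 bytes with entity references decoded."""
    def decode(match):
        ch = html.unescape(match.group(0))
        # Assumption: keep the reference as-is when it does not decode
        # to exactly one character in the Latin 1 range.
        if len(ch) == 1 and ord(ch) < 256:
            return ch
        return match.group(0)

    # Characters outside Latin 1 in the input itself become '?'.
    return ENTITY.sub(decode, text).encode("latin-1", errors="replace")


print(entities_to_latin1("r&eacute;sum&#233;"))
# prints: b'r\xe9sum\xe9'
```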

  Document Title
     Unless specified with  the	 -t  option,  the  HTML	 file  is
     scanned  for  <TITLE>  ...	 </TITLE> tags and, if found, the
     title is extracted	and put	on line	1 of the generated file.
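
     The title lookup can be sketched in Python as follows.  The
     function document_title and its override parameter (standing
     in for the -t option) are hypothetical names, and collapsing
     whitespace so the title fits on one line is an assumption.

```python
import re

# First <TITLE> ... </TITLE> pair, in any letter case, possibly
# spanning several lines.
TITLE = re.compile(r"<TITLE[^>]*>(.*?)</TITLE>", re.IGNORECASE | re.DOTALL)


def document_title(html_text, override=None):
    """Return the title for line 1, or None when there is none."""
    if override is not None:
        # A title given explicitly wins over one found in the HTML.
        return override
    match = TITLE.search(html_text)
    if match is None:
        return None
    # Assumption: squeeze runs of whitespace into single spaces.
    return " ".join(match.group(1).split())


print(document_title("<HTML><TITLE>Alice in\n Wonderland</TITLE></HTML>"))
# prints: Alice in Wonderland
```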

  Bookmarks
     Bookmarks	are  placed  into  the	generated  file	 wherever
     <A	NAME="..."> tags are found in the HTML file.
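
     One way to picture bookmark placement is the Python sketch
     below.  The exact text written into the generated file is
     not specified here, so the layout is an assumption: the
     bookmark sequence, a space and the anchor name on a line of
     their own.  The name mark_bookmarks is hypothetical, and the
     marker parameter stands in for the -b option.

```python
import re

# <A NAME="..."> with a double-quoted name (assumption).
ANCHOR = re.compile(r'<A\s+NAME\s*=\s*"([^"]*)"\s*>', re.IGNORECASE)


def mark_bookmarks(html_text, marker="(*)"):
    """Replace each named anchor with a marker line (assumed layout)."""
    # The marker starts a fresh line; remaining tags such as </A> are
    # left for a later tag-stripping step.
    return ANCHOR.sub(lambda m: "\n" + marker + " " + m.group(1) + "\n",
                      html_text)


print(mark_bookmarks('<A NAME="ch1">Down the Rabbit-Hole</A>'))
```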

OPTIONS
     -bchars   Specify the character sequence that is to serve as
	       the bookmark indicator.	The default is (*).  (See
	       the CAVEATS.)

     -ttitle   Specify the title of  the  document  that  is  to
	       appear  on line 1 of the	generated file overriding
	       any title  found	 inside	 the  HTML  file  between
	       <TITLE> ... </TITLE> tags.

     -uURL     Specify the URL the HTML file supposedly came from
	       and put it on the line after the	title, if any, in
	       the generated file.

     -v	       Print the version number	to  standard  output  and
	       exit.

EXAMPLE
     To	convert	an HTML	file to	Doc:

	  html2pdbtxt -u http://www.wonderland.org/ alice.html alice.txt
	  txt2pdbdoc "`head -n 1 alice.txt`" alice.txt alice.pdb


CAVEATS
     1.  Some Doc readers have a ``feature'' whereby, while
         scanning for bookmarks, they recognize the bookmark
         sequence of characters anywhere in the text and not just
         at the beginning of a line.

     2.  Some Doc readers do not allow the bookmark sequence to
         contain the > character since they interpret that as the
         sequence delimiter, e.g., <->> will be interpreted as the
         sequence being merely -.

     3.  Ordered lists (via the OL tag) are treated as unordered
         lists (like the UL tag).  Numbering the items would
         require parsing the HTML rather than performing simple
         substitutions, which would greatly complicate the code.

SEE ALSO
     pdbtxt2html(1), txt2pdbdoc(1), doc(4)

     International Organization for Standardization.  ``ISO
     8859-1:  Information Processing -- 8-bit single-byte coded
     graphic character sets -- Part 1: Latin alphabet No. 1.''
     1987.

     World Wide	Web Consortium.	 ``Character entity references in
     HTML 4.0.''  HTML 4.0 Specification, http://www.w3.org/

AUTHOR
     Paul J. Lucas <pjl@best.com>















html2pdbtxt	  Last change: November	3, 1998



