

     UHTML(1)                                                 UHTML(1)

     NAME
          uhtml - convert foreign character set HTML file to Unicode

     SYNOPSIS
          uhtml [ -p ] [ -c charset ] [ file ]

     DESCRIPTION
          HTML comes in various character-set encodings and uses
          entity references, such as &eacute; or &#233;, to encode
          characters. To make HTML easier to process, uhtml
          normalizes it to a Unicode-only form.

          Uhtml detects the character set of the HTML input and
          calls tcs(1) to convert it to UTF. HTML entities are
          replaced by the Unicode characters they represent, except
          for lt, gt, amp, quot, and apos, which are left as they
          are. The converted HTML is written to standard output. If
          no file is given, the input is read from standard input.
          If the -p option is given, uhtml prints the detected
          character set and exits without converting anything. If
          character-set detection fails, UTF is assumed; the -c
          option sets a different default.
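          The following is a minimal C sketch of the two steps
          described above, assuming input that is already UTF-8 (in
          uhtml the transcoding itself is done by tcs(1), which the
          sketch does not call). It is not the code in
          /sys/src/cmd/uhtml.c: detect_charset, decode_entities, and
          the four-entry entity table are hypothetical, and detection
          is assumed to mean finding a charset= declaration such as
          the one in a <meta> tag, falling back to a default the way
          the -c option does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical four-entry stand-in for the full HTML entity table. */
static const struct {
	const char *name;
	unsigned long rune;
} ents[] = {{"nbsp", 0xA0}, {"copy", 0xA9}, {"eacute", 0xE9}, {"euro", 0x20AC}};

/* True if s starts with the lowercase pattern pat, ignoring case in s. */
static int match_ci(const char *s, const char *pat)
{
	while (*pat != '\0' && tolower((unsigned char)*s) == *pat)
		s++, pat++;
	return *pat == '\0';
}

/* Assumed detection rule: use the name after "charset=" (as in a <meta>
   tag), lowercased; if there is none, fall back to def (what -c sets). */
static void detect_charset(const char *html, const char *def, char *out, size_t outsz)
{
	const char *p = html;
	size_t i = 0;

	assert(outsz > 0);
	while (*p != '\0' && !match_ci(p, "charset="))
		p++;
	if (*p != '\0') {
		p += strlen("charset=");
		if (*p == '"' || *p == '\'')
			p++;
		while ((isalnum((unsigned char)*p) || *p == '-' || *p == '_') && i + 1 < outsz)
			out[i++] = (char)tolower((unsigned char)*p++);
	}
	if (i == 0) {
		strncpy(out, def, outsz - 1);
		out[outsz - 1] = '\0';
	} else {
		out[i] = '\0';
	}
}

/* Write code point r as UTF-8 into buf and return the byte count. */
static int utf8_encode(unsigned long r, char *buf)
{
	static const unsigned char lead[] = {0, 0x00, 0xC0, 0xE0, 0xF0};
	int n = r < 0x80 ? 1 : r < 0x800 ? 2 : r < 0x10000 ? 3 : 4;

	for (int i = n - 1; i > 0; i--, r >>= 6)
		buf[i] = (char)(0x80 | (r & 0x3F));
	buf[0] = (char)(lead[n] | r);
	return n;
}

/* Replace entities with UTF-8. Unknown entities are copied unchanged, and
   so is any entity that would yield < > & " or ' (an assumption that
   extends the lt/gt/amp/quot/apos rule to numeric forms). No entity is
   shorter than its UTF-8 encoding, so strlen(in) + 1 bytes suffice. */
static char *decode_entities(const char *in)
{
	char *out = malloc(strlen(in) + 1), *o = out;

	if (out == NULL)
		return NULL;
	while (*in != '\0') {
		const char *semi = *in == '&' ? strchr(in, ';') : NULL;
		unsigned long r = 0;
		if (semi != NULL && semi - in <= 12) {
			const char *name = in + 1;
			size_t len = (size_t)(semi - name);
			if (len > 1 && name[0] == '#') {
				int hex = name[1] == 'x' || name[1] == 'X';
				char *end;
				if (isxdigit((unsigned char)name[1 + hex])) {
					r = strtoul(name + 1 + hex, &end, hex ? 16 : 10);
					if (end != semi)
						r = 0;
				}
			} else {
				for (size_t i = 0; i < sizeof ents / sizeof ents[0]; i++)
					if (strlen(ents[i].name) == len && strncmp(ents[i].name, name, len) == 0)
						r = ents[i].rune;
			}
			if (r != 0 && r <= 0x10FFFF && (r < 0xD800 || r > 0xDFFF) &&
			    !(r < 0x80 && strchr("<>&\"'", (int)r) != NULL)) {
				o += utf8_encode(r, o);
				in = semi + 1;
				continue;
			}
		}
		*o++ = *in++;
	}
	*o = '\0';
	return out;
}
```

          decode_entities returns a malloc'd string that the caller
          frees. Because no entity is shorter than its UTF-8
          encoding, one allocation of strlen(in) + 1 bytes is enough
          and the input can be decoded in a single pass without
          growing the buffer.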

     SOURCE
          /sys/src/cmd/uhtml.c

     SEE ALSO
          tcs(1)