UHTML(1)                                                 UHTML(1)

     NAME
          uhtml - convert foreign character set HTML file to unicode

     SYNOPSIS
          uhtml [ -p ] [ -c charset ] [ file ]

     DESCRIPTION
          HTML comes in various character set encodings and has spe-
          cial forms to encode characters.  To make HTML easier to
          process, uhtml normalizes it to a Unicode-only form.

          Uhtml detects the character set of the HTML input file and
          calls tcs(1) to convert it to UTF, replacing HTML entity
          forms by their Unicode character representations, except
          for lt, gt, amp, quot and apos.  The converted HTML is
          written to standard output.  If no file is given, input is
          read from standard input.  If the -p option is given, the
          detected character set is printed and the program exits
          without conversion.  If character set detection fails, the
          default (utf) is assumed.  This default can be changed
          with the -c option.
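
          The entity replacement described above can be illustrated
          with a short sketch.  It is written in Python rather than
          taken from uhtml.c, and it is not the program's actual
          implementation.  The names replace_entities, KEEP_NAMES and
          KEEP_CHARS are hypothetical.  The sketch assumes the HTML 4
          entity table from Python's html.entities module, and it
          also assumes that numeric references to the five excepted
          characters are left in place; the text above does not say
          how uhtml treats those.

```python
import re
from html.entities import name2codepoint

# Entity names the manual page says are left in place.
KEEP_NAMES = {"lt", "gt", "amp", "quot", "apos"}
# Assumption of this sketch: numeric references to the same five
# characters are also left alone so the markup stays well-formed.
KEEP_CHARS = set("<>&\"'")

# Matches &name;, decimal &#123; and hexadecimal &#x1F; forms.
ENTITY = re.compile(
    r"&(#[xX][0-9A-Fa-f]{1,6}|#[0-9]{1,7}|[A-Za-z][A-Za-z0-9]*);"
)


def replace_entities(text):
    """Replace HTML entity forms with the characters they name."""

    def substitute(match):
        body = match.group(1)
        if body.startswith("#"):
            if body[1] in "xX":
                code = int(body[2:], 16)
            else:
                code = int(body[1:], 10)
            # Leave invalid code points and surrogates untouched.
            if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
                return match.group(0)
            char = chr(code)
            return match.group(0) if char in KEEP_CHARS else char
        # Named form: skip the excepted names and unknown names.
        if body in KEEP_NAMES or body not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[body])

    return ENTITY.sub(substitute, text)
```

          Calling replace_entities on already decoded text turns
          forms such as &eacute; or &#x20AC; into the characters they
          name and passes &lt;, &gt;, &amp;, &quot; and &apos; through
          unchanged, so the result remains parseable as HTML.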

     SOURCE
          /sys/src/cmd/uhtml.c

     SEE ALSO
          tcs(1)