Xerces and Czech

Stepan Roh stepan at srnet.cz
Tue Feb 6 22:41:58 CET 2001



On Tue, 6 Feb 2001, Pavel Kankovsky wrote:

> On Tue, 6 Feb 2001, Michal Krause wrote:
>
> > Maybe the ISO-8859-2 support is not quite optimal. Most parsers focus
> > on UTF-8 (which is certainly a good thing in the long run). I would
> > try converting the same document to UTF-8, and if that works, there
> > is probably a small bug in Xerces.
>
> Maybe it has to do with the fact that UTF-8 is, AFAIK, the only encoding
> allowed for XML documents.

It is not. The relevant excerpt from the XML 1.0 Recommendation:

   In an encoding declaration, the values "UTF-8", "UTF-16",
   "ISO-10646-UCS-2", and "ISO-10646-UCS-4" should be used for the various
   encodings and transformations of Unicode / ISO/IEC 10646, the values
   "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-9" should be used for the parts
   of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP"
   should be used for the various encoded forms of JIS X-0208-1997. XML
   processors may recognize other encodings; it is recommended that character
   encodings registered (as charsets) with the Internet Assigned Numbers
   Authority [IANA], other than those just listed, should be referred to
   using their registered names. Note that these registered names are defined
   to be case-insensitive, so processors wishing to match against them should
   do so in a case-insensitive way.

In short: which encodings are accepted depends on the XML processor.
Personally, I would say the asker's problem is that Xerces, in his
configuration, does not know that encoding and does not bother to say
anything about it.
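
To illustrate the point (not with Xerces itself, but with the expat-based
parser in Python's standard library, purely as a stand-in), here is a
minimal sketch. It parses one document declared as ISO-8859-2 and one
declared with a made-up encoding name, and reports the error instead of
staying silent. The helper names parse_text and make_doc and the charset
name "x-hypothetical-charset" are my own inventions for the example.

```python
import xml.etree.ElementTree as ET


def parse_text(data: bytes):
    """Return (root text, None) on success or (None, error message) on failure."""
    try:
        return ET.fromstring(data).text, None
    except (ET.ParseError, LookupError, ValueError) as exc:
        # Which exception type an unknown encoding raises is an
        # implementation detail of the parser, so several are caught here.
        return None, f"{type(exc).__name__}: {exc}"


def make_doc(encoding_name: str, codec: str, body: str) -> bytes:
    """Build a tiny XML document declaring encoding_name, encoded with codec."""
    xml = f'<?xml version="1.0" encoding="{encoding_name}"?><doc>{body}</doc>'
    return xml.encode(codec)


# "\u010de\u0161tina" is the Czech word for "Czech", written with escapes
# so that this source file stays plain ASCII.
latin2 = make_doc("ISO-8859-2", "iso-8859-2", "\u010de\u0161tina")

# Hypothetical, unregistered encoding name: the parser should refuse it.
bogus = make_doc("x-hypothetical-charset", "ascii", "text")

for label, data in (("ISO-8859-2", latin2), ("unknown", bogus)):
    text, error = parse_text(data)
    print(label, "->", repr(text) if error is None else error)
```

Running the same kind of check against the parser you actually use shows
quickly whether the problem is the document or the processor's list of
supported encodings.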

Regards,

Stepan Roh



