Re[2]: Czech in Apache

Jan Houstek houstek at karlin.mff.cuni.cz
Wednesday February 26 13:22:24 CET 2003


> Exactly, because according to the RFC the HTTP protocol header takes
> precedence over HTTP-EQUIV in the HTML file. If everything on your server
> is in the same encoding, you can use the server configuration; otherwise
> it is better to use HTTP-EQUIV (and nowadays all the major browsers
> implement it correctly).
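The precedence rule stated in the quote (HTTP header first, META only as a fallback) can be sketched in Python. `effective_charset` is a hypothetical helper that takes the raw Content-Type header value (or None) and the page bytes. The sketch is simplified: real browsers also weigh byte order marks, user overrides and their own sniffing, and the regex stands in for a proper HTML parser.

```python
import re
from email.message import Message
from typing import Optional

# Finds a charset declared in a META tag, e.g.
# <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-2">
# (simplified: a regex instead of a real HTML tokenizer)
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def header_charset(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of an HTTP Content-Type value, lowercased, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html: bytes) -> Optional[str]:
    """Return the charset declared in a META tag, lowercased, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type: Optional[str], html: bytes) -> Optional[str]:
    """The HTTP header wins; the META declaration is consulted only if the header has no charset."""
    return header_charset(content_type) or meta_charset(html)


page = b'<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-2">'
print(effective_charset("text/html; charset=UTF-8", page))  # utf-8
print(effective_charset("text/html", page))                 # iso-8859-2
```

Note that a charset in the header silently overrides whatever META says. This is why the two must agree.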

Leaving the encoding out of the HTTP header can have security implications
(see http://www.apache.org/info/css-security).

Moreover, <META> is not very practical either, because few converters
(e.g. recode, iconv and the like) also rewrite the META tag, so it can easily
happen that the information given there no longer holds. In my opinion it is
more practical to:

1) set the most commonly used encoding in the global httpd.conf
2) use that encoding on the server as much as possible
3) where several encodings really are needed, set them via .htaccess

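The point about converters can be checked with a minimal Python sketch. It does what recode or iconv would do to a file: it converts the bytes from ISO-8859-2 to UTF-8 but leaves the text of the META tag alone. The page content is a made-up example.

```python
import re

# A made-up Czech page stored in ISO-8859-2, with a matching META declaration.
original = (
    '<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-2">'
    "<P>Čeština v Apache"
).encode("iso-8859-2")

# What a byte-level converter does: decode with the old charset, re-encode
# with the new one. The META text is plain ASCII, so it passes through
# unchanged and still claims ISO-8859-2.
converted = original.decode("iso-8859-2").encode("utf-8")

declared = re.search(rb"charset=([A-Za-z0-9_.:-]+)", converted).group(1).decode("ascii")
print(declared)  # ISO-8859-2, although the file is now UTF-8

# A reader that trusts the stale META decodes the UTF-8 bytes as ISO-8859-2
# and gets garbage in place of the Czech letters.
print(converted.decode(declared))
```

The alternative is to have the server send the charset, for example with Apache's AddDefaultCharset directive in httpd.conf or .htaccess. After a conversion, only that one configuration line has to change.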
-- Honza Houstek


Explicitly Setting the Character Encoding

Many web pages leave the character encoding (the "charset" parameter in HTTP)
undefined. In earlier versions of HTML and HTTP, the character encoding
was supposed to default to ISO-8859-1 if it wasn't defined. In practice, many
browsers used a different default, so it was not possible to rely on the
default being ISO-8859-1. HTML version 4 legitimizes this: if the
character encoding isn't specified, any character encoding can be used.
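A short Python illustration of why an unspecified encoding is a problem. The same bytes, the Czech word "čeština" as stored in ISO-8859-2, read differently under ISO-8859-1 or cp1250 (Windows-1250), either of which a browser might pick as its default.

```python
# "čeština" encoded in ISO-8859-2: č = 0xE8, š = 0xB9.
data = b"\xe8e\xb9tina"

for charset in ("iso-8859-2", "iso-8859-1", "cp1250"):
    print(charset, "->", data.decode(charset))
# iso-8859-2 gives "čeština", iso-8859-1 gives "èe¹tina",
# and cp1250 maps 0xB9 to a different letter than š.
```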

If the web server doesn't specify which character encoding is in use, it
can't tell which characters the browser will treat as special. Web pages
with an unspecified character encoding work most of the time because most
character sets assign the same characters to byte values below 128. But
which of the values above 128 are special? Some 16-bit character-encoding
schemes have additional multi-byte representations for special characters
such as "<". Some browsers recognize this alternative encoding and act on
it. This is "correct" behavior, but it makes attacks using malicious scripts
much harder to prevent: the server simply doesn't know which byte sequences
represent the special characters.
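A minimal Python sketch of the problem, using UTF-7 as the alternative encoding. The hypothetical filter `looks_safe` scans the raw bytes for "<" and ">", finds none, and lets the input through. A browser that decides the page is UTF-7 still sees a script tag.

```python
def looks_safe(data: bytes) -> bool:
    """Hypothetical naive filter: reject input containing '<' or '>' as ASCII bytes."""
    return b"<" not in data and b">" not in data


# "<script>alert(1)</script>" written with the UTF-7 forms
# of "<" (+ADw-) and ">" (+AD4-).
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

print(looks_safe(payload))      # True: there is no '<' or '>' byte at all
print(payload.decode("ascii"))  # read as ASCII/Latin-1: harmless-looking text
print(payload.decode("utf-7"))  # read as UTF-7: <script>alert(1)</script>
```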

For example, UTF-7 provides alternative encodings for "<" and ">" ("+ADw-"
and "+AD4-"), and several popular browsers recognize these as the start and
end of a tag. This is not a bug in those browsers: if the character encoding
really is UTF-7, this is correct behavior. The problem is that the browser
and the server can end up disagreeing about the encoding. Web servers should
therefore set the character set explicitly and then make sure that the data
they insert is free of byte sequences that are special in that encoding. For
example, a page can declare ISO-8859-1 like this:


<HTML>
<HEAD>
<META http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1">
<TITLE>HTML SAMPLE</TITLE>
</HEAD>
<BODY>
<P>This is a sample HTML page
</BODY>
</HTML>
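The recommendation can also be sketched in Python. The hypothetical `render_page` below does three things:

- it declares the charset in the HTTP header as well as in META;
- it escapes the inserted text with the standard `html.escape`;
- it encodes the page in the declared charset, turning characters that ISO-8859-1 cannot represent into numeric character references.

This is a sketch under those assumptions, not a complete defence against cross-site scripting.

```python
import html
from typing import Dict, Tuple

CHARSET = "ISO-8859-1"  # the charset this sketch chooses to declare


def render_page(user_text: str) -> Tuple[Dict[str, str], bytes]:
    """Return (HTTP headers, body bytes) for a page that embeds untrusted text."""
    headers = {"Content-Type": f"text/html; charset={CHARSET}"}
    # Escape the markup-significant characters &, <, >, " and '.
    safe = html.escape(user_text, quote=True)
    page = (
        "<HTML><HEAD>"
        f'<META http-equiv="Content-Type" content="text/html; charset={CHARSET}">'
        "<TITLE>HTML SAMPLE</TITLE></HEAD>"
        f"<BODY><P>{safe}</BODY></HTML>"
    )
    # Characters outside ISO-8859-1 become &#NNN; references, so every byte
    # in the body means what the declared charset says it means.
    return headers, page.encode(CHARSET, errors="xmlcharrefreplace")


headers, body = render_page("<script>alert(1)</script> Čeština")
print(headers["Content-Type"])  # text/html; charset=ISO-8859-1
print(body)
```

With the charset fixed in the header, a UTF-7 sequence such as "+ADw-" in the input stays literal text.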


