Apache::Util::escape_html() Doesn’t Like Perl UTF-8 Strings
I got bit by a bug with Apache::Util’s escape_html() function in mod_perl. It seems that it doesn’t like Perl’s Unicode-encoded strings! This patch demonstrates the issue (be sure that your editor understands UTF-8):
--- modperl/t/net/perl/util.pl.~1.18.~ Sun May 25 03:54:08 2003
+++ modperl/t/net/perl/util.pl Thu Sep 9 19:38:40 2004
@@ -74,6 +74,25 @@
 #print $esc_2;
 
 test ++$i, $esc eq $esc_2;
+
+# Make sure that escape_html() understands multibyte characters.
+my $utf8 = '<專輯>';
+my $esc_utf8 = '&lt;專輯&gt;';
+my $test_esc_utf8 = Apache::Util::escape_html($utf8);
+test ++$i, $test_esc_utf8 eq $esc_utf8;
+#print STDERR "Compare '$test_esc_utf8'\n to '$esc_utf8'\n";
+
+eval { require Encode };
+unless ($@) {
+    # Make sure escape_html() properly handles strings with Perl's
+    # Unicode encoding.
+    $utf8 = Encode::decode_utf8($utf8);
+    $esc_utf8 = Encode::decode_utf8($esc_utf8);
+    $test_esc_utf8 = Apache::Util::escape_html($utf8);
+    test ++$i, $test_esc_utf8 eq $esc_utf8;
+    #print STDERR "Compare '$test_esc_utf8'\n to '$esc_utf8'\n";
+}
+
 use Benchmark;
 
 =pod
If I enable the print statements and look at the log, I see this:
Compare '&lt;專輯&gt;'
 to '&lt;專輯&gt;'
Compare '&lt;å°è¼¯&gt;'
 to '&lt;專輯&gt;'
The first escape appears to work correctly, but once I decode the string to Perl’s Unicode representation, you can see how badly escape_html() munges the text!
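The garbage in that second line looks like what you get when UTF-8 bytes are read as Latin-1 characters, one character per byte. Here is a minimal sketch of that effect using only the core Encode module. It doesn't touch Apache::Util at all, and it's only my guess at the kind of mix-up involved, not a confirmed diagnosis of what escape_html() does internally:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 bytes for the two characters in the test string.
my $bytes = "\xE5\xB0\x88\xE8\xBC\xAF";

# Correct interpretation: decode the bytes as UTF-8.
my $chars = decode('UTF-8', $bytes);

# Mistaken interpretation: treat every byte as one Latin-1 character.
my $mojibake = decode('ISO-8859-1', $bytes);

printf "correct:  %d characters\n", length $chars;     # 2
printf "mojibake: %d characters\n", length $mojibake;  # 6

# Writing each form back out as UTF-8 shows the doubling.
printf "correct:  %d bytes\n", length encode('UTF-8', $chars);     # 6
printf "mojibake: %d bytes\n", length encode('UTF-8', $mojibake);  # 12
```

Two characters turn into six, and six bytes turn into twelve on output, which matches the sort of damage showing up in the log.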
Curiously, both tests fail, even though the first conversion appears to be correct. This could be due to the behavior of eq, though I’m not sure why. But the second test is the more interesting one, since it really screws things up.
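For what it's worth, eq compares strings character by character, so a string of raw UTF-8 bytes never equals its decoded counterpart, even when both print the same. The sketch below shows that with a hypothetical pure-Perl stand-in for escape_html() (the name escape_html_text and its entity table are mine, not part of Apache::Util). It doesn't explain why the first test fails, but it does show the kind of mismatch eq will report:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical pure-Perl stand-in for Apache::Util::escape_html().
my %ENTITY = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

sub escape_html_text {
    my ($text) = @_;
    # Only ASCII metacharacters are replaced, so the substitution is safe
    # for byte strings and for decoded character strings alike.
    $text =~ s/([&<>"])/$ENTITY{$1}/g;
    return $text;
}

my $bytes = "<\xE5\xB0\x88\xE8\xBC\xAF>";   # UTF-8 bytes, not decoded
my $chars = decode('UTF-8', $bytes);        # Perl character string

my $esc_bytes = escape_html_text($bytes);
my $esc_chars = escape_html_text($chars);

# Byte string vs. character string: eq sees different characters.
print "bytes vs chars: ",
    ($esc_bytes eq $esc_chars ? "equal" : "different"), "\n";

# Encode the character string back to UTF-8 and the two match.
print "after encoding: ",
    ($esc_bytes eq encode('UTF-8', $esc_chars) ? "equal" : "different"), "\n";
```

This prints "different" for the first comparison and "equal" for the second. Encoding to bytes before calling an escaper and decoding the result afterwards is one possible way to sidestep the problem, assuming the escaper leaves non-ASCII bytes alone.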