Item9126: Wrong encoding in CompareRevisionsAddOn

Priority: Normal
Current State: Closed
Released In: 1.1.0
Target Release: minor
Applies To: Extension
Component: CompareRevisionsAddOn
Branches:
Reported By: GilmarSantosJr
Waiting For:
Last Change By: KennethLavrsen
I have {Languages}{'pt-br'}{Enabled} ticked, {Site}{Locale} is pt_BR.utf8 and {Site}{CharSet} is utf-8.

Everything works fine until I enable CompareRevisionsAddOn and look at a topic's history (everything else keeps working): all "special" characters appear to be double-encoded. The problem is that $element->as_HTML() is called with an undefined $entities argument, which makes HTML::Element encode all "unsafe" characters. From the HTML::Element documentation:

Returns a string representing in HTML the element and its descendants. The optional argument $entities specifies a string of the entities to encode. For compatibility with previous versions, specify '<>&' here. If omitted or undef, all unsafe characters are encoded as HTML entities. See HTML::Entities for details. If passed an empty string, no entities are encoded.

I changed the call from:
        return $element->as_HTML( undef, undef, {} );

to:
        return $element->as_HTML( q|'"<>%&|, undef, {} );

The list of "dangerous" characters is taken from the "safe" encoding. With this change everything worked as expected.

Any concerns about committing this change?

-- GilmarSantosJr - 08 Jun 2010

After more reading, I implemented this change (relative to trunk):

$ git diff
diff --git a/CompareRevisionsAddOn/lib/Foswiki/Contrib/CompareRevisionsAddOn/Compare.pm b/CompareRevisionsAddOn/lib/Foswiki/Contrib/CompareRevisionsAddOn/Com
index ed0a7e3..6d31949 100755
--- a/CompareRevisionsAddOn/lib/Foswiki/Contrib/CompareRevisionsAddOn/Compare.pm
+++ b/CompareRevisionsAddOn/lib/Foswiki/Contrib/CompareRevisionsAddOn/Compare.pm
@@ -322,6 +322,10 @@ sub _getTree {
     my $tree = new HTML::TreeBuilder;
     $tree->implicit_body_p_tag(1);
     $tree->p_strict(1);
+    if ( $Foswiki::cfg{UseLocale} ) {
+        require Encode;
+        $text = Encode::decode( $Foswiki::cfg{Site}{CharSet}, $text );
+    }
     $tree->parse($text);
     $tree->eof;
     $tree->elementify;

This worked without the change described in my previous comment. With the first solution, the parse() method prints many messages to STDERR about parsing undecoded UTF-8 strings. This solution produces no warnings.
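
To illustrate why decoding first avoids the double encoding, here is a minimal sketch that uses only the core Encode module. The helper encode_high_chars is hypothetical: it is a simplified stand-in for an entity encoder, not the code HTML::Element actually runs. It shows that an undecoded UTF-8 string is seen as one character per byte, so a single "ã" turns into two entities.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for an entity encoder: replaces every character
# above code point 127 with a numeric character reference.
sub encode_high_chars {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# "ã" (U+00E3) as raw UTF-8 bytes, as it would arrive with {Site}{CharSet} = utf-8.
my $bytes = "\xC3\xA3";

# Undecoded: Perl sees two separate characters (0xC3 and 0xA3).
my $from_bytes = encode_high_chars($bytes);

# Decoded first: Perl sees a single character (0xE3).
my $chars      = Encode::decode( 'utf-8', $bytes );
my $from_chars = encode_high_chars($chars);

print "undecoded: $from_bytes\n";    # undecoded: &#195;&#163;
print "decoded:   $from_chars\n";    # decoded:   &#227;
```

The undecoded result renders in a browser as two wrong characters, which matches the symptom seen in the topic history.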

So, which is the better fix? Any other suggestions?

-- GilmarSantosJr - 08 Jun 2010

 
Topic revision: r5 - 13 Sep 2010, KennethLavrsen