
Item13302: CGI-4.11, 4.12, 4.13 breaks utf8 byte strings in data forms

Priority: Urgent
Current State: Closed
Released In: 2.0.0
Target Release: major
Applies To: Engine
Component: DataForms, FlexFormPlugin, FoswikiUIEdit, I18N, JQueryPlugin, MoreFormfieldsPlugin
Branches: master
Reported By: MichaelDaum
Waiting For:
Last Change By: GeorgeClark
How to reproduce:

  • Set $Foswiki::cfg{Site}{CharSet} = 'utf-8';
  • create a topic with a text such as "frühstück"
  • save it
  • edit it
  • add or replace the existing form with another one
  • new text now is "frÃ¼hstÃ¼ck"

This is reproducible in raw edit even without NatEditPlugin or TinyMCEPlugin

This does not happen switching forms while in wysiwyg mode.

When selecting a new data form this is what is stored in the meanwhile in a hidden text field:

<input type="hidden" name="text" value="fr&Atilde;&frac14;hst&Atilde;&frac14;ck

   * Set NOWYSIWYG = on
"  />

This smells like we are getting caught by the new way CGI-4.13 deals with strings while creating form elements. At some point this changed in CGI.
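
The hidden field value above is consistent with every byte of the UTF-8 encoded string being escaped as if it were a separate Latin-1 character. A minimal sketch of that effect, using only the core Encode module. The %entity table and the substitution are a hypothetical stand-in for the escaper, not CGI's actual code:

```perl
use strict;
use warnings;
use Encode ();

# "fruehstueck" with real umlauts, as a Perl character string (U+00FC).
my $chars = "fr\x{fc}hst\x{fc}ck";

# The UTF-8 encoded byte string, as held by a site with CharSet utf-8.
my $bytes = Encode::encode( 'UTF-8', $chars );

# Hypothetical escaper: maps every high byte to the entity of the
# Latin-1 character with the same code (0xC3 is A-tilde, 0xBC is 1/4).
my %entity = ( "\xC3" => '&Atilde;', "\xBC" => '&frac14;' );
( my $escaped = $bytes ) =~
  s{([\x80-\xFF])}{ $entity{$1} // sprintf( '&#%d;', ord $1 ) }ge;

print "$escaped\n";
```

Each u-umlaut becomes the two bytes 0xC3 0xBC, which is why two entities show up per original character.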

-- MichaelDaum - 10 Mar 2015

The checked-in fix is just the tip of the iceberg, as all the other HTML-generating calls to CGI aren't fixed yet. Those that take a unicode string as a value for input, select, textarea etc. are all affected the same way ... such as Foswiki::UI::Text etc.

-- MichaelDaum - 10 Mar 2015

It seems as if we need to switch off autoEscape: http://perldoc.perl.org/CGI.html#AUTOESCAPING-HTML

-- MichaelDaum - 12 Mar 2015

Here's how to reproduce:

(1) Switch your site charset to utf-8.

(2) create a DataForm such as

| *Name:*| *Type:* | *Size:* | *Values:* | *Description:* | *Attributes:* | *Default:* |
| Label | label | 80 | frühstück |  |  |  |
| Text | text | 80 |  |  |  |  |
| Textarea | textarea | 80x5 |  |  |  |  |
| Textboxlist | textboxlist | 80 |  |  |  |  |
| Checkbox | checkbox | 10 | früh, spät, später |  |  |  |
| Checkbox Values | checkbox+values | 10 | früh=1, spät=2, später=3 |  |  |  |
| Radio | radio | 10 | früh, spät, später |  |  |  |
| Radio Values | radio+values | 10 | früh=1, spät=2, später=3 |  |  |  |
| Select | select | 1 | ,früh, spät, später |  |  |  |
| Select Values | select+values | 1 | none=0,früh=1, spät=2, später=3 |  |  |  |

(3) attach it to some topic

(4) edit the topic and insert "überstring" into the input field.

(5) save

(6) edit again ... the input field now contains Ã¼berstring

Here's a patch that fixes it for Foswiki::Form::Text; others are still required in various code spots:

diff --git a/core/lib/Foswiki/Form/Text.pm b/core/lib/Foswiki/Form/Text.pm
index b02cb81..067f459 100644
--- a/core/lib/Foswiki/Form/Text.pm
+++ b/core/lib/Foswiki/Form/Text.pm
@@ -3,6 +3,7 @@ package Foswiki::Form::Text;
 
 use strict;
 use warnings;
+use Encode ();
 
 use Foswiki::Form::FieldDefinition ();
 our @ISA = ('Foswiki::Form::FieldDefinition');
@@ -27,6 +28,9 @@ sub new {
 sub renderForEdit {
     my ( $this, $topicObject, $value ) = @_;
 
+    # handle properly encoded strings down to CGI
+    $value = Encode::decode($Foswiki::cfg{Site}{CharSet}, $value); 
+
     return (
         '',
         CGI::textfield(

-- MichaelDaum - 23 Mar 2015

It's important to test this also with characters that have Unicode code points > 0x100. For example: `αβγ` (Greek alpha, beta, gamma, U+03B1 to U+03B3)
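
A small sketch of why the two kinds of test data behave differently, using only the core Encode module: ü (U+00FC) has a Latin-1 representation and can appear to survive code that confuses Latin-1 with character strings, while αβγ has none. Variable names are illustrative only:

```perl
use strict;
use warnings;
use Encode ();

my $umlaut = "\x{fc}";                   # U+00FC
my $greek  = "\x{3b1}\x{3b2}\x{3b3}";    # U+03B1 to U+03B3

# The umlaut has a Latin-1 representation; a round trip returns it intact.
my $umlaut_rt =
  Encode::decode( 'iso-8859-1', Encode::encode( 'iso-8859-1', $umlaut ) );

# The Greek letters have none; Encode substitutes "?" for each of them.
my $greek_latin1 = Encode::encode( 'iso-8859-1', $greek );

# As UTF-8, each Greek letter takes two bytes.
my $greek_utf8 = Encode::encode( 'UTF-8', $greek );

printf "umlaut round trip ok: %s\n", $umlaut_rt eq $umlaut ? 'yes' : 'no';
printf "greek as latin-1: %s\n", $greek_latin1;
printf "greek utf-8 bytes: %d\n", length $greek_utf8;
```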

-- JanKrueger - 23 Mar 2015

Another error: charsets are broken differently when you edit the test topic and add Text=überstring to the edit URL to set the formfield using url parameters.

The above patch does not suffice, as there are two ways that CGI::textfield gets its value: either (1) from the -value parameter or (2) by reading the url parameter directly. The problem is that even if we properly decode the value passed to the CGI::textfield method, it will be completely ignored if a url parameter of the same name is present as well.

$value = Encode::decode($Foswiki::cfg{Site}{CharSet}, $value);

$html = CGI::textfield(
     -name  => 'Text',
     -value => $value # SMELL: this is ignored when there is a "Text" url parameter as well 
);

Just found out that we need -override => 1 on all CGI form-building methods that take a -value parameter as well.
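
The effect of the override flag can be sketched in isolation. This assumes CGI.pm is installed; the query string passed to CGI->new and the values from-url and from-code are made up for the illustration:

```perl
use strict;
use warnings;
use CGI ();

# Simulate a request that carries a "Text" url parameter.
my $q = CGI->new('Text=from-url');

# Sticky default: the url parameter wins over -value.
my $sticky = $q->textfield( -name => 'Text', -value => 'from-code' );

# With -override, the value passed by the caller is used.
my $forced = $q->textfield(
    -name     => 'Text',
    -value    => 'from-code',
    -override => 1,
);

print "$sticky\n$forced\n";
```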

Here is a more thorough patch: Form-utf8.patch that fixes:

  • the "unknown" formfield type
  • checkbox
  • label
  • radio
  • text
  • textarea

For some bizarre reason select is not affected. I've also extended the DataForm definition above to cover some more cases.

-- MichaelDaum - 23 Mar 2015

This patch breaks a utf8 system using CGI 3.65 and Perl 5.20.1. When I click "Edit" on the form on the attached topic, the "überstring" is rendered with a black diamond � in place of the ü. I flipped to a perlbrew 5.20.1 with CGI 4.11, and the field works successfully. At the very least this decode needs to be conditional.

Some text �berstring in the formfield

-- GeorgeClark - 23 Mar 2015

Maybe I'm missing something, but this seems to fix it for me:
diff --git a/core/lib/Foswiki/Form.pm b/core/lib/Foswiki/Form.pm
index beb8247..4bd90fc 100644
--- a/core/lib/Foswiki/Form.pm
+++ b/core/lib/Foswiki/Form.pm
@@ -486,6 +486,7 @@ sub renderForEdit {
     my ( $this, $topicObject ) = @_;
     ASSERT( $topicObject->isa('Foswiki::Meta') ) if DEBUG;
     require CGI;
+    CGI::autoEscape(0);
     my $session = $this->session;
 
     if ( $this->{mandatoryFieldsPresent} ) {
@@ -563,6 +564,7 @@ sub renderForEdit {
     }
 
     $text .= $afterText;
+    CGI::autoEscape(1);
     return $text;
 }

-- GeorgeClark - 23 Mar 2015

Similar bug, how to reproduce: 1.) enter some UTF8 text in WYSIWYG 2.) Mark the text 3.) from the 1st drop-down select VERBATIM 4.) switch to wikitext editor - the text got screwed...

-- JozefMojzis - 24 Mar 2015

Also, "View wiki text" (bottom menu) shows the topic screwed up when it contains utf8 text.

-- JozefMojzis - 24 Mar 2015

@George, okay that's exactly what I was afraid of. Remaining approaches:

  1. decode strings when $CGI::VERSION >= 4.11
  2. disable auto-escape ... while escaping (double-)quotes on our own (note your above patch breaks values with quotes in it)
  3. don't use CGI >= 4.13

From the change logs of CGI-4.11:

[ REFACTORING ]
    - escapeHTML (and unescapeHTML) have been refactored to use the functions
      exported by the HTML::Entities module (GH #157)

This links to https://github.com/leejo/CGI.pm/issues/157

Here's a new patch: Form-utf8.patch.

Updated above DataForm definition to cover checkbox+values as well.

-- MichaelDaum - 24 Mar 2015

Hi Michael, the checked-in patches are completely screwing up on CGI 3.65. The form ended up with <?> in place of the characters, the data fields were saved as #1234 entities of some sort that are tough to edit, and the topic contents were scrambled.

-- GeorgeClark - 26 Mar 2015

George, have you got any details on how to reproduce your report? I tried 3.65 with utf8 and iso-8859-1 and both seem to be fine. Note that the code should behave the same as before for any CGI < 4.11.

-- MichaelDaum - 26 Mar 2015

The patch is indeed not sufficient and does not work for other characters such as α. Once you enter this unicode character into an input field, save it and open the editor again, it is translated to the corresponding html entity ... which we don't want, as it changes user-generated content for no good reason.

I've been talking to the devs of CGI and reported the error. They reverted the behavior introduced in 4.11 and added a fix to the upcoming 4.14 version of CGI (eta next week).

I'll revert the bulk of my patch but leave the -override=>1 thingy in, as that actually addresses another bug where CGI.pm was not using the $value passed to the html generator. Instead it also looked at the query string of the ongoing request and took the value from there ... ignoring the one it was called with.

Once this is reverted we have to remember not to use CGI-4.11, CGI-4.12 or CGI-4.13.

-- MichaelDaum - 26 Mar 2015

I've added a check that restores the warnings on CGI versions, and raises them to an ERROR if CGI 4.11-4.13 is used with utf8 set.
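
Such a check might be sketched as follows. The function names and the message are hypothetical and are not the code that was checked in; only the version range and the utf8 condition come from this task:

```perl
use strict;
use warnings;

# Hypothetical helper: true for the CGI releases named in this task.
sub cgi_breaks_utf8 {
    my ($version) = @_;
    return $version >= 4.11 && $version < 4.14;
}

# Returns an error string when the combination is known to be broken,
# otherwise an empty string.
sub check_cgi {
    my ( $version, $charset ) = @_;
    return '' unless cgi_breaks_utf8($version);
    return '' unless lc($charset) =~ /^utf-?8$/;
    return "ERROR: CGI $version must not be used with a utf-8 site charset";
}

print check_cgi( '4.13', 'utf-8' ), "\n";
```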

-- GeorgeClark - 26 Mar 2015

Okay I think we did our duty here. The rest is up to the CGI devs. I will report back on the issue as soon as 4.14 is out.

-- MichaelDaum - 27 Mar 2015

Related feature request: ReduceImpactOfCGIDotPMinFoswiki

-- MichaelDaum - 30 Mar 2015

CGI-4.14 just came out and fixes the issue. Great.

-- MichaelDaum - 01 Apr 2015
 

Topic revision: r23 - 05 Jul 2015, GeorgeClark