Item10751: FastCGIEngineContrib "Wide character..." errors w/UTF-8 Foswiki

Priority: Normal
Current State: Closed
Released In: 2.0.0
Target Release: major
Applies To: Extension
Component: FastCGIEngineContrib, Unicode
Branches: master
Reported By: PaulHarvey
Waiting For:
Last Change By: GeorgeClark
I have struggled all day on this problem, and finally have a "fix". I'd like a little sanity check: a very brief comment from Crawford or Babar on whether this is a sane fix...

Problem 1:

"non-ascii" stuff was being "double-encoded". For example, a simple en dash character (&ndash; in HTML entity speak) is e2 80 93 in UTF-8, but it was being displayed as "unknown unicode chars", and the octets seen in the response looked like c3 a2 c2 80 c2 93.

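This is a sign of double-encoding, if you consider that the bytes for U+00E2 ("LATIN SMALL LETTER A WITH CIRCUMFLEX") in UTF-8 are c3 a2. In other words, each octet of the correct e2 80 93 sequence was treated as a character in its own right and encoded to UTF-8 a second time.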
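
The octet sequences above can be reproduced in plain Perl. This is only a minimal sketch of the double-encoding effect using Encode; it does not involve FCGI at all, and the helper name hex_octets is made up for the illustration.

```perl
use strict;
use warnings;
use Encode ();

# U+2013 EN DASH as a Perl character string.
my $char = "\x{2013}";

# Encoding once yields the expected UTF-8 octets.
my $once = Encode::encode_utf8($char);

# Double encoding: the three octets are treated as the three characters
# U+00E2, U+0080 and U+0093, and each is encoded a second time.
my $twice = Encode::encode_utf8($once);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_octets {
    my ($bytes) = @_;
    return join ' ', map { sprintf( '%02x', ord($_) ) } split //, $bytes;
}

print hex_octets($once),  "\n";    # e2 80 93
print hex_octets($twice), "\n";    # c3 a2 c2 80 c2 93
```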

So the fix to this problem was very simple: I was running FCGI from Ubuntu 10.04 LTS which is version 0.68, and there is a note in the change logs for 0.71 that this double-encoding issue has been fixed.

Problem solved? No. Because now, instead of funny-looking characters, my wiki crashes, hard.

Problem 2:

Symptoms vary, from a blank screen, to an error 500 containing
foswiki.fcgi: Wide character in FCGI::Stream::WRITE at /usr/local/src/git.trin.org.au/core/lib/Foswiki/Engine/FastCGI.pm line 182.
- but it's always a "crash" (you can't access the topic). Only on topics with non-ascii chars.

So I did some googling and found http://www.gossamer-threads.com/lists/rt/devel/92714.

Anyway, the patch is below:

diff --git a/lib/Foswiki/Engine/FastCGI.pm b/lib/Foswiki/Engine/FastCGI.pm
index 036c9ec..bf29ea0 100644
--- a/lib/Foswiki/Engine/FastCGI.pm
+++ b/lib/Foswiki/Engine/FastCGI.pm
@@ -51,6 +51,7 @@ use strict;
 use FCGI;
 use POSIX qw(:signal_h);
 require File::Spec;
+use Encode();
 
 use vars qw( $VERSION $RELEASE );
 
@@ -179,6 +180,7 @@ sub preparePath {
 
 sub write {
     my ( $this, $buffer ) = @_;
+    $buffer = Encode::encode_utf8($buffer);
     syswrite STDOUT, $buffer;
 }
 

Is this an appropriate fix? Or just a band-aid?

Obviously we need a configure checker to look for >= 0.70 of FCGI

-- PaulHarvey - 17 May 2011

Are you sure that what you need isn't rather to binmode the output?

-- OlivierRaginel - 17 May 2011

Yeah, the fix above is very wrong. It corrupts viewfile output.

-- PaulHarvey - 18 May 2011
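
A minimal sketch of why an unconditional encode_utf8 would damage binary output such as viewfile: the data is already a byte string, so every octet at or above 0x80 gets expanded to two octets. The sample octets below are illustrative only and are not taken from a real attachment.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative octets only: a PDF header followed by a binary comment
# line, similar in shape to the start of many PDF files.
my $pdf_head = "%PDF-1.4\n%\xE2\xE3\xCF\xD3\n";

# Applying the text fix to binary data: each octet >= 0x80 becomes two.
my $mangled = Encode::encode_utf8($pdf_head);

printf "original: %d octets\n", length($pdf_head);    # 15
printf "mangled:  %d octets\n", length($mangled);     # 19
```

The four high octets account for the four extra octets, so the client receives a file that no longer matches what was uploaded.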

Actually, I've found I can use print STDOUT $buffer instead of syswrite STDOUT, $buffer - except I'm unsure why we use syswrite in the first place.

-- PaulHarvey - 18 May 2011

I guess it's to avoid buffering.

Also, to answer your utf8 question, from perldoc -f syswrite:
Note that if the filehandle has been marked as ":utf8", Unicode characters are written instead of bytes (the LENGTH,
OFFSET, and the return value of syswrite() are in UTF-8 encoded Unicode characters).  The ":encoding(...)" layer
implicitly introduces the ":utf8" layer.  See "binmode", "open", and the "open" pragma, open.
So if we binmode it, or open it as you also suggested (which does the same anyway), it should work (famous last words).

-- OlivierRaginel - 18 May 2011
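
For an ordinary PerlIO handle, the layer approach described above can be sketched as follows. An in-memory filehandle stands in for STDOUT here, which is an assumption made for illustration; whether the same layer has any effect on the STDOUT that FCGI provides is a separate question. The sketch uses print rather than syswrite, since newer Perl versions reject syswrite on handles with a :utf8 layer.

```perl
use strict;
use warnings;

# In-memory filehandle used as a stand-in for STDOUT (assumption).
my $captured = '';
open( my $fh, '>:encoding(UTF-8)', \$captured )
  or die "Cannot open in-memory handle: $!";

# Print a character string; the layer encodes it on the way out.
print {$fh} "\x{2013}";
close($fh) or die "Cannot close in-memory handle: $!";

# $captured now holds the encoded octets, not the character.
print join( ' ', map { sprintf( '%02x', ord($_) ) } split //, $captured ), "\n";
# e2 80 93
```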

It really doesn't work. Sven read this somewhere that I hadn't seen (apparently I'm blind? it's not in Ubuntu's FCGI.pm 0.71 anyway...):

FCGI.pm isn't Unicode aware, only characters within the range 0x00-0xFF are supported. Attempts to output strings containing characters above 0xFF results in a exception: (F) Wide character in %s.

Users who wants the previous (FCGI.pm <= 0.68) incorrect behavior can disable the exception by using the bytes pragma.

So now the patch we are using that seems to "solve everything" is:
diff --git a/lib/Foswiki/Engine/FastCGI.pm b/lib/Foswiki/Engine/FastCGI.pm
index 036c9ec..44a7508 100644
--- a/lib/Foswiki/Engine/FastCGI.pm
+++ b/lib/Foswiki/Engine/FastCGI.pm
@@ -179,6 +179,7 @@ sub preparePath {
 
 sub write {
     my ( $this, $buffer ) = @_;
+    use bytes;
     syswrite STDOUT, $buffer;
 }

-- PaulHarvey - 19 May 2011
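
What the bytes pragma changes can be seen in plain Perl, independent of FCGI. The sketch below only shows how a string is viewed with and without the pragma; it makes no claim about FCGI's internals, and the function names are made up for the illustration. Note that perldoc bytes discourages using the pragma for anything other than debugging.

```perl
use strict;
use warnings;

my $chars  = "\x{2013}";        # one character, U+2013
my $octets = "\xE2\x80\x93";    # three octets, already UTF-8 encoded

# Hypothetical helpers: length as characters vs. as internal bytes.
sub char_length { my ($s) = @_; return length($s) }
sub byte_length { my ($s) = @_; use bytes; return length($s) }

printf "%d %d\n", char_length($chars),  byte_length($chars);     # 1 3
printf "%d %d\n", char_length($octets), byte_length($octets);    # 3 3
```

Under the pragma the wide character is seen as its three internal octets, while a string that is already a byte string is seen unchanged.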

So it seems that, from the FCGI.pm point of view, we should be calling utf8_encode on our $buffer before printing to STDOUT, which seems to go against some other advice I've been reading about setting up the I/O layer to take care of this for us... I certainly couldn't make binmode() fix it for FCGI.

-- PaulHarvey - 19 May 2011

Removed Olivier and me from the "Waiting for" list. This is in confirmed state, so it is waiting for action, not for either of us.

-- CrawfordCurrie - 24 May 2011

Fixed in utf8 branch, awaiting merge

-- Main.CrawfordCurrie - 17 May 2015 - 11:02

The commit: distro:e85724a19c26 seems to corrupt output from viewfile when used for PDF attachments.

Upload and fetching via CGI or CLI does not have this corruption, so it is related to FastCGI Engine.

I have not been able to recreate the original issue, but it seems wrong to double-encode, since there is already an encode in Response->print, and Response->body has some comments saying that the body should already be a byte stream.

My testing was done with perl 5.22.0 and FCGI 0.77.

-- KryoStoffer - 02 Jul 2015
 
Topic revision: r13 - 06 Jul 2015, GeorgeClark