Item13435: {Store}{Encoding} changes back to utf-8 after each save of the configuration

Priority: Urgent
Current State: Closed
Released In: 1.2.0
Target Release: n/a
Applies To: Engine
Component: configure
Branches: master
Reported By: GeorgeClark
Waiting For:
Last Change By: GeorgeClark
The problem is that Configure::Load sets $Foswiki::cfg{Site}{CharSet} to a hardcoded 'utf-8' for compatibility with older extensions, and that value gets saved. On the next load, the %remap code overlays {Store}{Encoding} with {Site}{CharSet} and deletes {Site}{CharSet}, which is then recreated as utf-8. So no matter what you set {Store}{Encoding} to, it ends up as utf-8.

  1. Remap should only apply an obsolete key if the new version is missing. Don't keep replacing it with the obsolete version.
  2. {Site}{CharSet} should be set to the {Store}{Encoding} as a default, and only default to utf-8 if encoding is not defined.

-- GeorgeClark - 25 May 2015

Partial revert. Crawford points out by email that it is correct for {Site}{CharSet} to be forced to utf-8. However, it still isn't correct for the remap to override {Store}{Encoding} with {Site}{CharSet}; that should only happen if {Store}{Encoding} is not configured.
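The load/save cycle described in this item can be modelled in a few lines. This is only a sketch in Python, not the Perl in Foswiki::Configure::Load: the `REMAP` table, `load`, `save_and_reload`, `run` and the `remap_only_if_missing` flag are hypothetical names, and the config is assumed to be a plain nested dictionary.

```python
import copy

# Hypothetical model of the obsolete-key remap; illustrative names only,
# not the actual Foswiki::Configure::Load code.
REMAP = {("Site", "CharSet"): ("Store", "Encoding")}


def load(saved, remap_only_if_missing):
    """Simulate one configuration load from the saved settings."""
    cfg = copy.deepcopy(saved)
    for (old_sec, old_key), (new_sec, new_key) in REMAP.items():
        if old_key not in cfg.get(old_sec, {}):
            continue
        # The obsolete key is always removed from the loaded config.
        old_value = cfg[old_sec].pop(old_key)
        new_section = cfg.setdefault(new_sec, {})
        if remap_only_if_missing and new_key in new_section:
            continue  # the new key is already configured, leave it alone
        new_section[new_key] = old_value
    # Compatibility value for older extensions: always recreated as utf-8.
    cfg.setdefault("Site", {})["CharSet"] = "utf-8"
    return cfg


def save_and_reload(cfg, remap_only_if_missing):
    """Saving writes out everything, including the recreated CharSet."""
    return load(cfg, remap_only_if_missing)


def run(remap_only_if_missing):
    """Load a config with a non-UTF-8 store encoding, save it, reload it."""
    cfg = load({"Store": {"Encoding": "iso-8859-1"}}, remap_only_if_missing)
    cfg = save_and_reload(cfg, remap_only_if_missing)
    return cfg["Store"]["Encoding"]
```

In this model, `run(False)` reproduces the report: the encoding comes back as utf-8 after one save. `run(True)` keeps iso-8859-1, while a saved config that only has the old {Site}{CharSet} key still gets its value carried over to {Store}{Encoding}.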

-- GeorgeClark - 26 May 2015


George (et al, especially Michael), this is problematic. Sorry for the long mail, but I think I need to try and explain this clearly.

The {Site}{CharSet} is no longer used in the core, and is defined purely for use by extensions.
There are three scenarios in which {Site}{CharSet} might be used in an extension:

    For decoding request parameters. This will usually only be done for AJAX parameters that are going to feed directly into either web/topic/attachment names or topic content.
    For reading/writing directly to/from topic files on disk (e.g. ">:encoding($Foswiki::cfg{Site}{CharSet})").
    For passing data to/from external programs that only understand certain charsets on their command-lines.

Most extensions are very sloppy and have been written to tacitly assume that every character is one byte long. This isn't a problem so long as:

    The user is only using 7-bit-significant bytes (ASCII) in names/content, and/or
    There are no direct file operations or system() calls writing names/content, and
    There are no regexes encoded to operate only on 8-bit data (e.g. using explicit numeric char codes) either in Perl *or* in JS, and
    The Foswiki::Sandbox has been used for all external program calls.
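To make the one-byte-per-character assumption concrete, here is a small sketch in Python (not Foswiki code). The sample name "Müller" and the variable names are illustrative assumptions. It shows why ASCII-only content hides the problem and why a regex written against a numeric 8-bit character code stops matching once the data is UTF-8.

```python
import re

name = "Müller"          # contains one high-bit character
ascii_name = "Mueller"   # 7-bit only

latin1 = name.encode("iso-8859-1")
utf8 = name.encode("utf-8")

# One byte per character holds for an 8-bit charset...
latin1_is_bytewise = len(latin1) == len(name)
# ...but not for UTF-8, where the u-umlaut takes two bytes (0xC3 0xBC).
utf8_is_bytewise = len(utf8) == len(name)

# Pure ASCII content is byte-identical in both encodings.
ascii_is_safe = ascii_name.encode("iso-8859-1") == ascii_name.encode("utf-8")

# A regex using the explicit 8-bit code for the u-umlaut (0xFC).
high_bit_u = re.compile(rb"\xfc")
matches_latin1 = bool(high_bit_u.search(latin1))
matches_utf8 = bool(high_bit_u.search(utf8))
```

Here `latin1_is_bytewise`, `ascii_is_safe` and `matches_latin1` are true, while `utf8_is_bytewise` and `matches_utf8` are false.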

It becomes more of a problem when the environment has been using all 8 bits of each byte for character codes (for example German) or the extension decodes parameters explicitly. The most likely problem scenario is therefore:

    1. {Site}{CharSet} in 1.1.x was iso-8859-* (or similar 8-bit charset) and
    2. The local language used for content/web/topic names uses high-bit characters and
    3. The extension decodes utf-8 request parameters and re-encodes them using {Site}{CharSet}.

I intended that the store would automatically trap (1) and (2) and convert disk content from the old 1.1.x {Site}{CharSet} to UTF-8 whenever the store is interacted with, by setting {Store}{Encoding} to the old 1.1.x {Site}{CharSet}. (3) would be handled by forcing the 1.2 {Site}{CharSet} to 'utf-8'.

I can't think of any circumstance where ({Site}{CharSet} == {Store}{Encoding} != 'utf-8') would be appropriate.
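The effect of forcing {Site}{CharSet} to 'utf-8' on an extension that re-encodes request parameters can be sketched as follows. This is Python rather than Perl. `extension_handle_param` and `core_accepts` are hypothetical stand-ins, and the sketch assumes the 1.2 core expects UTF-8 data back from the extension.

```python
def extension_handle_param(raw_param: bytes, site_charset: str) -> bytes:
    """Hypothetical extension: decode a UTF-8 request parameter and
    re-encode it using the configured site charset."""
    return raw_param.decode("utf-8").encode(site_charset)


def core_accepts(data: bytes) -> bool:
    """Assumption for this sketch: the core treats incoming data as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


request_param = "Müller".encode("utf-8")

# 1.1.x-style setting: the extension produces iso-8859-1 bytes.
legacy = extension_handle_param(request_param, "iso-8859-1")
# 1.2 setting: the re-encoding step becomes a round trip.
forced = extension_handle_param(request_param, "utf-8")
```

With the legacy charset the extension emits a lone 0xFC byte, which is not valid UTF-8, so `core_accepts(legacy)` is false. With the forced value `forced` equals `request_param` and the extension's re-encoding is harmless.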

Other problems that may occur generally involve calls to CPAN modules that assume byte strings, for example Digest::MD5. There is no way we can defend against these; they will have to be dealt with on a case-by-case basis.
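As an analogy only (this is Python's hashlib, not Perl's Digest::MD5, and the sample string is an assumption), the sketch below shows why digest code has to be given bytes in a known encoding: the same text hashes differently depending on the encoding chosen by the caller.

```python
import hashlib

text = "Müller"

# hashlib refuses character strings outright; the caller must pick an encoding.
try:
    hashlib.md5(text)
    accepted_text = True
except TypeError:
    accepted_text = False

# The digest depends on which encoding the caller picked.
digest_utf8 = hashlib.md5(text.encode("utf-8")).hexdigest()
digest_latin1 = hashlib.md5(text.encode("iso-8859-1")).hexdigest()
digests_differ = digest_utf8 != digest_latin1
```

An extension that switches from an 8-bit charset to UTF-8 without noticing would therefore compute different digests for the same names or content.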

Regards,

C.

On 25/05/15 21:16, GitHub wrote:
>   Branch: refs/heads/master
>   Home:   https://github.com/foswiki/distro
>   Commit: f1451010e11a751143c11860c3839a3c8df8a436
>       https://github.com/foswiki/distro/commit/f1451010e11a751143c11860c3839a3c8df8a436
>   Author: George Clark <geonwiki@fenachrone.com>
>   Date:   2015-05-25 (Mon, 25 May 2015)
>
>   Changed paths:
>     M core/lib/Foswiki/Configure/Load.pm
>
>   Log Message:
>   -----------
>   Item13435: Don't keep overlaying Encoding with CharSet
>

-- GeorgeClark - 26 May 2015

ItemTemplate

Summary {Store}{Encoding} changes back to utf-8 after each save of the configuration
ReportedBy GeorgeClark
Codebase trunk
SVN Range
AppliesTo Engine
Component configure
Priority Urgent
CurrentState Closed
WaitingFor
Checkins distro:f1451010e11a distro:46c28e70b1b9
TargetRelease n/a
ReleasedIn 1.2.0
CheckinsOnBranches master
trunkCheckins
masterCheckins distro:f1451010e11a distro:46c28e70b1b9
ItemBranchCheckins
Release01x01Checkins
Topic revision: r2 - 26 May 2015, GeorgeClark