Item13435: {Store}{Encoding} changes back to utf-8 after each save of the configuration
Priority: Urgent
Current State: Closed
Released In: 1.2.0
Target Release: n/a
Applies To: Engine
Component: configure
Branches: master
The problem is that Configure::Load sets $Foswiki::cfg{Site}{CharSet} to a hardcoded 'utf-8' for compatibility with older extensions, and that value gets saved. Then on the next load, the %remap code overlays {Store}{Encoding} with {Site}{CharSet} and deletes {Site}{CharSet}, which is then recreated as utf-8. So no matter what you set Encoding to, it ends up as utf-8.
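The load/save cycle described above can be modelled in a few lines. This is a simplified Python sketch of the reported behaviour, not the Perl code in Foswiki::Configure::Load; the names `REMAP`, `load_buggy` and `save` are hypothetical stand-ins, and the configuration is reduced to a nested dict.

```python
import copy

# Hypothetical stand-in for the %remap table: obsolete key -> replacement key.
REMAP = {("Site", "CharSet"): ("Store", "Encoding")}


def load_buggy(saved):
    """Model of the load step as described in this report."""
    cfg = copy.deepcopy(saved)
    for (old_sec, old_key), (new_sec, new_key) in REMAP.items():
        if old_key in cfg.get(old_sec, {}):
            # The obsolete value always overwrites the new key, then is deleted.
            cfg.setdefault(new_sec, {})[new_key] = cfg[old_sec].pop(old_key)
    # Compatibility value for older extensions, recreated on every load.
    cfg.setdefault("Site", {})["CharSet"] = "utf-8"
    return cfg


def save(cfg):
    """Model of saving: the compatibility key is written out with everything else."""
    return copy.deepcopy(cfg)


# The admin configures iso-8859-1 for the store; the first load keeps it.
first = load_buggy({"Store": {"Encoding": "iso-8859-1"}})
# After one save and reload, the saved {Site}{CharSet} has replaced it.
second = load_buggy(save(first))

print(first["Store"]["Encoding"])   # iso-8859-1
print(second["Store"]["Encoding"])  # utf-8
```

In this model the chosen encoding survives only until the first save, which matches the symptom in the summary line.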
- Remap should only apply an obsolete key if the new version is missing. Don't keep replacing it with the obsolete version.
- {Site}{CharSet} should be set to the {Store}{Encoding} as a default, and only default to utf-8 if encoding is not defined.
-- GeorgeClark - 25 May 2015
Partial revert. Crawford points out in email that it's correct that the {Site}{CharSet} be forced to utf-8; however, it still isn't correct that the remap overrides the Store encoding with the Site CharSet. That should only happen if Encoding is not configured.
-- GeorgeClark - 26 May 2015
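The rule stated in the partial revert (keep forcing {Site}{CharSet} to utf-8, but let the remap fill in {Store}{Encoding} only when it is not configured) might look roughly like this. It is a Python sketch of the logic only, not the change committed to Load.pm; `REMAP` and `load_fixed` are hypothetical names.

```python
import copy

# Hypothetical stand-in for the %remap table: obsolete key -> replacement key.
REMAP = {("Site", "CharSet"): ("Store", "Encoding")}


def load_fixed(saved):
    """Apply an obsolete key only when its replacement is not configured."""
    cfg = copy.deepcopy(saved)
    for (old_sec, old_key), (new_sec, new_key) in REMAP.items():
        old_section = cfg.get(old_sec, {})
        if old_key not in old_section:
            continue
        old_value = old_section.pop(old_key)
        new_section = cfg.setdefault(new_sec, {})
        if new_section.get(new_key) is None:
            # Replacement missing: carry the obsolete value forward once.
            new_section[new_key] = old_value
    # {Site}{CharSet} is still forced to utf-8 for extensions.
    cfg.setdefault("Site", {})["CharSet"] = "utf-8"
    return cfg


# Upgrade from a 1.1.x-style config that only has {Site}{CharSet}.
upgraded = load_fixed({"Site": {"CharSet": "iso-8859-15"}})
# A later load of the saved result must not overwrite the store encoding.
reloaded = load_fixed(copy.deepcopy(upgraded))

print(upgraded["Store"]["Encoding"])  # iso-8859-15
print(reloaded["Store"]["Encoding"])  # iso-8859-15
print(reloaded["Site"]["CharSet"])    # utf-8
```

The sketch treats a missing or undefined replacement key as "not configured"; whether the real code should test for existence or definedness is left open here.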
George (et al, especially Michael), this is problematic. Sorry for the long mail, but I think I need to try and explain this clearly.
The {Site}{CharSet} is no longer used in the core, and is defined purely for use by extensions.
There are three scenarios in which {Site}{CharSet} might be used in an extension:
- For decoding request parameters. This will usually only be done for AJAX parameters that are going to feed directly into either web/topic/attachment names or topic content.
- For reading/writing directly to/from topic files on disk (e.g. ">:encoding($Foswiki::cfg{Site}{CharSet})").
- For passing data to/from external programs that only understand certain charsets on their command lines.
Most extensions are very sloppy and have been written to tacitly assume byte-length characters. This isn't a problem so long as:
- The user is only using 7-bit-significant bytes (ASCII) in names/content, and/or
- There are no direct file operations or system() calls writing names/content, and
- There are no regexes encoded to operate only on 8-bit data (e.g. using explicit numeric char codes) either in Perl *or* in JS, and
- The Foswiki::Sandbox has been used for all external program calls.
It becomes more of a problem when the environment has been using all 8 bits of each byte for character codes (for example, German) or the extension decodes parameters explicitly. The most likely problem scenario is therefore:
1. {Site}{CharSet} in 1.1.x was iso-8859-* (or similar 8-bit charset), and
2. The local language used for content/web/topic names uses high-bit characters, and
3. The extension decodes utf-8 request parameters and re-encodes them using {Site}{CharSet}.
I intended that the store would automatically trap (1) and (2) and convert disk content from the old 1.1.x {Site}{CharSet} to UTF-8 whenever the store is interacted with, by setting {Store}{Encoding} = the old 1.1.x {Site}{CharSet}. (3) would be handled by forcing the 1.2 {Site}{CharSet} = 'utf-8'.
I can't think of any circumstance where ({Site}{CharSet} == {Store}{Encoding} != 'utf-8') would be appropriate.
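As an illustration of the intended split, the following Python sketch models a store that converts at the disk boundary using {Store}{Encoding}, while an extension keeps re-encoding with {Site}{CharSet} = 'utf-8'. The helper names (`store_read`, `store_write`, `extension_reencode`) and the sample topic name are hypothetical; nothing here is Foswiki API.

```python
# Assumed configuration after upgrading a site that used iso-8859-1 in 1.1.x.
cfg = {"Store": {"Encoding": "iso-8859-1"}, "Site": {"CharSet": "utf-8"}}


def store_read(disk_bytes):
    """Store boundary: bytes on disk are decoded with {Store}{Encoding}."""
    return disk_bytes.decode(cfg["Store"]["Encoding"])


def store_write(text):
    """Store boundary: text is encoded with {Store}{Encoding} before hitting disk."""
    return text.encode(cfg["Store"]["Encoding"])


def extension_reencode(param_bytes):
    """Scenario (3): decode a utf-8 request parameter, re-encode with {Site}{CharSet}."""
    return param_bytes.decode("utf-8").encode(cfg["Site"]["CharSet"])


topic_name = "Grüße"
on_disk = b"Gr\xfc\xdfe"                  # the same name as iso-8859-1 bytes
request_param = topic_name.encode("utf-8")  # what the browser sends

from_store = store_read(on_disk)
from_extension = extension_reencode(request_param).decode("utf-8")

print(from_store == from_extension)        # True
print(store_write(from_store) == on_disk)  # True
```

With {Site}{CharSet} forced to utf-8, the extension's re-encoding step leaves the utf-8 parameter unchanged, and only the store deals with the legacy 8-bit encoding.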
Other problems that may occur generally involve calls to CPAN modules that assume byte strings, for example Digest::MD5. There is no way we can defend against these; they will have to be dealt with on a case-by-case basis.
Regards,
C.
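The point about modules that assume byte strings can be illustrated with an analogue. The sketch below uses Python's hashlib rather than Perl's Digest::MD5, so it shows the general issue (a digest is defined over bytes, and the encoding chosen changes the result), not the exact behaviour of the CPAN module. The sample string is an arbitrary example.

```python
import hashlib

name = "Grüße"

# A digest is computed over bytes, so the caller has to pick an encoding.
utf8_digest = hashlib.md5(name.encode("utf-8")).hexdigest()
latin1_digest = hashlib.md5(name.encode("iso-8859-1")).hexdigest()

# Passing a character string directly is rejected by hashlib.
try:
    hashlib.md5(name)
    rejected = False
except TypeError:
    rejected = True

print(utf8_digest != latin1_digest)  # True: same text, different bytes
print(rejected)                      # True
```

Code that hashes names or content therefore needs an explicit, consistent encoding step, which is why each such call has to be reviewed individually.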
On 25/05/15 21:16, GitHub wrote:
> Branch: refs/heads/master
> Home: https://github.com/foswiki/distro
> Commit: f1451010e11a751143c11860c3839a3c8df8a436
> https://github.com/foswiki/distro/commit/f1451010e11a751143c11860c3839a3c8df8a436
> Author: George Clark <geonwiki@fenachrone.com>
> Date: 2015-05-25 (Mon, 25 May 2015)
>
> Changed paths:
> M core/lib/Foswiki/Configure/Load.pm
>
> Log Message:
> -----------
> Item13435: Don't keep overlaying Encoding with CharSet
>
-- GeorgeClark - 26 May 2015