Item13338: UTF8 image name shown incorrectly in the nat-editor
Priority: Urgent
Current State: Closed
Released In: 1.2.0
Target Release: minor
Applies To: Engine
Component:
Branches: master
UTF8 image name shown incorrectly in the nat-editor
Probably a Mac OS X-only issue (NFC/NFD)?
How to reproduce
- Create an image with a UTF8 name
- Attach it to some topic (add a link to the topic)
- Edit with the Wysiwyg editor (everything works OK)
- Switch to natedit
- The image name is shown incorrectly (although after saving, the image name is OK)
Screenshots
- attach1.png: in the TMCE
--
JozefMojzis - 27 Mar 2015
Can you check this please, and close it unless there's a problem (in which case raise the priority to Urgent). Thanks.
--
CrawfordCurrie - 19 May 2015
Meanwhile, things have got worse for me. Now the linked image isn't shown in the wysiwyg editor either. Screenshots attached.
But maybe this is an NFC/NFD problem, so I may see problems while for the rest of the world (read: non-Mac Foswiki installations) everything could be OK. And we agreed that the NFC/NFD problem isn't urgent now. (My Foswiki server runs on OS X, i.e. I'm not only a client using a Mac.)
So this needs to be tested by someone who is not a Mac user. A week ago (15 May) I added to Item13378 a short perl script that creates a directory on your HDD with TWO files (one with an NFD name and one with an NFC name). Could you please run it and try attaching both files to some topic, to see how a "common" (Linux) system behaves?
If there is a problem, this is urgent; if there is no problem, it should remain normal (because at some point we will need to fix the Mac problem too).
Btw, the fix should be easy, a single global editing command (at least according to my current tests): change every
Encode::decode_utf8
to
Unicode::Normalize::NFC Encode::decode_utf8
and everything should work. (Of course this is asymmetric, so encode_utf8 doesn't need any change.) But I will test this once the current trunk has stabilised.
Also, you haven't implemented one central encode/decode routine; you call
Encode::decode_utf8
directly instead of Foswiki::DecodeUTF8 (or similar). That is a pity, because with a routine in
Foswiki.pm
such as
sub DecodeUTF8 { Encode::decode_utf8($_[0]); }
any encoding/decoding change would be even easier...
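To illustrate the two suggestions above together, here is a minimal sketch of a central decode routine that also composes to NFC. The name decode_utf8_nfc is hypothetical (it is not an existing Foswiki function), and the sample input is just "C" followed by a combining caron, as OS X would store it.

```perl
use strict;
use warnings;
use Encode ();
use Unicode::Normalize ();

# Hypothetical central helper (not part of Foswiki): decode UTF-8 octets,
# then compose the result to NFC so NFD input from OS X compares equal.
sub decode_utf8_nfc {
    my ($octets) = @_;
    return unless defined $octets;
    return Unicode::Normalize::NFC( Encode::decode_utf8($octets) );
}

# "C" + COMBINING CARON (U+0043 U+030C) encoded as UTF-8 octets
my $nfd_octets = "\x43\xcc\x8c";
my $text       = decode_utf8_nfc($nfd_octets);

# Prints U+010C: the two decomposed characters became one composed character
printf "U+%04X\n", ord($_) for split //, $text;
```

With one such routine, switching the normalisation form (or removing it again) would be a one-line change instead of a global search and replace.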
- normal view:
- wysiwyg edit:
- After switching to natedit (note the filename):
--
JozefMojzis - 23 May 2015
Thanks for the report, but I really need to see what codepoints are being used for the characters. You have only attached images.
--
Main.CrawfordCurrie - 29 May 2015 - 07:00
Works fine on Linux (latest trunk, attachment name (unicode charcodes) cc e6 109 105 1e41 113 144 e3, Linux client, Chrome)
I have uprated it to Urgent per your request, and centralised character set handling to make it easier for you to experiment with normalisation.
--
Main.CrawfordCurrie - 29 May 2015 - 08:50
Regarding the exact codepoints:
- testing the NFD problem with an NFC filename (cc e6 109 105 1e41 113 144 e3) doesn't seem very relevant to me...
- U+000CC LATIN CAPITAL LETTER I WITH GRAVE
- U+000E6 LATIN SMALL LETTER AE
- U+00109 LATIN SMALL LETTER C WITH CIRCUMFLEX
- U+00105 LATIN SMALL LETTER A WITH OGONEK
- U+01E41 LATIN SMALL LETTER M WITH DOT ABOVE
- U+00113 LATIN SMALL LETTER E WITH MACRON
- U+00144 LATIN SMALL LETTER N WITH ACUTE
- U+000E3 LATIN SMALL LETTER A WITH TILDE
- the code points in my example are the same as the ones you get from the script I mentioned.
Sorry, my English isn't good enough (it is not much better than Google Translate), so I will now try to explain it again in other words; I hope it will be clearer. Sorry again for this.
So,
- In http://foswiki.org/Tasks/Item13378 there is a short perl script; I modified it a bit to cover EXACTLY this test's requirements. Check the attachment here.
- run it
- it will create one random-named directory with 4 files
- check the code points for the files
- try uploading the NFD .png one (the file with the longer name) to YOUR foswiki - if you want, here are the code points:
\N{U+0043}\N{U+030c}\N{U+0061}\N{U+0301}\N{U+0052}\N{U+030c}\N{U+0079}\N{U+0301}\N{U+002e}\N{U+0070}\N{U+006e}\N{U+0067}
- add its name to the upload form's description field
- check the "Create a link to the attached file" checkbox
- upload
- check the result...
- the link content inserted into the text
- try editing with wysiwyg (the image isn't visible)
Unfortunately, my notebook runs OS X, so all filenames are FORCED to NFD. I can't simulate the "NFC" world, but you (on Linux) CAN do both tests with the help of the attached script.
However, if attachments work OK on Linux, we can ignore this OS X-specific problem (for now). On Linux probably nobody will craft NFD filenames. The problem (probably) happens only when an OS X user uploads an image with wide characters in its name.
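For anyone who wants to inspect the two forms without touching the filesystem, the code points listed above can be checked directly. This is only a sketch, not the attached script; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Decomposed (NFD) name built from the code points listed above
my $nfd_name = "\N{U+0043}\N{U+030c}\N{U+0061}\N{U+0301}"
             . "\N{U+0052}\N{U+030c}\N{U+0079}\N{U+0301}.png";

# Composed (NFC) form of the same name, as a Linux user would usually type it
my $nfc_name = NFC($nfd_name);

# Dump length and code points of each form (ASCII-only output)
for my $name ( $nfd_name, $nfc_name ) {
    printf "%2d chars: %s\n", length($name),
        join ' ', map { sprintf 'U+%04X', ord($_) } split //, $name;
}

print "different strings\n"      if $nfd_name ne $nfc_name;
print "canonically equivalent\n" if NFD($nfc_name) eq $nfd_name;
```

The NFD name has 12 characters and the NFC name 8, so a plain string comparison of the two fails even though they display identically.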
--
JozefMojzis - 29 May 2015
Your English is fine, I just misunderstood your problem. On Linux, the NFD and NFC filenames are distinct (as you'd expect, since filenames are byte strings). When I unpack a readdir, I see this:
c4 8c c3 a1 c5 98 c3 bd 2e 74 78 74
43 cc 8c 61 cc 81 52 cc 8c 79 cc 81 2e 74 78 74
c4 8c c3 a1 c5 98 c3 bd 2e 70 6e 67
43 cc 8c 61 cc 81 52 cc 8c 79 cc 81 2e 70 6e 67
When uploading the files, I get the same upload for both .png files and the same upload for both .txt files, i.e. they are canonically equivalent. The links insert fine, are correct, and WYSIWYG works fine.
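The equivalence of the two .png names can be reproduced from the byte dumps above. A minimal sketch, assuming the names are decoded as UTF-8 and then composed to NFC; normalised_name is a hypothetical helper, not Foswiki code.

```perl
use strict;
use warnings;
use Encode ();
use Unicode::Normalize qw(NFC);

# The two .png byte sequences from the readdir output above
my $nfc_bytes = pack 'H*', 'c48cc3a1c598c3bd2e706e67';
my $nfd_bytes = pack 'H*', '43cc8c61cc8152cc8c79cc812e706e67';

# Hypothetical helper: raw filename octets -> NFC character string
sub normalised_name {
    my ($octets) = @_;
    return NFC( Encode::decode_utf8($octets) );
}

# As byte strings the names are distinct, which is why Linux keeps both files
print "bytes differ\n" if $nfc_bytes ne $nfd_bytes;

# After decoding and normalising, they are the same name
print "names match after NFC\n"
    if normalised_name($nfc_bytes) eq normalised_name($nfd_bytes);
```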
--
CrawfordCurrie - 01 Jun 2015
After discussion on IRC I understand the problem a bit better, and it's related to the processing of the src attribute on the img tag. Easily fixed by removing the decoding.
--
CrawfordCurrie - 02 Jun 2015