Item9462: viewfile can't handle attachments with umlaute

Priority: Urgent
Current State: Closed
Released In: 1.1.3
Target Release: patch
Applies To: Engine
Component:
Branches:
Reported By: MichaelDaum
Waiting For:
Last Change By: KennethLavrsen
When you secure attachments with viewfile, the redirect process URL-encodes a UTF-8 filename before passing it on to the viewfile CGI. As a result, viewfile can no longer find the attachment.

What is needed is:

  1. Foswiki::urlDecode
  2. Foswiki::UTF82SiteCharSet

before asking the storage whether the file exists.

Can somebody with more utf8 foo confirm that this is the right thing to do?

Here's what I did to Foswiki::UI::View.pm in Foswiki-1.0.9:

sub viewfile {
...
    if ($fileName) {
      $fileName = Foswiki::urlDecode($fileName);
      my $decodedFileName = $session->UTF82SiteCharSet($fileName);
      $fileName = $decodedFileName if defined $decodedFileName;
    }
...
}

-- MichaelDaum - 12 Aug 2010

Is this the same issue as Item5485?

-- GeorgeClark - 13 Aug 2010

Yes, I believe it is. I'm not sure I have more utf8-fu, but I probably have more utf8-wu, and this looks like the right solution to me (I think I may have suggested it previously).

Michael, please go ahead with your fix.

-- CrawfordCurrie - 29 Aug 2010

Can anyone confirm that the problem is only partly solved in 1.1.1? It works as long as the file is called via view.pm. But if you choose in the attach dialog to link the file in the document, a link is created that points to the file directly, and it still does not work if there is an umlaut in the filename. This is especially annoying if you include pictures with an umlaut and want them to be shown after they are attached. Including a picture with an umlaut via the WYSIWYG editor works well. Is there a quick workaround like the one above?

-- EnrikGuenter - 28 Oct 2010

Enrik, try disabling {UseLocale} in configure. That way filenames are sanitized while attaching them which kicks out umlaute. No solution but circumvents these problems.

-- MichaelDaum - 28 Oct 2010

That is right, Michael. But in my case it is not an acceptable solution, since the filename is altered when attached and thereby becomes unfindable when searched for. I'm pretty sure I just need to use a decode line like the one above at the right place to generate an href with the umlaut in the HTML code instead of <a href ......%f6....pdf...> as it currently does. I have to work with IE8 here (....) and sadly it does not resolve %f6 as an umlaut, whereas it accepts an umlaut in the link just fine (all browsers do).

-- EnrikGuenter - 29 Oct 2010

I tested this with Danish letters and it worked so the umlauts must be a special case.

I will try with umlauts and IE.

We need to fix this. And Enrik is right: once you have topic names with non-A-Z letters, you cannot go back to running with the locale turned off.

-- KennethLavrsen - 29 Oct 2010

According to http://irclogs.foswiki.org/bin/irclogger_log/foswiki?date=2010-10-28,Thu&sel=837#l833 Enrik runs de-de iso 8859-15 and not UTF8.

I think this original bug item addressed the UTF8 case.

I will work on the ISO case and IE8

-- KennethLavrsen - 31 Oct 2010

I cannot reproduce this problem

Viewfile works great with umlauts

I run with de_DE.ISO-8859-1 when I test

I try both IE8 and Firefox. And they behave exactly the same

When I link directly to pub I must encode the characters so ö becomes %f6

When I go via viewfile it eats any combination of encoding or no encoding of both topic name and attachment name.
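To make the %f6 form concrete: which escape a direct link needs depends on the charset the filename bytes are stored in. The following is a minimal Perl sketch, assuming the core Encode module is available; `percent_encode` is a hypothetical helper written for this illustration and is not Foswiki's own URL-encoding routine.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Percent-encode every byte outside the RFC 3986 unreserved set.
# Hypothetical helper for illustration only; not a Foswiki function.
sub percent_encode {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

my $name = "abc\x{F6}.jpg";    # "abcö.jpg" as a Perl character string

# The same name produces different escapes depending on the charset.
my $latin1 = percent_encode( encode( 'ISO-8859-1', $name ) );
my $utf8   = percent_encode( encode( 'UTF-8',      $name ) );

print "$latin1\n";    # abc%F6.jpg
print "$utf8\n";      # abc%C3%B6.jpg
```

The hex digits of an escape are case-insensitive, so %f6 and %F6 name the same byte. A server that stores filenames as ISO-8859-1 would expect the first form, one that stores them as UTF-8 the second.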

Enrik we need you to describe exactly what it is you do that fails. With an example and step by step.

-- KennethLavrsen - 31 Oct 2010

OK, let me try to set it right: I attach a file called abcö.jpg to some random test page. In the attach dialog I choose to link the file within the document.

Now I can see the picture in the page (since 1.1.1). But in the attachment table at the bottom of the page the link points directly to the file in pub, like "http://...../pub/Sandbox/Test/abc%f6.jpg". If I click on it, I get an HTTP 403 (in IE8 and Firefox). If I change the address manually to "http://...../pub/Sandbox/Test/abcö.jpg" it works fine.

Also, if I click on manage attachment and then on view in this page, the file is called with the viewfile method and is loaded without error (since 1.1.1).

So actually only the direct link with %f6 instead of ö doesn't work. I have to change the %f6 to ö manually. Or might this be a problem with my web server? (Apache on Win2003 Server)

-- EnrikGuenter - 01 Nov 2010

When I test on my server it is only when a direct link to pub is url encoded (ie %f6) that it works. So I have the opposite behaviour.

It could very well be a server setting that is needed and that Apache behaves differently on a Win2003.

Try and see the discussion on Item5485. It may give some hints on what can be done.

I do not have a Win2003 or even a Foswiki running on Windows at the moment so I cannot reproduce this.

-- KennethLavrsen - 01 Nov 2010

I checked the code. A key function Foswiki::UTF82SiteCharSet has this code.

    # Convert into ISO-8859-1 if it is the site charset.  This conversion
    # is *not valid for ISO-8859-15*.
    if ( $Foswiki::cfg{Site}{CharSet} =~ /^iso-?8859-?1$/i ) {

        # ISO-8859-1 maps onto first 256 codepoints of Unicode
        # (conversion from 'perldoc perluniintro')
        $text =~ s/ ([\xC2\xC3]) ([\x80-\xBF]) /
          chr( ord($1) << 6 & 0xC0 | ord($2) & 0x3F )
            /egx;
    }

So the conversion needed does not work with iso-8859-15. You have to use -1. This means that the Euro sign is not in the charset and some other minor details.
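The bit-shift shortcut quoted above relies on ISO-8859-1 byte values being identical to the first 256 Unicode code points, which is not true of ISO-8859-15. A table-driven conversion does not have that restriction. The following is a minimal Perl sketch, assuming the core Encode module; `utf8_to_charset` is a hypothetical helper for illustration, and whether Foswiki takes this route for other charsets is not confirmed here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert UTF-8 bytes to the bytes of a single-byte site charset.
# Hypothetical helper for illustration only; not part of Foswiki.
sub utf8_to_charset {
    my ( $utf8_bytes, $charset ) = @_;
    my $chars = decode( 'UTF-8', $utf8_bytes );
    return encode( $charset, $chars );
}

my $o_umlaut = "\xC3\xB6";        # UTF-8 bytes of "ö" (U+00F6)
my $euro     = "\xE2\x82\xAC";    # UTF-8 bytes of "€" (U+20AC)

# "ö" has the same byte value in both charsets.
my $o_latin1  = utf8_to_charset( $o_umlaut, 'ISO-8859-1' );
my $o_latin9  = utf8_to_charset( $o_umlaut, 'ISO-8859-15' );

# "€" is part of ISO-8859-15, where it sits at byte 0xA4.
my $euro_latin9 = utf8_to_charset( $euro, 'ISO-8859-15' );

printf "%02X\n", ord($o_latin1);       # F6
printf "%02X\n", ord($o_latin9);       # F6
printf "%02X\n", ord($euro_latin9);    # A4
```

For plain umlauts the two charsets agree, so the difference only shows up for the handful of characters, such as the Euro sign, that ISO-8859-15 places where ISO-8859-1 has something else.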

Enrik - try ISO-8859-1. You said on IRC that you ran -15, and I even advised you to try that. But it seems that was bad advice.

-- KennethLavrsen - 01 Nov 2010

Thank you, Kenneth, but I think the function UTF82SiteCharSet is called in viewfile. If I open my document/picture with the umlaut, abcö.jpg, via viewfile, it works.

I have to correct one statement in my previous post: if I choose to link the picture in the attach dialog, the picture is NOT shown on the page. The HTML code <img src="...abc%f6.jpg"... is generated. But if I go into the WYSIWYG editor, change NOTHING but click save anyway, the code is corrected to img src="...abcö.jpg"... and the picture is shown correctly.

There is no way to repair the link in the attached file table at the bottom. There, the HTML code is always a href="...abc%f6.jpg..", hence it doesn't work.

The third scenario is when I click on "manage" in the attachment table and then on "view": the file is called via viewfile, which includes the UTF82SiteCharSet function you mentioned, and is shown properly.

I tried to set AddDefaultCharset utf-8/iso-8859-15 on my Apache web server: no change whatsoever. Tested with IE8 and Firefox 3.6.

Setting the charset in Foswiki to iso-8859-1 doesn't change anything either.

-- EnrikGuenter - 01 Nov 2010

Just updated to 1.1.2. Still the same problem.

Still, I want to point out that this task/bug/problem is somehow related to Item9170. For PaulHarvey it's a bug; for me it's currently a feature, since it repairs/alters at least the href links and thereby makes showing pictures with an umlaut in their filename possible.

Perhaps I should open a new task for this, since the Summary isn't actually adequate anymore. Viewfile handles umlaute well. To me it seems more like a problem of attach, since the generated links in the html code don't work.

-- EnrikGuenter - 10 Nov 2010

Hey there,

I know a problem can't have a priority higher than urgent, and I know it's all your free time (and I'm thankful for that!). I just wanted to ask about the progress on this problem.

I've set up an extended wiki as an intranet solution for a German company with about 1000 employees, of whom at least 500 will work with the wiki. It is going to be launched this year. Actually, the only thing holding us back is this problem with umlaute and attachments. There will be many attachments each day, and though they suck, umlaute are not rare in the German language.

So both attachments and included pictures (via the attachment dialog; as mentioned, pictures work when attached via WYSIWYG) have to work for us before we can launch.

No pressure, just wanted to let you know. And I also wanted to let you know about my general appreciation for Foswiki! A really powerful and great tool allowing quick solutions for complex and otherwise time-consuming problems!

Kind regards,

-- EnrikGuenter - 08 Dec 2010

Hi Enrik, I think finishing off Item9973 will help fix this bug too. Will try to find some time very soon. I must state, however, that I am just a simple 7-bit ASCII user by day - it would be nice if we could get some people with more internationalisation skills to help :)

-- PaulHarvey - 20 Dec 2010

There have been some changes on trunk WysiwygPlugin and TinyMCEPlugin. If you could try to reproduce here again to see if any of the issues have changed on http://trunk.foswiki.org/Sandbox that would help.

-- PaulHarvey - 20 Dec 2010

The issue in this bug report is not related to Wysiwyg. It is the links to older versions of an attachment via viewfile that fail to work, and it seems only on Windows, which is why I got stuck. I do not have a Windows-based test server.

-- KennethLavrsen - 20 Dec 2010

If you could tell me how I can get access to the trunk, I'd be happy to test everything you'd like me to test. I'm also happy to test any test script/topic here on our server. Due to security concerns our server is strictly local, so I can't give you access to it.

-- EnrikGuenter - 30 Dec 2010

I think I've found the error: viewfile first sanitizes the attachment name and then decodes it. That's the wrong order.

Example:

.../bin/viewfile?filename=geräteschaft.txt

In Foswiki::UI::Viewfile

fileName = decode(untaint("ger%C3%A4teschaft.txt"));

-> gerC3A4teschaft.txt -> ERROR

When first decoding and then untaint:

fileName = untaint(decode("ger%C3%A4teschaft.txt"));
-> ger\xc3\xa4teschaft.txt -> CORRECT
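The ordering problem can be reproduced outside Foswiki. The following is a minimal Perl sketch; `url_decode` and `sanitize` are hypothetical stand-ins written for this illustration. In particular, the whitelist in `sanitize` is an assumption and does not match Foswiki's real, configurable filename filter.

```perl
use strict;
use warnings;

# Stand-in for URL decoding: turn %XX escapes back into bytes.
sub url_decode {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr( hex($1) )/ge;
    return $text;
}

# Stand-in for filename sanitizing: drop everything outside an
# assumed whitelist (ASCII word characters, dot, hyphen, high bytes).
sub sanitize {
    my ($text) = @_;
    $text =~ s/[^A-Za-z0-9_.\-\x80-\xFF]//g;
    return $text;
}

my $raw = 'ger%C3%A4teschaft.txt';

# Sanitizing first strips the '%' signs, so nothing is left to decode.
my $wrong = url_decode( sanitize($raw) );

# Decoding first yields the UTF-8 bytes, which then pass the filter.
my $right = sanitize( url_decode($raw) );

print "$wrong\n";    # gerC3A4teschaft.txt
```

With the order reversed, `$right` holds the bytes `ger\xC3\xA4teschaft.txt`, which is the name the store can actually look up.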

Also compare the value of fileName when securing attachments using a redirect like this in apache:

    RewriteEngine on
    RewriteRule ^(.+)/+([^/]+?)/+(.+?)$ /bin/viewfile/$1/$2?filename=$3 [L,PT]

This sometimes makes a difference.

-- MichaelDaum - 11 Jan 2011

That is great Michael.

I had been staring myself blind at the same area of code, but it did not occur to me that the sequence could be wrong. And I did not have the error on my Linux-based machine.

Thanks for fixing. This one had nagged me for a long time.

-- KennethLavrsen - 13 Jan 2011
 


Summary viewfile can't handle attachments with umlaute
ReportedBy MichaelDaum
Codebase 1.1.2, 1.0.9
SVN Range
AppliesTo Engine
Component
Priority Urgent
CurrentState Closed
WaitingFor
Checkins distro:4ca11b74e715 distro:03cdbe254bbd distro:24f3111106e7 distro:5039db55f6cf
TargetRelease patch
ReleasedIn 1.1.3
Topic revision: r32 - 16 Apr 2011, KennethLavrsen