Item1498: viewfile corrupts binary files that don't have known extensions

Priority: Enhancement
Current State: No Action Required
Released In: n/a
Target Release: n/a
Applies To: Engine
Component: viewfile
Branches:
Reported By: GeorgeClark
Waiting For:
Last Change By: GeorgeClark
After we changed our Apache configuration to rewrite pub file access to viewfile, we've been getting occasional reports of corrupted files. The viewfile script assumes that a file has MIME type text/plain, then changes the type to match the file's suffix, using the list of well-known suffixes provided in data/mime.types.

When Apache is configured to rewrite pub access to viewfile, we begin to see file corruption on downloaded files.

To recreate the problem, take a file attachment, such as pub/System/FileAttachment/Smile.gif, and copy it to pub/System/FileAttachment/Smile.

Access the file Smile directly: Apache "mime magic" recognizes the file type and serves it as Content-Type: image/gif. Access the same file using the viewfile script and the file is served as Content-Type: text/plain; charset=ISO-8859-1.

(Note that Smile doesn't actually get corrupted. The Konqueror browser detects a binary file with plain-text encoding and pops up a corrupted-content warning.)

I suspect that application/octet-stream might be a safer default for unknown file types. A preferable longer-term solution would be to do Apache-style filetype magic, but that would probably add more overhead to an already slow process.

-- GeorgeClark - 23 Apr 2009

We could use some Perl modules to do that, such as CPAN:File::MMagic, or CPAN:File::MMagic::XS for speed; both aim to be rewrites of Apache's mod_mime_magic.

I'm strongly against defaulting to something other than text/plain, for security reasons.

-- OlivierRaginel - 23 Apr 2009

I suppose the right solution is to "do as Apache does" but as long as the mime type doesn't trigger browser execution of the file, I don't understand the security implications. I don't have a strong opinion either way, but I'd like to better understand. If I name the file as ".bin" then it comes down as an octet stream and results in a browser "Save file as" dialog.

It does seem to vary by browser, however. Firefox seems to examine the signature of the file to determine its type at least somewhat independently of the MIME type, and will display the file if it's a known type.

-- GeorgeClark - 23 Apr 2009

All popular browsers perform some degree of file sniffing -- in particular to detect that a file is of a particular graphics file format. IE is of course infamous for sniffing files served as text/plain and determining that they are HTML. I agree with George that application/octet-stream shouldn't be any less secure than text/plain -- no browser that pays the slightest amount of attention to security should be trying to arbitrarily execute an application/octet-stream file.

At least in a public installation, I think having file sniffing would increase the opportunities for attacks, so if this feature were to be added, it should be a configurable one. (I'm not terribly in favour of it (though I realize it fits in with a user-friendly "do what I mean" philosophy) -- I think the user needs to have the power (and as a result, the responsibility) to properly tag their content.)

-- IsaacLin - 23 Apr 2009

If a document gets "text" converted when it should not be, the result is disaster.

Binary must be default IMHO.

Where is it coded to default to text?

-- KennethLavrsen - 23 Apr 2009

Subroutine _suffixToMimeType in /lib/Foswiki/UI/Viewfile.pm

It sets the MIME type to text/plain and then overrides it with the type detected from the file suffix. A missing or unknown suffix results in text/plain.
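The behaviour described above can be sketched roughly as follows. This is a Python illustration, not the Perl code in Viewfile.pm: the names `parse_mime_types` and `suffix_to_mime_type` and the `default` parameter are hypothetical, and the sketch assumes the Apache-style mime.types layout (a type followed by its whitespace-separated suffixes, with `#` comments).

```python
def parse_mime_types(text):
    """Map each suffix to its MIME type from Apache-style mime.types text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        mime_type, *suffixes = line.split()
        for suffix in suffixes:
            table[suffix.lower()] = mime_type
    return table


def suffix_to_mime_type(filename, table, default="text/plain"):
    """Return the type for the file's suffix, or `default` if none matches."""
    name = filename.rsplit("/", 1)[-1]
    if "." not in name:
        return default  # no suffix at all
    suffix = name.rsplit(".", 1)[1].lower()
    return table.get(suffix, default)  # unknown suffix falls back too


# Sample data for illustration only, not the real data/mime.types
SAMPLE = """
# type              suffixes
image/gif           gif
application/x-gzip  gz tgz
"""
table = parse_mime_types(SAMPLE)
print(suffix_to_mime_type("Smile.gif", table))
print(suffix_to_mime_type("Smile", table))
print(suffix_to_mime_type("Smile", table, default="application/octet-stream"))
```

Making the fallback a parameter is one way to let a site choose between text/plain and application/octet-stream without touching the lookup itself.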

There is probably some performance optimization that could be done here as well. It reads the MimeTypesFileName for every attachment. So on any page with lots of embedded files, it gets read for each file. The MimeTypes could probably be cached by the session for some reduced I/O.
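A minimal sketch of that caching idea, in Python rather than Foswiki's Perl: the file is parsed on first use and the parsed table is reused afterwards. The function name `load_mime_table` is hypothetical, and the cache here lasts for the life of the process, which is only an approximation of "cached by the session".

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=None)
def load_mime_table(path):
    """Parse the mime.types file at `path` once and reuse the result."""
    table = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            mime_type, *suffixes = line.split()
            for suffix in suffixes:
                table[suffix.lower()] = mime_type
    return table


# Demonstration with a throwaway file standing in for data/mime.types
with tempfile.NamedTemporaryFile("w", suffix=".types", delete=False) as tmp:
    tmp.write("image/gif gif\n")
    demo_path = tmp.name

first = load_mime_table(demo_path)   # reads and parses the file
second = load_mime_table(demo_path)  # served from the cache, no file I/O
os.unlink(demo_path)

print(first is second)
print(load_mime_table.cache_info())
```

The trade-off is that edits to the file are not picked up until the cache is cleared or the process restarts, and callers must treat the shared table as read-only.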

-- GeorgeClark - 23 Apr 2009

Apache - with mod_mime and mod_mime_magic installed
  • Attempts to determine mime type from suffix (mod_mime)
  • If type not determined, examine file magic to try to determine type (mod_mime_magic)
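The two-step order above can be sketched as follows. This is a rough Python illustration, not how mod_mime_magic is implemented: the signature table covers only a handful of well-known formats, and `SUFFIX_TYPES`, `sniff_magic` and `detect_type` are hypothetical names.

```python
# Leading-byte signatures for a few common formats (illustrative subset)
MAGIC_SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"\x1f\x8b", "application/x-gzip"),
]

# Stand-in for the suffix table that would come from mime.types
SUFFIX_TYPES = {"gif": "image/gif", "tgz": "application/x-gzip"}


def sniff_magic(head):
    """Return a type if the leading bytes match a known signature."""
    for signature, mime_type in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime_type
    return None


def detect_type(filename, head, default=None):
    """Try the suffix first, then the file's leading bytes, then `default`."""
    name = filename.rsplit("/", 1)[-1]
    if "." in name:
        by_suffix = SUFFIX_TYPES.get(name.rsplit(".", 1)[1].lower())
        if by_suffix:
            return by_suffix
    return sniff_magic(head) or default


gif_head = b"GIF89a\x01\x00\x01\x00"
print(detect_type("Smile.gif", b""))    # suffix is enough
print(detect_type("Smile", gif_head))   # no suffix, so the bytes decide
print(detect_type("notes", b"plain words"))
```

Only the first few bytes of the file are needed for the second step, so the extra cost per request is one short read.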

Here is what Apache docs say about the matter: http://httpd.apache.org/docs/2.1/mod/core.html#defaulttype

In cases where it can neither be determined by the server nor the administrator (e.g. a proxy), it is preferable to omit the MIME type altogether rather than provide information that may be false. This can be accomplished using

DefaultType None

However, if not coded, the default for DefaultType is indeed text/plain. So the default used by viewfile appears to be consistent with the Apache configuration when mod_mime_magic is not installed. But the Apache recommendation is to provide no type at all rather than an incorrect one.

So this should probably be configurable. And if viewfile is going to become more important, then something equivalent to mod_mime_magic is probably needed as well. Changing to an enhancement request.
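One possible shape for such a setting, sketched in Python under assumptions: `content_type_header` is a hypothetical helper, and a `default` of `None` plays the role of Apache's `DefaultType None`, meaning no Content-Type header is sent for undetermined types. The charset is the one viewfile was observed to send.

```python
def content_type_header(detected, default="text/plain", charset="ISO-8859-1"):
    """Build the Content-Type header list for a response.

    `detected` is the type found for the file, or None if undetermined.
    `default` is the configurable fallback; None means "send no header".
    """
    mime_type = detected or default
    if mime_type is None:
        return []  # omit the header rather than claim a possibly false type
    if mime_type.startswith("text/"):
        return [("Content-Type", f"{mime_type}; charset={charset}")]
    return [("Content-Type", mime_type)]


print(content_type_header("image/gif"))
print(content_type_header(None))
print(content_type_header(None, default="application/octet-stream"))
print(content_type_header(None, default=None))
```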

Here are some other references:

-- GeorgeClark - 24 Apr 2009

Item1802 related? Is "tgz" unknown to foswiki.org?

-- OliverKrueger - 09 Jul 2009

Not sure - checking the mime.types file, tgz is a covered suffix. I also followed the steps to recreate on Item1802 - with Wireshark, the file was downloaded as type application/x-gzip, which is correct.

I don't believe that this is the issue.

-- GeorgeClark - 09 Jul 2009

I wonder if on Linux (or wherever it's supported) we could use file --mime-type (or the library it uses) to improve our hit rate.
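A hedged sketch of that idea in Python: call the `file` utility when it is on the PATH and fall back to "unknown" otherwise. The helper name `mime_type_via_file` is hypothetical, and starting a process per attachment has a cost that would need measuring.

```python
import os
import shutil
import subprocess
import tempfile


def mime_type_via_file(path):
    """Ask file(1) for the MIME type; return None if it is unavailable or fails."""
    if shutil.which("file") is None:
        return None  # utility not installed on this platform
    try:
        result = subprocess.run(
            ["file", "--brief", "--mime-type", "--", path],
            capture_output=True, text=True, check=True, timeout=5,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout.strip() or None


# Demonstration: a suffix-less file that starts with a GIF header
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"GIF89a\x01\x00\x01\x00\x00\x00\x00")
    demo_path = tmp.name

detected = mime_type_via_file(demo_path)
os.unlink(demo_path)
print(detected)
```

Returning None on failure lets the caller fall through to whatever default is configured.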

-- SvenDowideit - 17 Nov 2009

I have not seen any further reports of corrupted files. Changing this to no action required.

-- GeorgeClark - 07 Mar 2011

ItemTemplate

Summary viewfile corrupts binary files that don't have known extensions
ReportedBy GeorgeClark
Codebase 1.0.4
SVN Range Foswiki-1.0.0, Thu, 08 Jan 2009, build 1878
AppliesTo Engine
Component viewfile
Priority Enhancement
CurrentState No Action Required
WaitingFor
Checkins
TargetRelease n/a
ReleasedIn n/a
Topic revision: r11 - 07 Mar 2011, GeorgeClark