Item8608: Log file failure must never be a fatal error

Priority: Urgent
Current State: Closed
Released In: 1.1.0
Target Release: minor
Applies To: Engine
Component:
Branches:
Reported By: KennethLavrsen
Waiting For:
Last Change By: KennethLavrsen
If the Apache user cannot write to a Foswiki log file for any reason the whole site crashes.

The end users are met with a browser window showing a geeky error message. All is broken. No viewing. No nothing. Service interrupted.

I have now experienced three times that, on the first of the month, the cron job (running as a user other than Apache) caused the Foswiki code to create the new log file for the month.

And the first time a user tries to view a page, the server is dead.

And all is dead until I get to work and do the magic chown to the log file.

I tried to put a chown in my cron but goofed it up twice. So I have had 3 almost full days of downtime because of this stupid error.

The opening and saving to the log file must NEVER be a fatal error. Foswiki should continue working without the log.

It should write the problem to the Apache error log. Maybe even trigger some tiny visible error message that the admin can see. But not die. Never die.

I have read an argument that it is done like this so the installer can see right away if his access rights are set up right. But this argument falls apart when we know the real consequence. A site can be working fine. Then one little cron job creates the first log of the month and you have a dead site, while the admin is far away or asleep.

We need to change the code that writes to the log so it is fault tolerant.
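A minimal sketch of what "fault tolerant" could mean here, written in Python for illustration rather than in Foswiki's own Perl. The function name `log_event`, the line format and the `fallback` parameter are all hypothetical and not taken from the Foswiki code base. The idea is only that a failed write is reported on the error stream (which under Apache usually ends up in the error log) and the request carries on.

```python
import sys
from datetime import datetime, timezone


def log_event(path, message, fallback=sys.stderr):
    """Append message to the log at path; never raise on I/O failure.

    Returns True if the line reached the log file, False if it was
    sent to the fallback stream instead. (Hypothetical helper.)
    """
    line = f"| {datetime.now(timezone.utc).isoformat()} | {message} |\n"
    try:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(line)
        return True
    except OSError as err:
        # Report the problem where the admin can find it, but keep serving.
        try:
            fallback.write(f"cannot write log {path}: {err}\n{line}")
        except Exception:
            pass  # even the fallback failed; still do not take the site down
        return False
```

The caller can use the boolean result to show a small warning to administrators without interrupting the page view.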

Just for safety, I created logs on my server for the next 3 years. Empty files owned by Apache. A silly workaround.

-- KennethLavrsen - 23 Feb 2010

My obvious concern is when someone asks for support, and you ask "what is in your log files". They report "nothing", and because it's a hosted install, they have no access to the apache error logs either. How do they know there has been an error? The alternative, of blindly blundering on even after there has been an unloggable error, doesn't appeal.

Note that logfile creation is checked in configure so as long as configure is run, you should never get a logfile creation error. This is no different to having an unwritable data or pub directory, which will also bring Foswiki down in flames. If the logfile is unwritable, the user gets a message in their browser even if they ignore configure, so they are not blinded.

It happens to us developers more because we tend to run unit tests as a different user, which can sometimes conflict with the apache user. But for normal humans, it's a lot less frequent.

I have enhanced the reporting, but I really don't regard this as Urgent (or even a bug) unless you can describe a use-case where a "normal" user might be bitten.

-- CrawfordCurrie - 26 Feb 2010

I noticed today that even configure doesn't work if the logdir is wrong/missing, which is what happened when we decided to change the default logdir (which I support).

The svn up killed the site (I was silly enough to do svn up on a trunk install that was serving real users; yes, I am that silly :-) ).

-- PaulHarvey - 26 Feb 2010

Paul, can you be any more specific? I just tried what I thought you were describing, and configure works fine. If a site has {LogFileName} but not {Log}{Dir} defined in LocalSite.cfg then that will be an issue, yes. But it's an issue which will only impact those who upgrade without running configure (e.g. svn uppers).

-- CrawfordCurrie - 26 Feb 2010

I am not looking for enhanced reporting. The checker you have added makes no improvement at all. And I am not a special case here.

Foswiki should not crash if the log file cannot be written

My production site is an out-of-the-box standard Linux based Foswiki with a nightly cron that runs the tools/tick_foswiki script. It has been down on November 1st, December 1st and February 1st because of this silly issue. Each time I received 30-40 hate emails from users who got an error screen all morning until I arrived. Just because the cron tools/tick_foswiki script created the log file a little after midnight on the first day of the month. In January I was lucky that some Chinese user had looked at a page and created the new log files in the middle of our night before the cron job ran.

I run my site exactly like we describe the installation in our documentation. Nothing is special.

I am for sure not the only one who has had this problem.

The problem is that Foswiki creates a new log file each month. So things can work great. And then on the 1st of the month, the site goes down. If I had been on vacation we could have had a one week downtime. I risk losing all the goodwill I have fought hard to get from my users and from management. This is critical. The problem is that the server goes from working to not working "by itself".

A support case where a user has no log file is very secondary compared to sites going completely out of service while unattended because we are too strict with the error checking before we write to the log. That is the wrong priority.

The obvious easy solution is to replace the "die" with a silent skip of writing to the log. But that is not the only possible solution.

Another way would be to always write to the same log and rotate the log monthly. A bit like what Apache does. It always writes to access_log and error_log and then a cron rotates the log. If we always wrote to the same log and rotated to an event201002 when we reach 201003 then we could be fault tolerant to the rotated logs and still be strict on the normal log. When you install Foswiki it is OK that you get a failure. But once it is running it must never go down again because we have a bad log implementation that cannot tolerate Foswiki scripts being run from a cron job by a user other than Apache.
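The rotation idea can be sketched roughly as follows, again in Python for illustration only. The function name `rotate_if_new_month` and the `<log>.YYYYMM` archive naming are assumptions made for this example, not how Foswiki names its files. The current log keeps one fixed name, and it is renamed to an archive only when its last modification falls in an earlier month. A failed rename is tolerated so that a rotation problem never stops normal logging.

```python
import os
from datetime import datetime


def rotate_if_new_month(log_path, now=None):
    """Rename log_path to log_path.YYYYMM once the month has changed.

    Returns the archive path if a rotation happened, else None.
    (Hypothetical helper; the archive naming is an assumption.)
    """
    now = now or datetime.now()
    try:
        modified = datetime.fromtimestamp(os.path.getmtime(log_path))
    except OSError:
        return None  # no current log yet, nothing to rotate
    if (modified.year, modified.month) == (now.year, now.month):
        return None  # still the same month
    archive = f"{log_path}.{modified:%Y%m}"
    try:
        os.rename(log_path, archive)
    except OSError:
        return None  # tolerate rotation failure; keep using log_path
    return archive
```

Because the live log file always has the same name and owner, a cron job running as another user no longer gets the chance to create next month's file with the wrong ownership. Note that on POSIX systems `os.rename` silently replaces an existing archive of the same name.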

Another way to protect would be to NEVER write to the log when Foswiki runs as root. I do not need logging of the tick running.
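As a rough sketch of that check (Python for illustration; `should_log` is a hypothetical name), the logger could simply ask for the effective user ID before it touches the file:

```python
import os


def should_log(euid=None):
    """Return False when the process runs as root (effective UID 0)."""
    if euid is None:
        # os.geteuid exists only on Unix; assume "not root" elsewhere.
        euid = os.geteuid() if hasattr(os, "geteuid") else -1
    return euid != 0
```

This only covers cron jobs that run as root. A job running as some other non-Apache user would still create the file with the wrong owner.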

The main point here is to prevent Foswiki from going down and providing no service because an unattended cron job has caused a new log file to be created by a user other than Apache.

-- KennethLavrsen - 26 Feb 2010

As much as I appreciate a bit of creative writing, that didn't really require a Norse saga. A simple "it doesn't work when logfiles are written by a user other than the apache user, and that can happen with cron jobs such as mailnotify" would have sufficed ;-)

OK, that's a fair point. I like the idea of logging to the same filename and rotating the log away on the turn of the month. I still want the die if the logfile can't be written, though. I see it as an essential bit of debugging for the new user. But as long as you get the permissions right, so that both the webserver user and the cron user can write it, there should be no problem.

-- CrawfordCurrie - 26 Feb 2010

I really find it annoying that my foswiki, which was running for months, suddenly crashes. I never asked for my log files to be moved, and tbh, don't want those log files to be moved unless I specifically ask them to be moved.

I do know of a couple of companies which process the log files for whatever reason, and they, and anyone that has had a twiki / foswiki for a while will look to where they were.

I've no objection to the idea of making new users harder to support by moving the logs (now we get to ask about 4 different log locations, including the apache log file), but silently moving them on an upgrade will really stink.

The perfect storm version of what you've done is to allow someone to come in, set up a foswiki so it all works fine, and then leave. The month rolls over, logging fails due to directory permissions, and suddenly the poor end user has a totally broken setup, and has to abandon the wiki.

-- SvenDowideit - 27 Feb 2010

OK, with a great deal more work we can avoid moving them.

-- CrawfordCurrie - 28 Feb 2010

I believe this is now fixed, though trying to write an unwritable log file will still show an error (because it is one, and the installer must fix it).

-- CrawfordCurrie - 11 Mar 2010

 
Topic revision: r18 - 04 Oct 2010, KennethLavrsen