Item2577: Doesn't match https spamlinks

Priority: Normal
Current State: Closed
Released In:
Target Release: n/a
Applies To: Extension
Component: BlackListPlugin
Branches:
Reported By: PaulHarvey
Waiting For: Main.KennethLavrsen
Last Change By: KennethLavrsen
The regex matches only http://wikispam, not https://wikispam.
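A minimal sketch of the mismatch, written in Python rather than the plugin's Perl. The pattern names and the "wikispam" host are placeholders for illustration only, not the plugin's actual regex:

```python
import re

# Placeholder patterns: "wikispam" stands in for a real spam-list entry.
HTTP_ONLY = re.compile(r"http://wikispam")        # misses https links
HTTP_OR_HTTPS = re.compile(r"https?://wikispam")  # "s?" makes the "s" optional

def is_spam(pattern, text):
    """Return True if the pattern occurs anywhere in the text."""
    return pattern.search(text) is not None

print(is_spam(HTTP_ONLY, "visit https://wikispam now"))      # False
print(is_spam(HTTP_OR_HTTPS, "visit https://wikispam now"))  # True
```

Making the "s" optional keeps a single pattern for both schemes instead of duplicating every entry.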

I'm not sure if the change I've made is appropriate.

Saving edits to reasonably large topics on our production wiki is now quite slow.

Reverting.

-- PaulHarvey - 05 Jan 2010

Actually, rolling back to Kenneth's latest release on our production wiki still doesn't help the situation.

I sat there with top open to watch the CPU time count up on the view script. I got distracted at about the 02:00 mark (it didn't take much longer than that). Wow!

The content I'm trying to save (which also triggers spamword detection) can be found at

http://wiki.trin.org.au/Mangroves/Bibliography?raw=on

An equivalent quantity of "The quick brown fox" text only takes a couple of seconds, found at

http://wiki.trin.org.au/Sandbox/TestTopic127?raw=on

-- PaulHarvey - 05 Jan 2010

I have tried with copies of the two topics.

First, I could not reproduce a 2 minute save time or anything close to it. The quick brown fox topic saves in 7 seconds.

After updating to the checkin you did for this item I could save the topic in 7 seconds.

The problem comes when I try the Bibliography topic. I now get a save time of 7 seconds with BLP disabled and 37 seconds with it enabled where the spam regex is removed. It makes no difference if I also add all the Quick Brown Fox text. It is the number of http links that determines the delay.

I am sure the AntiWikiSpamPlugin will have the same performance, as this feature is exactly the same in the two plugins. Both run through the saved topic text with the massive spam regex from the common spam regex site.
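As a rough illustration of why the link count matters, here is a Python sketch (not the plugin's Perl code). The domain list and function names are made up. The real spam regex is far larger, so each link costs correspondingly more:

```python
import re

# Hypothetical stand-in for the large spam list; real lists have many entries.
SPAM_DOMAINS = ["wikispam", "cheappills", "casinobonus"]

# One combined pattern: a link prefix followed by any listed domain.
SPAM_RE = re.compile(
    r"https?://[\w.\-]*(?:" + "|".join(re.escape(d) for d in SPAM_DOMAINS) + r")"
)

def count_link_starts(text):
    """Number of places where the regex engine has to try the full alternation."""
    return len(re.findall(r"https?://", text))

def find_spam_links(text):
    """Return the matched prefix of every link that hits a listed domain."""
    return [m.group(0) for m in SPAM_RE.finditer(text)]

sample = "http://example.org/a https://www.wikispam.example/x plain words"
print(count_link_starts(sample))  # 2
print(find_spam_links(sample))    # ['https://www.wikispam']
```

Plain text fails the match at the first character almost everywhere, while every "http" occurrence forces the engine to try the alternatives. That is consistent with the save time following the number of links and not the topic size.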

I do not see how we can speed that up unless the regexes we use can be optimized.

I tried replacing all the http strings with "hej", and then it took 7 seconds to save the topic.

I have checked in a change so we only check for http and https. There is no need to also look for gopher, telnet, etc. They are not used for spam. I could not see much difference whether I ran with http or https? in the regex. So this is OK to add to the plugin. The question is what we can do for speed.

With normal topics that have only a few http links, the delay is a second or two when you save.

-- KennethLavrsen - 05 Jan 2010

I will close this as the bug item itself is resolved. I will open a new performance bug item.

-- KennethLavrsen - 07 Jan 2010
Topic revision: r6 - 07 Jan 2010, KennethLavrsen