Item2603: BlackListPlugin takes ages to process topic with really many http links

Priority: Normal
Current State: Confirmed
Released In:
Target Release: n/a
Applies To: Extension
Component: BlackListPlugin
Branches:
Reported By: KennethLavrsen
Waiting For:
Last Change By: KennethLavrsen
From Item2577, Paul noted this:

I sat there with top open to watch the CPU time count up on the view script. I got distracted at about the 02:00 minute mark (it didn't take much longer than that). Wow!

The content I'm trying to save (which also triggers spamword detection) can be found at

http://wiki.trin.org.au/Mangroves/Bibliography?raw=on

An equivalent quantity of "The quick brown fox" text takes only a couple of seconds. It can be found at

http://wiki.trin.org.au/Sandbox/TestTopic127?raw=on

-- PaulHarvey - 05 Jan 2010

I have tried with copies of the two topics.

First, I could not reproduce anything like 2 minutes. The quick brown fox topic saves in 7 seconds.

After updating to the check-in you made for this item, I could save the topic in 7 seconds.

The problem comes when I try the Bibliography topic. I now get a save time of 7 seconds with BLP disabled and 37 seconds with it enabled, where the spam regex is removed. It makes no difference if I also add all the Quick Brown Fox text; it is the number of http links that determines the delay.

I am sure AntiWikiSpamPlugin will show the same performance, as this feature is exactly the same in the two plugins: both run through the saved topic text with the massive spam regex from the common spam regex site.
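To make the scaling argument concrete, here is a minimal Python sketch of a per-link check. It assumes (hypothetically) that a plugin extracts each URL from the topic text and then tests it against every pattern in the shared spam list; SPAM_PATTERNS, extract_urls and check_topic are illustrative names, not the plugins' actual Perl code. Under that model the number of regex evaluations grows as links x patterns, which fits the observation that the delay tracks the number of http links rather than the length of the topic.

```python
import re

# Hypothetical stand-in for the shared spam regex list (the real list is far longer).
SPAM_PATTERNS = [re.compile(p) for p in (r"casino", r"cheap-?pills", r"\.xyz/")]

# Hypothetical URL extractor: http/https only, stopping at whitespace, quotes, <, > or ].
URL_RE = re.compile(r'https?://[^\s"<>\]]+')


def extract_urls(text):
    """Return every http/https URL found in the topic text."""
    return URL_RE.findall(text)


def check_topic(text):
    """Test every URL against every pattern; return (flagged_urls, evaluations)."""
    flagged = []
    evaluations = 0
    for url in extract_urls(text):
        for pattern in SPAM_PATTERNS:
            evaluations += 1
            if pattern.search(url):
                flagged.append(url)
                break  # one hit is enough to flag this URL
    return flagged, evaluations


if __name__ == "__main__":
    links = " ".join(f"http://example.org/paper{i}" for i in range(100))
    print(check_topic(links))                       # no hits, 300 evaluations
    print(check_topic("The quick brown fox " * 1000))  # no URLs, 0 evaluations
```

With 100 clean links and 3 patterns the sketch performs 300 regex calls, while plain prose of any length performs none; with a real list of thousands of patterns the per-link cost is correspondingly larger.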

I do not see how we can speed that up unless the regexes we use can be optimized.
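One commonly suggested direction, offered here as an untested idea rather than a measured fix, is to compile the whole pattern list once into a single alternation and to check each distinct URL only once. The Python sketch below assumes a hypothetical RAW_PATTERNS list and a hypothetical find_spam_urls helper. Each pattern is wrapped in (?:...) so that alternation precedence inside one pattern cannot leak into its neighbours. Whether this is actually faster depends on the regex engine: Perl can optimize alternations of literal strings into a trie, Python's re module does not, so any gain here comes mainly from fewer calls and skipped duplicates and would need to be measured.

```python
import re

# Hypothetical pattern list; in the plugins this comes from the shared spam regex site.
RAW_PATTERNS = [r"casino", r"cheap-?pills", r"\.xyz/"]

# Compile once at load time, not on every save. One combined regex means one
# search per URL instead of one search per (URL, pattern) pair.
# Caveat: numbered backreferences inside individual patterns would be renumbered.
COMBINED = re.compile("|".join(f"(?:{p})" for p in RAW_PATTERNS))

URL_RE = re.compile(r'https?://[^\s"<>\]]+')


def find_spam_urls(text):
    """Return the distinct URLs in text that match any spam pattern, in first-seen order."""
    seen = set()
    hits = []
    for url in URL_RE.findall(text):
        if url in seen:
            continue  # repeated links are checked only once
        seen.add(url)
        if COMBINED.search(url):
            hits.append(url)
    return hits
```

For example, find_spam_urls("http://a.org/x http://spam.xyz/ http://a.org/x") returns only "http://spam.xyz/", and http://a.org/x is tested once even though it appears twice.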

I tried removing all the http strings, replacing them with "hej", and then it took 7 seconds to save the topic.

I have checked in a change so that we only check for http and https. There is no need to also look for gopher, telnet, etc., as they are not used for spam. I could not see much difference whether I ran with "http" or "https?" in the regex, so this is OK to add to the plugin. The question is what we can do for speed.
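The scheme restriction can be illustrated with two URL patterns. This Python sketch is hypothetical: BROAD_URL_RE is an assumed "before" pattern built from the schemes mentioned in this item plus a few similar ones, and WEB_URL_RE is an assumed "after" pattern; neither is copied from the plugin source. It only shows which links each pattern picks up and does not measure speed.

```python
import re

# Hypothetical "before" pattern: many schemes (gopher, telnet, etc.).
BROAD_URL_RE = re.compile(r'(?:https?|ftp|gopher|telnet|news|mailto):[^\s"<>\]]+')

# Hypothetical "after" pattern: only http and https, since the other schemes are
# not used for spam. "https?" makes the trailing "s" optional, so one
# pattern covers both schemes.
WEB_URL_RE = re.compile(r'https?://[^\s"<>\]]+')


def extract(pattern, text):
    """Return all substrings of text matched by the given URL pattern."""
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "http://a.org https://b.org gopher://c.org telnet://d.org"
    print(extract(BROAD_URL_RE, sample))  # all four links
    print(extract(WEB_URL_RE, sample))    # only the http and https links
```

Fewer candidate URLs means fewer spam-list checks, although for topics like the Bibliography, where nearly all links are http, the saving is expected to be small.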

For normal topics with only a few http links, the delay on save is a second or two.

-- KennethLavrsen - 05 Jan 2010

Opened this bug for the performance issue so that I could close Item2577.

-- KennethLavrsen - 07 Jan 2010

ItemTemplate

Summary BlackListPlugin takes ages to process topic with really many http links
ReportedBy KennethLavrsen
Codebase trunk
SVN Range
AppliesTo Extension
Component BlackListPlugin
Priority Normal
CurrentState Confirmed
WaitingFor
Checkins
TargetRelease n/a
ReleasedIn
Topic revision: r1 - 07 Jan 2010, KennethLavrsen
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere.