CoreInternals (08 Apr 2011, RasmusPraestholm)

Core Internals

This page aims to describe, in geek-speak, what is going on in the core internals of Foswiki, and what we hope to achieve by making these changes. This page assumes a good understanding of software design terminology.

Where we are headed
- Guiding Principles
What has already been done
What is being worked on now
What might be worked on next

Where we are headed

Since Foswiki forked from (tm)Wiki, there have been a number of cosmetic changes but no major new features from a user perspective. This is mainly because (tm)Wiki had reached a point where the active developers could see that any really significant improvement would first require a substantial restructuring of the core. Otherwise, improvements would have been built on sand.

Most of the ideas that inform the work described here were first formulated in the early days of (tm)Wiki, after the first version of the core code had been written, when the first trained software professionals had a chance to review it. Many of these ideas have been reiterated, and thereby reinforced, over the years, but are still as relevant today as they ever were. The big difference is that with Foswiki, they are now, at long last, achievable.

Guiding Principles

Use interfaces to decouple implementation and support replaceable modules at all levels of the architecture.
Use well documented and well understood design patterns.
Don't compromise backward compatibility.
Keep it simple; many people with many different skill levels will work on the code.
Make it secure.
Make it fast.
Expose a well defined, web-accessible interface (e.g. AJAX), supporting fine grained access.

The end goal is to create a web applications platform that will support widget mashup as easily and flexibly as it supports server-side application integration, all in a robust and maintainable package that supports most end users out-of-the-box.

Most of the above are supported by the concept of the TOM (Topic Object Model), a model similar to the DOM (Document Object Model) familiar to most people who have coded JavaScript. The principle of the TOM is to have a model of a topic that "hides" the detail of how the topic is actually represented in the store, in the same way as the DOM hides most of the detail of HTML from users. The TOM is a key concept that has guided every refactoring since it was first thought of (2001?).

What has already been done

As we said above, some time ago (tm)Wiki reached a point in its development where to make any really significant improvements required the core to be heavily re-engineered to remove the piles of loose sand. Despite this, there was significant resistance to change in the (tm)Wiki project, and much of the refactoring had to be done "under cover" for political reasons.

Nevertheless, we have been working steadily for some years to "solidify" the sand piles and rearchitect the core into a set of concrete object-oriented foundations. This rearchitecting has always been done in the context of the requirement to maintain backward compatibility, so from (tm)Wiki 4.0 we started a program of unit-testing everything we could, to protect that investment. The main people contributing to this work have been CrawfordCurrie, GilmarSantosJr and SvenDowideit, though many others have helped along the road. Almost all of this work has been purely volunteer effort.

There has already been a huge amount of refactoring with the goal of object orientation in the core, starting with (tm)Wiki 4.0, and most recently in the user management code. Aside from the extensive unit test suite, some of the sandpiles we have already worked on include:

Total lack of documentation
Rendering done by haphazard regex replacements
diff
Login management
Registration
Massive code modules broken down - Render, TWiki.pm - and new abstractions introduced
User management locked in to "the (tm)Wiki way" - user files on disc. User code spread throughout the core and plugins.
Locked in to CGI

What is being worked on now

The major sandpiles we still see in the core at the moment are:

Internal APIs built on imperative programming principles, and the impact on the store
Search
Plugin handlers (and Func)
Rendering

We'll expand on the activities in each of these areas.

Imperative APIs and the Store

A lot of the core APIs depend on passing long lists of parameters. This is bad practice, and works against the move to OO design. Starting after 1.0.4, Crawford changed the responsibilities of the core to delegate almost all store operations to a "topic object", a role taken by a much-extended Foswiki::Meta object. Now a caller interacts with a Foswiki::Meta object that represents the store item they are changing, and has no direct interaction with the actual store implementation. The Foswiki::Meta object interacts with the real store via the methods of a well-defined interface (Foswiki::Store), which currently has a single implementation, Foswiki::Store::VCStore. This in turn uses the handler architecture from (tm)Wiki to support RCS files on disc.

The main step taken here is the decoupling of the vast bulk of the core code from the store, which allows us to introduce alternative store implementations at two different levels: file-based implementations at the level of the old RCS handlers, and full store implementations, which is the way a database store will probably be done.
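The decoupling described above can be sketched roughly as follows. This is an illustration in Python, not Foswiki's Perl code: `Store`, `MemoryStore` and `TopicObject` are hypothetical names standing in for the roles the text assigns to Foswiki::Store, a store implementation, and Foswiki::Meta, and the method names are assumptions.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical stand-in for the role of the Foswiki::Store interface."""

    @abstractmethod
    def read_topic(self, web, topic):
        """Return the stored text of web.topic."""

    @abstractmethod
    def save_topic(self, web, topic, text):
        """Persist text as the new content of web.topic."""


class MemoryStore(Store):
    """Illustrative implementation that keeps topics in a dict.

    A file-based or database-backed store would implement the same
    two methods and could be swapped in without touching callers.
    """

    def __init__(self):
        self._topics = {}

    def read_topic(self, web, topic):
        return self._topics.get((web, topic), "")

    def save_topic(self, web, topic, text):
        self._topics[(web, topic)] = text


class TopicObject:
    """Plays the role the text gives to the topic object.

    Callers talk to this object only; it is the single place that
    knows which store is in use.
    """

    def __init__(self, store, web, topic):
        self._store = store
        self.web = web
        self.topic = topic
        self.text = store.read_topic(web, topic)

    def save(self):
        self._store.save_topic(self.web, self.topic, self.text)


store = MemoryStore()
home = TopicObject(store, "Main", "WebHome")
home.text = "Welcome"
home.save()
```

The point of the sketch is that nothing outside `TopicObject` names a concrete store, so replacing `MemoryStore` is a one-line change at construction time.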

At the same time many other imperative APIs have been improved by passing the topic object in place of separate $web, $topic parameters. This helps encapsulate the implementation of a store object and gives scope for lazy-loading of the topic objects. It is the first major step taken towards an internal TOM to date.
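As a rough illustration of why passing one topic object gives scope for lazy loading, consider the Python sketch below. `LazyTopic`, its `loader` callback and `describe` are hypothetical and do not reflect the actual Foswiki::Meta API.

```python
class LazyTopic:
    """Hypothetical topic object that defers reading from the store."""

    def __init__(self, web, topic, loader):
        self.web = web
        self.topic = topic
        self._loader = loader  # assumed callable: (web, topic) -> text
        self._text = None

    @property
    def text(self):
        # The store is only consulted the first time the text is needed.
        if self._text is None:
            self._text = self._loader(self.web, self.topic)
        return self._text


def describe(topic_object):
    """Needs only the topic's identity, so it never triggers a load."""
    return f"{topic_object.web}.{topic_object.topic}"
```

A function that takes separate web and topic strings has no such object to hang deferred state on; one that takes the topic object gets it without any change to its signature.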

Search

The Foswiki::Search module has something of a reputation among Foswiki devs. It is one of the most fragile pieces of coding some of us have seen in many years. It is a "front to back" monolithic processor of searches that virtually stands alone from the rest of Foswiki, and is written in an imperative style that went out of fashion in the 1980s. The fragility has scared off most developers, and the module has remained largely unchanged for many years. The only major changes were the addition of query searching and the move of a significant chunk of the module into the store, a step which was taken several years ago in preparation for the store changes described above. The major remaining sandpile is the processing of search results for presentation.

The strategy being taken here is to introduce the concept of result sets. Result sets are not specific to database searches, or even search - a result set can just be a list of topics (in the right format). By creating a result set object we are able to not only plug in different result processors, but also delegate result set generation e.g. to a database store. Result sets also give us a way to separate the searching (or creation of a result set) from sorting and (importantly) rendering a list. More implementation dependencies in the search architecture are being pushed into the store implementation, which will give individual stores much greater scope for optimisation.
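A minimal sketch of the separation this aims for, in Python and under invented names (`ResultSet`, `sorted_by` and `render_bullets` are not taken from the Foswiki code):

```python
class ResultSet:
    """Hypothetical result set: an ordered collection of topic names.

    It does not care whether the names came from a grep, a database
    query or a hand-written list.
    """

    def __init__(self, topics):
        self._topics = list(topics)

    def __iter__(self):
        return iter(self._topics)

    def sorted_by(self, key=None, reverse=False):
        # Sorting yields a new result set and leaves this one untouched.
        return ResultSet(sorted(self._topics, key=key, reverse=reverse))


def render_bullets(results):
    """One possible result processor: a TML bullet list."""
    return "\n".join(f"   * {name}" for name in results)


hits = ResultSet(["WebHome", "CoreInternals"])
print(render_bullets(hits.sorted_by()))
```

Because generation, sorting and rendering only meet at the `ResultSet` boundary, any one of them can be replaced, or delegated to a store, without the others noticing.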

Sven is working on this.

Plugin Handlers

One thing that the refactoring of the store APIs described above has thrown into sharp relief is the inefficiency of the plugin handlers (event listeners). These listeners were little more than an exposure of the internals of (tm)Wiki, and since those internals have changed the plugin listeners are now out of step. We already refactored the plugins code so that listeners are only called if they are registered, so it should now be possible to define new abstract events that are less coupled to the internals.
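The "only call listeners that are registered" idea, combined with abstract event names, might look something like this Python sketch. `EventDispatcher` and the event name `topic_saved` are illustrative assumptions, not existing Foswiki handlers.

```python
from collections import defaultdict


class EventDispatcher:
    """Hypothetical dispatcher keyed on abstract event names."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def register(self, event, listener):
        self._listeners[event].append(listener)

    def dispatch(self, event, payload):
        # Events nobody registered for cost a single dict lookup.
        results = []
        for listener in self._listeners.get(event, ()):
            results.append(listener(payload))
        return results


dispatcher = EventDispatcher()
dispatcher.register("topic_saved", lambda name: f"indexed {name}")
print(dispatcher.dispatch("topic_saved", "Main.WebHome"))
```

An event such as `topic_saved` says what happened rather than where in the core it happened, which is the decoupling the paragraph above is after.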

At the same time the Foswiki::Func API suffers from the same problem; it was a simple exposure of the internals of the (tm)Wiki code, and therefore exposes inappropriate functionality to the plugin author, albeit that functionality is often undocumented. It also requires the passing of parameters in an inefficient and clumsy way - though there is a balance to be struck between efficiency and the self-documenting nature of some of the existing APIs that makes them attractive to naive users.

Crawford is planning to do this later in the Foswiki 2.0 dev cycle.

Rendering

Like Search, rendering is still a massive hodge-podge of imperative code. However, rendering is a key step in the generation of the TOM from the TML representation of a topic. We are currently working on dividing the rendering process into "lossless" and "lossy" steps, so that a lossless TML->TOM->TML pipeline can be established. The "lossy" steps, involving the generation of the actual HTML (or XML, or whatever output format you want), will be deferred as late as possible in the process. This work is drawing on the experience Crawford gained with the WysiwygPlugin, which tried to be a lossless convertor from TML to HTML and back, though it isn't efficient enough for this task. Crawford is working on this.
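To make the lossless/lossy split concrete, here is a deliberately tiny Python sketch that handles only headings and plain lines. The node classes and function names are invented for illustration and say nothing about how the real TOM is or will be structured.

```python
import html
import re
from dataclasses import dataclass

# A TML heading line: "---" followed by one to six "+" signs.
HEADING = re.compile(r"^---(\+{1,6})(\s*)(.*)$")


@dataclass
class Heading:
    level: int
    gap: str   # whitespace after the marker, kept so nothing is lost
    text: str


@dataclass
class Line:
    text: str


def parse(tml):
    """Lossless step: TML -> list of nodes."""
    nodes = []
    for raw in tml.split("\n"):
        m = HEADING.match(raw)
        if m:
            nodes.append(Heading(len(m.group(1)), m.group(2), m.group(3)))
        else:
            nodes.append(Line(raw))
    return nodes


def serialize(nodes):
    """Lossless step: nodes -> exactly the TML they were parsed from."""
    out = []
    for node in nodes:
        if isinstance(node, Heading):
            out.append("---" + "+" * node.level + node.gap + node.text)
        else:
            out.append(node.text)
    return "\n".join(out)


def to_html(nodes):
    """Lossy step, run last: spacing and blank lines are discarded."""
    out = []
    for node in nodes:
        body = html.escape(node.text.strip())
        if isinstance(node, Heading):
            out.append(f"<h{node.level}>{body}</h{node.level}>")
        elif body:
            out.append(f"<p>{body}</p>")
    return "\n".join(out)
```

In this toy, `serialize(parse(text))` returns `text` unchanged for any input, while `to_html` cannot be inverted; that asymmetry is the reason for deferring the lossy step as late as possible.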

When we have a lossless convertor we should be able to define new plugin handlers that operate on the TOM. We will still need to convert back to TML for old plugins but over time these should become the exception rather than the rule.

Another advantage of this conversion step is that it will help define the external topic object model, which will allow external components to efficiently address structural components of topics, such as tables, forms, paragraphs etc. This will significantly simplify the job of the WysiwygPlugin, and make it possible to write much more sophisticated AJAX components.

What might be worked on next

Macro definitions, lightweight Tags, Macro parameter validation

Sven: In working on the ExtractAndCentralizeFormattingRefactor, I keep re-noticing the haphazard way we validate Macro parameters, and that there is no consistent way for us to communicate parameter validation back to the user. The lightweight tags proposal of many years ago had an addendum that would enable us to specify the parameters, and basic validation ranges - thus extracting that phase of processing from the myriad of implementation functions.
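One way to picture declarative parameter validation is the Python sketch below. It is not based on the lightweight tags proposal itself; `IntParam`, `validate` and the `limit` parameter are hypothetical, and only integer ranges are covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntParam:
    """Hypothetical declaration of one integer macro parameter."""
    minimum: Optional[int] = None
    maximum: Optional[int] = None
    required: bool = False


def validate(spec, params):
    """Return a list of readable problems; an empty list means valid."""
    problems = []
    for name, rule in spec.items():
        if name not in params:
            if rule.required:
                problems.append(f"{name}: missing required parameter")
            continue
        try:
            value = int(params[name])
        except ValueError:
            problems.append(f"{name}: '{params[name]}' is not an integer")
            continue
        if rule.minimum is not None and value < rule.minimum:
            problems.append(f"{name}: {value} is below {rule.minimum}")
        if rule.maximum is not None and value > rule.maximum:
            problems.append(f"{name}: {value} is above {rule.maximum}")
    for name in params:
        if name not in spec:
            problems.append(f"{name}: unknown parameter")
    return problems


# The spec lives beside the macro definition, not inside its implementation.
spec = {"limit": IntParam(minimum=1, maximum=100, required=True)}
print(validate(spec, {"limit": "0"}))
```

Since every macro would report problems through the same function, the messages shown to the user come out in one consistent format.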

Optimising search

Sven, Crawford: While talking about the search this morning, we started musing if the @tokens loop in Search.pm was misplaced. It is written the way it is because that is best for grep, but that approach is SNAFU for other search algorithms. Sven had the idea of creating text, keyword and regex nodes in the query structure, which would neatly combine the two search interfaces into one and also allow us to delegate the decision on the algorithm to where it is best made; within the store implementation.
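The idea of text, keyword and regex nodes could be sketched as below. This is a guess at the shape in Python, with every class name invented here; the naive scan in `ScanningStore` is just one strategy a store might choose.

```python
import re
from dataclasses import dataclass


@dataclass
class TextNode:
    text: str       # literal substring


@dataclass
class KeywordNode:
    word: str       # whole word, case-insensitive


@dataclass
class RegexNode:
    pattern: str    # regular expression


@dataclass
class AndNode:
    children: list  # all children must match


class ScanningStore:
    """Hypothetical store that evaluates query nodes by scanning text.

    A database-backed store could accept the same node tree and
    translate it into an indexed lookup instead.
    """

    def __init__(self, topics):
        self._topics = dict(topics)

    def _matches(self, node, body):
        if isinstance(node, TextNode):
            return node.text in body
        if isinstance(node, KeywordNode):
            return node.word.lower() in body.lower().split()
        if isinstance(node, RegexNode):
            return re.search(node.pattern, body) is not None
        if isinstance(node, AndNode):
            return all(self._matches(child, body) for child in node.children)
        raise TypeError(f"unsupported node: {node!r}")

    def query(self, node):
        return [name for name, body in self._topics.items()
                if self._matches(node, body)]
```

The caller builds the node tree and hands it over whole; how the tree is evaluated is then entirely the store's decision.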

More TOM

As alluded to above we need ways to address or identify topic sections, tables, paragraphs, headings, etc - somewhat purple numbers style, though we hope to avoid having to add notation marks as purple numbers do.

-- CrawfordCurrie, SvenDowideit