Feature Proposal: Support an XML schema for webs/topics
Motivation
We need to be able to exchange Foswiki resources with XML-supporting web services. This is also necessary for using XML database resources such as dbxml.
Description and Documentation
Add an XML generator to Foswiki::Meta. I have already implemented a simple (almost 1:1) XML generator. Here's an (untested) DTD:
<!ELEMENT web (topic*, web*)>
<!ATTLIST web
name CDATA #REQUIRED>
<!ELEMENT topic (body, form*, topicmoved?, topicparent?)>
<!ATTLIST topic
name CDATA #REQUIRED
format CDATA #IMPLIED
date CDATA #IMPLIED
version CDATA #IMPLIED
rev CDATA #IMPLIED
author CDATA #IMPLIED>
<!ELEMENT body (#PCDATA)>
<!ELEMENT form (field+)>
<!ATTLIST form
name CDATA #REQUIRED>
<!ELEMENT topicmoved EMPTY>
<!ATTLIST topicmoved
to CDATA #REQUIRED
from CDATA #REQUIRED
date CDATA #REQUIRED
by CDATA #REQUIRED>
<!ELEMENT topicparent EMPTY>
<!ATTLIST topicparent
name CDATA #REQUIRED>
<!ELEMENT fileattachment EMPTY>
<!ATTLIST fileattachment
name CDATA #REQUIRED
version CDATA #IMPLIED
path CDATA #IMPLIED
size CDATA #IMPLIED
date CDATA #IMPLIED
user CDATA #IMPLIED
comment CDATA #IMPLIED
attr CDATA #IMPLIED
movedfrom CDATA #IMPLIED
movedby CDATA #IMPLIED
movedto CDATA #IMPLIED
moveddate CDATA #IMPLIED>
<!ELEMENT field EMPTY>
<!ATTLIST field
name CDATA #REQUIRED
value CDATA #REQUIRED
title CDATA #IMPLIED>
Examples
<web name="TemporaryMetaTestsTestWebMetaTests">
<topic name="TestTopicMetaTests" format="1.1" version="1.1" date="12345678" rev="1" author="BaseUserMapping_666">
<body>
<![CDATA[BLEEGLE
]]>
</body>
</topic>
<topic name="WebPreferences" format="1.1" version="1.1" date="12345678" rev="1" author="BaseUserMapping_666">
<body>
<![CDATA[Preferences]]>
</body>
</topic>
<web name="SubWeb">
<topic name="WebPreferences" format="1.1" version="1.1" date="12345678" rev="1" author="BaseUserMapping_666">
<body><![CDATA[Preferences]]>
</body>
</topic>
</web>
</web>
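The generator itself is Perl code in Foswiki::Meta and isn't reproduced here. As a rough illustration only, here is a minimal Python sketch of an almost 1:1 mapping that produces the structure shown in the example above. The dict-based input shape and the function names topic_to_element and web_to_element are hypothetical. ElementTree escapes body text instead of emitting CDATA sections; after parsing, both forms yield the same text.

```python
import xml.etree.ElementTree as ET

# Keys in the (hypothetical) topic dict that become child elements, not attributes.
CHILD_KEYS = ("body", "forms", "parent")


def topic_to_element(topic):
    """Build a <topic> element from a plain dict, following the DTD's child order."""
    attrs = {k: str(v) for k, v in topic.items() if k not in CHILD_KEYS}
    el = ET.Element("topic", attrs)
    # Body text is escaped on output rather than wrapped in CDATA.
    ET.SubElement(el, "body").text = topic.get("body", "")
    for form in topic.get("forms", []):
        form_el = ET.SubElement(el, "form", {"name": form["name"]})
        for field in form["fields"]:
            ET.SubElement(form_el, "field", {k: str(v) for k, v in field.items()})
    if "parent" in topic:
        ET.SubElement(el, "topicparent", {"name": topic["parent"]})
    return el


def web_to_element(web):
    """Build a <web> element: its topics first, then any subwebs (recursively)."""
    el = ET.Element("web", {"name": web["name"]})
    for topic in web.get("topics", []):
        el.append(topic_to_element(topic))
    for sub in web.get("webs", []):
        el.append(web_to_element(sub))
    return el


web = {
    "name": "TemporaryMetaTestsTestWebMetaTests",
    "topics": [{"name": "WebPreferences", "format": "1.1", "version": "1.1",
                "date": "12345678", "rev": "1",
                "author": "BaseUserMapping_666", "body": "Preferences"}],
    "webs": [{"name": "SubWeb"}],
}
xml_text = ET.tostring(web_to_element(web), encoding="unicode")
print(xml_text)
```

Emitting topics before subwebs matches the content model `(topic*, web*)`, so the output stays valid against the DTD above.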
Impact
Implementation
--
Contributors: CrawfordCurrie - 30 Nov 2009
Discussion
Some comments:
- I guess fileattachment should be a possible child element of topic as well.
- How do we deal with non-standard meta data like META:COMMENT and the like?
- I really would like to see more sub-topic XML structuring, e.g. for tables, paragraphs and sections.
- META:PREFERENCES are missing.
- We should use a foswiki namespace.
- It might be an advantage to have a web attribute on the topic node in addition to nesting it ... or one or the other.
- Ideally, this would not be called "support for XML generation" only. There are more objects in Foswiki that would be worth XML-ifying and which are not subsumed by the current Meta API. So adding a toXml to the Meta class is just a first step.
- Internal modules and extensions would ideally operate on the DOM being passed around. See XWiki's rendering module and the role of the document object in it.
Here's a quick brainstorm of an XML schema:
<foswiki:topic name="..." rev="..." author="..." date="..." web="...">
<foswiki:section name="..." type="...">
...foswiki:paragraph...foswiki:table...foswiki:section...
</foswiki:section>
<foswiki:paragraph>
...foswiki:paragraph...foswiki:table...foswiki:section...
</foswiki:paragraph>
<foswiki:table type="html|wiki">
<foswiki:tr>
<foswiki:th>....</foswiki:th>
<foswiki:td>....</foswiki:td>
</foswiki:tr>
</foswiki:table>
<foswiki:preference scope="local|set"> ...value... </foswiki:preference>
<foswiki:preference scope="local|set"> ...value... </foswiki:preference>
<foswiki:acl type="allow|deny" action="view|change|rename">
<foswiki:user id="..." />
<foswiki:user id="..." />
</foswiki:acl>
<foswiki:attachment name="..." date="..." size="..." comment="..." url="..." />
<foswiki:attachment name="..." date="..." size="..." comment="..." url="..." />
<foswiki:meta type="custom" name="..." ... />
<foswiki:meta type="custom" name="..." ... />
<foswiki:form name="...">
<foswiki:field name="..." title="...">...value...</foswiki:field>
<foswiki:field name="..." title="...">...value...</foswiki:field>
</foswiki:form>
<foswiki:form name="...">
<foswiki:field name="..." title="...">...value...</foswiki:field>
<foswiki:field name="..." title="...">...value...</foswiki:field>
</foswiki:form>
</foswiki:topic>
Later on, we might also think about storing more information about a site in XML, e.g. backlinks, wanted pages, user info and ACLs:
<foswiki:group id="...">
<foswiki:acl type="allow|deny" action="view|change|rename">
...
</foswiki:acl>
<foswiki:user id="..." />
<foswiki:user id="..." />
</foswiki:group>
<foswiki:user id="..." login="..." displayname="..." password="..." />
<foswiki:user id="..." login="..." displayname="..." password="..." />
<foswiki:user id="..." login="..." displayname="..." password="..." />
If we have a web node, then we might also think about this:
<foswiki:web name="..." summary="...">
<foswiki:preference name="..." finalized="yes|no">
...
</foswiki:preference>
<foswiki:acl type="allow|deny" action="view|change|rename">
<foswiki:user id="..." />
<foswiki:user id="..." />
</foswiki:acl>
<foswiki:web name="..." summary="...">
...
</foswiki:web>
</foswiki:web>
--
MichaelDaum - 30 Nov 2009
I am very excited to see others thinking about this.
In fact, when I first picked up our TWiki/Foswiki installation, TWiki:Plugins.XmlQueryPlugin confirmed in my mind that we had a viable platform for XML interoperability (a requirement I will have to work on in 2010).
It is important, however, that we think about reusing existing schemas (or at least justify why we wouldn't) for as much of our metadata as possible: DCMI at a minimum (mentioned in the DITA standard, see SupportDITA).
I will see if I can convince my colleagues (who are closer to the soup of XML standards than I) to help this effort.
- Later: After an initial discussion, it seems we could also use FOAF for describing users along with DCMI to describe a good chunk of our topic metadata. Using existing standards where possible will make the generated XML easier for integrators who will be consuming it.
--
PaulHarvey - 02 Dec 2009
The starting point was, of course, an almost verbatim map of the existing schema. While using standard schemata has attractions, it is also potentially a lot more work, as code has to be developed to decide what to do about extra/missing bits that don't map to the Foswiki schema. How good a fit could we get?
On Michael's points:
- I guess fileattachment should be a possible child element of topic as well.
  - I thought about this, but decided against it for no good reason (probably just laziness again).
- How do we deal with non-standard meta data like META:COMMENT and the like?
  - Good question. A DTD would require extension, but by maintaining 1:1 name mappings from XML elements to Foswiki meta types (e.g. <comment> would map to META:COMMENT), we can write a SAX-based parser that is independent of the DTD. The only problem would come if one of these non-standard meta-data types were structured, in the way FORM/FIELD are.
- I really would like to see more sub-topic XML structuring, e.g. for tables, paragraphs and sections.
  - The reason I didn't try to extend the schema into the text data (paragraphs, tables, etc.) is the old one: there are many different potential views (flat macros, expanded macros, HTML, heading-based, table-based, list-based, structured TML, chained, etc.), and I don't think this is the right way to drive that decomposition. This topic is just about generating XML.
- META:PREFERENCES are missing.
- We should use a foswiki namespace.
  - Yes to the namespace; I was just lazy.
- It might be an advantage to have a web attribute on the topic node in addition to nesting it ... or one or the other.
  - Hmm. I'd be against that, because it makes it awkward to move a topic by simply juggling pointers in the DOM. When the XML is parsed, the DOM can have a parent attribute derived at that time, but outputting it seems restrictive to me.
- Ideally, this would not be called "support for XML generation" only. There are more objects in Foswiki that would be worth XML-ifying and which are not subsumed by the current Meta API. So adding a toXml to the Meta class is just a first step.
  - Sure, but topics were my initial focus (and initial requirement).
- Internal modules and extensions would ideally operate on the DOM being passed around. See XWiki's rendering module and the role of the document object in it.
  - That is indeed the ultimate goal of a lot of my internal restructuring of the Foswiki::Meta object.
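To make the 1:1 name-mapping idea above concrete, here is a minimal Python sketch using the standard library's xml.sax. It assumes (hypothetically) that every element other than web, topic and body maps to META:<NAME IN UPPER CASE>, so `<comment>` becomes META:COMMENT without consulting any DTD. It also illustrates the argument against a web attribute: each topic's web is derived from the element nesting while parsing. The class name MetaCollector and the "/" web separator are illustrative choices, not existing Foswiki code.

```python
import xml.sax


class MetaCollector(xml.sax.ContentHandler):
    """Collect meta records per topic without consulting any DTD.

    Hypothetical convention: any element other than web/topic/body maps 1:1 to
    META:<ELEMENT NAME IN UPPER CASE>, e.g. <comment> -> META:COMMENT.
    Body text is ignored in this sketch.
    """

    STRUCTURAL = {"web", "topic", "body"}

    def __init__(self):
        super().__init__()
        self.web_path = []  # names of the currently open <web> elements
        self.topics = []    # one dict per <topic>, in document order
        self.current = None

    def startElement(self, name, attrs):
        if name == "web":
            self.web_path.append(attrs["name"])
        elif name == "topic":
            # The enclosing web comes from nesting, not from a 'web' attribute.
            self.current = {"web": "/".join(self.web_path),
                            "name": attrs["name"], "meta": []}
            self.topics.append(self.current)
        elif name not in self.STRUCTURAL and self.current is not None:
            self.current["meta"].append(("META:" + name.upper(), dict(attrs.items())))

    def endElement(self, name):
        if name == "web":
            self.web_path.pop()
        elif name == "topic":
            self.current = None


sample = b"""<web name="Main">
  <topic name="T"><body>x</body><comment author="A" text="hi"/><topicparent name="WebHome"/></topic>
  <web name="Sub"><topic name="U"><body>y</body></topic></web>
</web>"""

handler = MetaCollector()
xml.sax.parseString(sample, handler)
for t in handler.topics:
    print(t["web"], t["name"], t["meta"])
```

Because the mapping is flat, a form comes out as a META:FORM record followed by separate META:FIELD records. That works only while extension meta stays unstructured, which is the caveat raised in the reply above.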
--
CrawfordCurrie - 04 Dec 2009
- The generated XML doesn't have to use a prefixed foswiki: namespace on each element; it can be declared as the default namespace at the top of the document, and then it will be implied.
- Are we bothering with the format version because we won't be expanding macros?
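The default-namespace point can be checked with Python's ElementTree. A document that declares the namespace once as the default and one that prefixes every element with foswiki: parse to identical expanded element names. The namespace URI below is a placeholder, since the proposal doesn't define one. In both forms, unprefixed attributes such as name belong to no namespace, as the XML Namespaces recommendation specifies.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: no official Foswiki namespace has been chosen in this proposal.
FOSWIKI_NS = "http://foswiki.org/xml/hypothetical"

prefixed = (f'<foswiki:topic xmlns:foswiki="{FOSWIKI_NS}" name="WebHome">'
            '<foswiki:body>x</foswiki:body></foswiki:topic>')
defaulted = (f'<topic xmlns="{FOSWIKI_NS}" name="WebHome">'
             '<body>x</body></topic>')

a = ET.fromstring(prefixed)
b = ET.fromstring(defaulted)

# ElementTree reports names in {namespace-uri}localname form, so the two match.
print(a.tag == b.tag, a[0].tag == b[0].tag)
# Unprefixed attributes are not in the default namespace: plain "name" works for both.
print(a.get("name"), b.get("name"))
```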
But I have to ask - who is this XML for? If you're just wanting an XML dump for disaster recovery/legacy reasons, then what has been proposed so far is fine. But even if we went this way, we would have to waste a lot of time documenting the concepts behind the elements and attributes in the DTD - or are we just going to point integrators to Foswiki::Meta.pm? I suppose that doesn't matter, if the XML is only going to be handled amongst the Foswiki user/developer community...
But if we're hoping to keep Foswiki relevant as an information/knowledge management system for another 10 years, it has to play nice with the semantic web, linked data world.
Services like Wolfram Alpha are emerging that leverage this stuff. In my case, there is a slew of biodiversity data aggregation services (mostly in development) out there which are not going to bother trying to understand an obscure XML schema, with or without comprehensive documentation.
I am confident we could mix dc:, foaf:, skos:, etc. namespace elements into our XML at very little (Perl code development) cost, and then fall back to foswiki: namespace elements for concepts that don't map well onto other standards.
The beauty of this? Aggregation/indexing/discovery services probably wouldn't care a whole lot about the things we have to fall back to a foswiki: namespace for anyway. They can ignore the elements they don't understand and take note of the parts they do: author, title, dates, revisions, etc.
The benefits would be huge IMHO.
I am hoping that these mappings wouldn't impose any additional burden on development over what you've already proposed, if my colleague can come up with the mappings for you.
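Here is a rough sketch of the mixed-namespace approach. It assumes a simple mapping that neither commenter has specified: topic name to dc:title, author to dc:creator, date to dc:date, with every other attribute kept in the foswiki: namespace. The Dublin Core element-set namespace URI is the real DCMI one. The foswiki: URI, the mapping table and the function names to_public and epoch_to_iso are hypothetical. Foswiki stores dates as epoch seconds, so the sketch converts them to an ISO 8601 timestamp, the form DCMI recommends for dates.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

DC = "http://purl.org/dc/elements/1.1/"          # Dublin Core element set
FOSWIKI = "http://foswiki.org/xml/hypothetical"  # placeholder URI

# Hypothetical mapping from native <topic> attributes to Dublin Core elements.
DC_MAP = {"name": "title", "author": "creator", "date": "date"}


def epoch_to_iso(value):
    """Convert Foswiki's epoch-seconds date to an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc).isoformat()


def to_public(topic):
    """Re-express a native <topic> as dc: elements with foswiki: fallbacks."""
    out = ET.Element(f"{{{FOSWIKI}}}topic")
    for attr, value in topic.attrib.items():
        if attr in DC_MAP:
            if attr == "date":
                value = epoch_to_iso(value)
            ET.SubElement(out, f"{{{DC}}}{DC_MAP[attr]}").text = value
        else:
            # No standard equivalent: consumers that don't know foswiki: can ignore it.
            ET.SubElement(out, f"{{{FOSWIKI}}}{attr}").text = value
    return out


native = ET.fromstring('<topic name="WebPreferences" format="1.1" date="12345678" '
                       'rev="1" author="BaseUserMapping_666"/>')
ET.register_namespace("dc", DC)
ET.register_namespace("foswiki", FOSWIKI)
print(ET.tostring(to_public(native), encoding="unicode"))
```

Keeping this as a separate transform step leaves the native, near-1:1 schema untouched for internal use.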
--
PaulHarvey - 05 Dec 2009
Well, my reasons for starting this are quite pedestrian. I'm working on transforming Foswiki SEARCH queries to XPath, and I need a vehicle for testing. Since all the XPath engines I can find require an underlying XML DOM, it's easiest to generate XML for them to work on. I'm targeting a schema that is close to the %META schema to keep it simple. I was not anticipating using this schema with a web service; I'm not against it, I just hadn't thought about it.
With regard to standards, correct me if I'm naive, but surely there are XSLT engines out there that can be used to transform to/from any standard you choose?
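For the SEARCH-to-XPath testing described above, even Python's built-in ElementTree can run simple queries against XML shaped like the example earlier in this topic, although it implements only a limited XPath subset. Translating a query such as name='WebPreferences' into .//topic[@name='WebPreferences'] is an illustrative assumption, not the actual mapping being developed. Full XPath 1.0 would need an engine such as lxml. ElementTree keeps no parent pointers, so the sketch derives them once per query to report each hit's web.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<web name="TemporaryMetaTestsTestWebMetaTests">
  <topic name="TestTopicMetaTests" rev="1"><body>BLEEGLE</body></topic>
  <topic name="WebPreferences" rev="1"><body>Preferences</body></topic>
  <web name="SubWeb">
    <topic name="WebPreferences" rev="1"><body>Preferences</body></topic>
  </web>
</web>""")


def search(root, xpath):
    """Return (web path, topic name) for each <topic> matched by an XPath-subset expression."""
    # Derive parent links once, since ElementTree elements don't store them.
    parent = {child: p for p in root.iter() for child in p}
    hits = []
    for topic in root.findall(xpath):
        webs, node = [], parent.get(topic)
        while node is not None:  # every ancestor of a topic is a <web>
            webs.append(node.get("name"))
            node = parent.get(node)
        hits.append(("/".join(reversed(webs)), topic.get("name")))
    return hits


# Hypothetical translation of the SEARCH query name='WebPreferences'
for web, topic in search(doc, ".//topic[@name='WebPreferences']"):
    print(web, topic)
```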
--
CrawfordCurrie - 05 Dec 2009
Okay, I see more clearly now. Is the XML you're generating entirely for back-end purposes? I was thinking from the point of view of what a topic/web would look like through a viewxml script, or ?content-type=application/rdf+xml. If it were trivial (a plugin, perhaps) to put an XSLT engine between your "native" XML and the public-facing XML, that would make me happy...
--
PaulHarvey - 05 Dec 2009
Nothing is ever trivial, but I don't think it would be hard to interpose an XSLT stage. The only potential problem would be deriving a DTD that covered meta registered by extensions; but since we are (currently) restricting these to unstructured elements, that isn't a problem.
More tricky is the idea of using this to drive a TOM definition (breaking down and extracting structure from topic content). That would inevitably require structured elements in the DTD. But I'm not too scared by that, because nothing proposed so far has drifted too far from the basic DTD I proposed above.
--
CrawfordCurrie - 05 Dec 2009