Feature Proposal: Query across all topic revisions
Motivation
We really want to be able to report some stuff based on the evolution of a topic. Who are all the authors of a topic? Did a formfield ever contain a particular value, and if so when/what rev, whose change was it? What
were the values of these fields at a particular time?
Description and Documentation
New query syntax to support such queries. A topic and all of its revisions become an array. Here's the text from
QuerySearch explaining it:
Working with arrays
A number of fields in a topic are
arrays, meaning that they refer to a list of values rather than a single value. Arrays are accessed using the square brackets ([]) and dot (.) operators.
The square brackets operator is actually a bit more clever than a simple query or array index. You can use the
comma ,
operator inside the brackets to select a set of matching elements. For example,
versions[1,5]
will select version 1 and version 5.
You can even
mix integer indices and conditions. For example, let's say we want to select all versions that are authored by 'RoaldDahl' and also select version 9.
versions[info.author='RoaldDahl', 9]
Note that if 'RoaldDahl' authored version 9, then you will get that version
twice in the result.
The index of the
first item in an array is
[1]
, the second is at
[2]
etc. An index of
[0]
is used to refer to the
last item in the array,
[-1]
refers to the
second last, etc.
When the dot operator is applied to an array, it is applied to each element of the array, and the result is a new array containing the results of each application.
Working with versions
The
versions
field is mainly used with
%QUERY
to provide a powerful view into the history of a topic. It is an
array of objects, one for each revision of the topic (including the most recent), where the oldest revision is in position
[1]
in the array, the second oldest in
[2]
etc. You can use the
versions
field to select old versions based on a query. For example,
versions[info.author='RoaldDahl']
will return a list of all versions authored by RoaldDahl. You can access field
values in the topics
as they were at the time - for example,
versions[info.author='RoaldDahl'].Published
will return an array of the values of the Published field in all versions authored by RoaldDahl.
The versions array is indexed by the revision number. Index
[0]
which always
refers to the most recent version,
[-1]
to the next most recent etc.
Gotchas
- Remember that in the query language, topic names are constants. You cannot write
Main.UserTopic/UserForm.firstName
because Main.UserTopic
will be interpreted as a form field name. If you want to refer to topics by name you must enclose the topic name in quotes i.e. 'Main.UserTopic'/UserForm.firstName
- You cannot use the result of a sub-query as an index into the
versions
array. For example, versions[UseThisVersion]
where the field UseThisVersion
is set to '2' will not work. Any index that requires reference back to the database (such as a field value) will be evaluated as a condition and not as an index. To select a version based on a field value, use versions[info.version=UseThisVersion]
Examples
revs NOT authored by PaulHarvey
%QUERY{"versions[info.author!='PaulHarvey']/info.version"}%
Result: version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version,version
authors of revs matching 'CurrentState'='CommunityVote'
%QUERY{"versions[CurrentState='CommunityVote']/info.author"}%
Result: author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author,author
revs created between the 5th and the 7th December 2010
%QUERY{"versions[info.date < d2n('2010-12-08') AND info.date > d2n('2010-12-05')]/info.version"}%
Result: version,version,version,version,version,version,version,version
What about..
Challenges
Impact
Performance. RCS store may be practical when using
VarQUERY on a single topic, but will probably drown a server scanning any non-trivial sized web with
VarSEARCH.
Implementation
CrawfordCurrie has implemented on trunk in
Tasks.Item10129
IN ADDITION it may be desirable to enhance
Foswiki::Meta
API with a more convenient way to iterate over the revisions of a given
$topicObject
(let's assume that every revision will be its own stand-alone, self-contained
$topicObj
).
SvenDowideit said on IRC:
Meta->getRevIterator() => RevIterator where revIterator->next returns a new cacheable Meta, and if rev!=head, perhaps getWriteable fails
forcing that even more unusnal situation to be written by copying the contenst of the old rev into a head writable
Discussion
Exploring what you suggest above, I agree you need some way to statically express a specific revision of a topic. Right now we do that in a URL with
Web/Topic?rev=2
. Let's assume what you suggest, that this syntax isn't part of the query language, it's instead part of a web.topic reference string, so
Web.Topic
refers always to the latest rev of a topic, but
Web.Topic?rev=2
refers to that specific rev of the topic. Then, you need some way to get all the revs of a topic. So, let's say the
revs
array postulated above contains a list of fully-qualified revision references, e.g.
['Web/Topic?rev=2', 'Web/Topic?rev=1']
You could then use the REF operator:
-
'Web.Topic'/revs[author!='GungaDin'].version
to get a list of all versions that were not authored by GungaDin.
-
revs/info.author
to get an array of all the authors of all revs
An extension to this proposal would be to treat the
?rev=2
as part of the topic reference syntax, so writing
QueryAcrossTopicRevisions?rev=1 anywhere, even outside squabs, would give a link to that specific rev of the topic.
--
CrawfordCurrie - 15 Sep 2010
Cool. I have mixed feelings about formalising
Web.Topic?rev=n
. I think it would be nice to come up with something that doesn't break simple string concatenation when you try to build a URL out of these topic references (ie. avoid too many '?'). Or is that being too geeky? I don't know.
Let's review rev access syntax:
... syntax comparison table moved to
LoadDifferentTopicVersions
--
PaulHarvey - 16 Sep 2010
and please don't forget that addressing a specific
rev is the least useful interface into the object history - more useful are
before 'date'
,
first after 'date'
and
between 'date' and 'date'
, followed by
first
,
last
and
previous
.
we should possibly consider
rev
as a second rate alias, existent only due to some implementation detail.
--
SvenDowideit - 17 Sep 2010
... more discussion moved to
LoadDifferentTopicVersions
--
PaulHarvey - 09 Dec 2010
A few thoughts.
First, in the context of something like
ApprovalPlugin surely we need the concept of [REV=APPROVED] and for this to be the default display (possibly
PlugIn managed). After all this is the approved version, of course in a full implementation you need to be offered the choice of seeing the latest version, possibly with the option to make further changes. The crucial point is that a store(s) will need to support this.
Second, we should not get hung up on a maintaining something like RCS, if a store provides (or even must provide) revisions then it can simply store a complete copy for each revision (actually with META data broken out into separate fields that will only be a single field copied per revision, it's only text that could be large).
The text field could even be compressed (does any RDBMS support this?) which should increase performance if done right.
With multiple
StoreEngines each of which can be configured may allow an Admin to make choices web by web: e.g. with/without fast revision searches.
--
JulianLevens - 27 Oct 2010
Hi Julian,
I'm hoping that the query language would support the ability to "get the latest rev which satisfies some criteria", something like
%QUERY{
"latest(
revisions[name='Web.SomeTopic' AND META:APPROVAL.state='APPROVED']
)"
}%
This example above
depends on us coming up with a topic path syntax which not only holds the web and topic, but a fully qualified identifier that has a revision component in it. So the above query might return
Web.Topic--23
- revision 23 of Web.Topic. You should then be able to use this fully qualified topic identifier in subsequent queries, Eg. get the approver:
%QUERY{
"'Web.Topic--23'/author"
}%
I also don't think we're getting hung up on RCS

Just that it doesn't make sense to create a new query capability that can't be exploited with an out-of-the-box Foswiki (or a differently configured one anyway). OTOH, if this stuff becomes a capability as a result of
QueryCustomCollections - then we could leave it up to a store plugin (that is NOT shipped with default Foswiki) to provide a 'revisions' collection that can be queried in this way.
Topics (and their revisions) constitute pretty simple data in terms of storage problems today. I'm not sure the overhead of compressing/decompressing is really worth it, but I guess that's up to each store implementation.
I have added a note regarding the
Foswiki::Meta
API: there might be some work required there, so that plugins may more conveniently iterate over the versions of a given
$topicObject
.
--
PaulHarvey - 14 Nov 2010
Thinking about Paul's -- syntax practically, I don't favour mixing up dates and revision numbers
(see LoadDifferentTopicVersions for the original discussion) -- Main.Paul Harvey - 09 Dec 2010
The REF (/) operator is current a means of referring to another topic. Let's say we extend the concept to refer to another rev
of the current topic. To illustrate:
-
'Web.Topic'/field
- refers to the value of 'field' in the latest revision of Web.Topic
-
'Web.Topic'/(info.createdate < d2n('1 aug 2010'))
- refers to the latest revision of Web.Topic iff it was created prior to 1/8/10
-
'Web.Topic--1'/field
- refers to the value of field in revision 1 of Web.Topic
-
'Web.Topic--*'/field
- refers to the value of 'field' in all revisions of Web.Topic
-
'Web.Topic--1:9'/field
- refers to the value of 'field' in revisions 1 through 9 of Web.Topic
-
'Web.Topic--*'/(info.createdate < d2n('1 aug 2010'))
- refers to an array of all revisions of Web.Topic prior to 1/8/10
and
-
field
- refers to the value of 'field' in the latest revision of the queried topic
-
info.createdate < d2n('1 aug 2010')
- refers to the latest revision of the queried topic iff it was created prior to 1/8/10
-
'--1'/field
- refers to the value of field in revision 1 of the queried topic
-
'--*'/field
- refers to the value of 'field' in all revisions of Web.Topic
-
'--1:9'/field
- refers to the value of 'field' in revisions 1 through 9 of Web.Topic
-
'--*'/(info.createdate < d2n('1 aug 2010'))
- refers to an array of all revisions of the queried topic prior to 1/8/10
--
CrawfordCurrie - 21 Nov 2010
Implementation thoughts. If a "topic" (the LHS of a REF) is an uncommitted (i.e. unloaded) meta object, until a
getField
is actually performed on it, then we can have the concept of -- as a "version resolution operator". Thus you could have a topic name in a field of a topic, then use the -- operator to focus on a specific (set of) revision(s) of that topic. -- would be a binary/ternary operator to allow for a range of revisions.
Actually, the more I think about it, the more I think it needs to be an operator (and not delegated to the meta-loader inside ''). Otherwise you will not be able to use computed ranges e.g. those where version numbers exist in dataform fields.
--
CrawfordCurrie - 01 Dec 2010
The more i think about this, the more I think it needs to be an operator. More, I have already implemented the comma operator for creating lists (required for
IN
) and having worked on arithmetic -, can't make -- work as an operator (easily confused with binary minus unary minus). So, propose the following: three new operators, unary
@
, binary
@
, and nonary
*
-
'Web.Topic'@1/field
- refers to the value of field in revision 1 of Web.Topic
-
'Web.Topic'@*/field
- refers to the value of 'field' in all revisions of Web.Topic
-
'Web.Topic'@(1,2,3,4,5,6,7,8,9)/field
- refers to the value of 'field' in revisions 1 through 9 of Web.Topic
-
'Web.Topic'@*/(info.createdate < d2n('1 aug 2010'))
- refers to an array of all revisions of Web.Topic prior to 1/8/10
Unary @ is defined as an operation on the current topic, so
-
@1/field
- refers to the value of field in revision 1 of the queried topic
-
@*/field
- refers to the value of 'field' in all revisions of Web.Topic
-
@(1,2,3,4,5,6,7,8,9)'/field
- refers to the value of 'field' in revisions 1 through 9 of Web.Topic
-
@*/(info.createdate < d2n('1 aug 2010'))
- refers to an array of all revisions of the queried topic prior to 1/8/10
Note that the explicit expansion of what was a range operator (:) is because version names are not necessarily sequential.
This lets me select a revision based on the value of a field (i.e.
'Web.Topic'@VersionField
) but still isn't quite right. The * operator is really hard to implement. And it would still be nice to be able to express ranges (and also parse lists of ranges out of field values). Possibly treat @ as a kind of "grep" operator i.e. "select all revisions that match the condition on the RHS" so
'Web.Topic'@(n IN (1,2,3))
or
'Web.Topic'@(n=1)
--
CrawfordCurrie - 04 Dec 2010
Hi Crawford, I really appreciate your input on this. As mentioned earlier, trac uses @ for version-specific page names also. I initially discounted it because I thought it might be more likely for users to have in their topic names already than '--'.
I take it this means you want QUERY to emit hits as 'Web.Topic@N' where the query matches specific versions of topics? What are your thoughts with '@' as a part of a fully qualified topic name+rev?
Also, I take it that this means
QueryAcrossTopicRevisions won't need
QueryCustomCollections. Earlier you said "... not delegated to the meta-loader inside '' ... Otherwise you will not be able to use computed ranges e.g. those where version numbers exist in dataform fields." - somehow I thought we could still do
%QUERY{"'Web.Topic'@%QUERY{"'Web.Topic'/VersionField"}%"}%
although this approach requires several requests to the db
--
PaulHarvey - 04 Dec 2010
You could indeed use nested queries, but as you observe it gets very clumsy very quickly.
There are (I think) 3 cases where you want to specify topic revisions. The first is in a URL. Well, the URL syntax is dictated by standards, and we really don't want to mess with encodings, so ?rev= is likely to remain for the foreseeable in this role. The second is in queries, and for this application I'm advocating the use of @ as an operator, as described above. The third is in in-topic used, such as
[[Web.Topic@42]]
. I think it's OK to allow this difference between the syntax employed in queries and that employed in squabs.
I'm going to have a stab at an implementation today, just to see how I get on.
--
CrawfordCurrie - 05 Dec 2010
With a range operator I would expect to be able to enter ranges, like
-
@(1..9)'/field
- refers to the value of 'field' in revisions 1 through 9
-
@(..9)'/field
- refers to the value of 'field' in revisions 1 through 9
-
@(1..)'/field
- refers to the value of 'field' in revisions 1 through the latest revision
--
ArthurClemens - 05 Dec 2010
Agreed; but what does "range" mean in the context of revisions? Revision numbering is sequential in the RCS store, but that's
not a requirement, and it won't be in (for example) git or svn stores. So a range operator has to enumerate all the available revisions of a topic, and match those that fall into the range.
A similar problem applies to the other idea I had, that of using negative indices to get versions relative to latest; e.g.
@(-1)
to get the revision before the current one. Add that to ranges and you have @(-6..0) to get the last 6 revisions - or does it? Doesn't it get the revisions that fall in the range from the current revision number minus 6, to the current revision?
Another problem I haven't solved is how to get "all revisions". '*' as a nonary operator doesn't cut it. Neither does a special '@*' operator (the parser has no concept of reverse-polish operators). I considered another reserved word, 'all', so @all would give you all revisions. But then that's yet another reserved word
--
CrawfordCurrie - 05 Dec 2010
For me,
@()
would make sense for "all". Because the ellipsis specify a range:
@(specify_range)
and if not passed, no range is specified, returning all.
But true, I haven't considered
Git revisions. If we use Git revisions we need to use Git revision syntax.
--
ArthurClemens - 05 Dec 2010
With RCS, topics have a linear history. With another type of store, non-linear history might be possible, but I suspect that linear history will remain the most common because it is easy for users to get their heads around. In the context of a linear history, a "range" does make sense.
I would like to suggest that the query syntax for dealing with revisions (and ranges of revisions) should be store-agnostic. It should not matter if I use RCS, GIT, MongoDB,
DBIStoreContrib or LittleGnomeWithPaperFoldersStoreContrib - the revisions syntax should remain the same. This will enable us to develop wiki applications and be able to use them on other foswikis with different stores.
The reality is that different stores use different revisions syntax. In URLs, we could consider using something like
rev=2
to refer to the second revision in a linear history, and
revid=76c5d170a9ae168f86c8cd98f74ee92773dbc914
for referencing an arbitrary revision using the store's own syntax. When querying across topic revisions, I suggest using RCS-style revision numbers (i.e. like
rev
, unlike
revid
), regardless of the underlying store.
--
MichaelTempest - 06 Dec 2010
I have a (non-range) implementation of the
@
operator ready to go. BTW @() loads
all revs of the topic. Path attached to
Tasks.Item10129
--
CrawfordCurrie - 06 Dec 2010
Requesting vote, so I can get this checked in. if you are against, please say why.
--
CrawfordCurrie - 07 Dec 2010
Formally the proposal has to be committed for 14 days to be accepted.
And this we need to stick to. But if I were you I would checkin the code now. All the usual "suspects" that raises concern have been part of the debate and there seem to be no major concern. So the risk that you later have to revert because the proposal is voted down is very slim.
The vote for a quick informal "temperature check" is OK but the declaration of acceptance will be in 14 days
--
KennethLavrsen - 07 Dec 2010
I have mixed feelings about this. I like the idea, but.... there are two problems.
(1) I am concerned that the syntax for
revision is not defined for
@revision
.
This is the first time (as far as I am aware) that topic revisions become part of query syntax. As defined here (i.e. not defined),
revision could be interpreted to mean "whatever the store wants it to mean". Any wiki apps developed for RCS store (e.g. mini-apps like utilities that ship with Foswiki itself) could break horribly on another store. How would you query the first revision of a topic on a store that uses GIT or
SVN?
The problem is that this is unspecified, and I suspect we are all making our own assumptions about it. I also suspect it will be a can of worms. I'd like to string those worms up where we can see them
(2) The syntax for all revisions looks like a possible syntax for no revisions. Suppose that the specific-revision-syntax problem is resolved. Suppose that I use a nested query (bearing in mind that the inner query need not use QUERY, it might use something that MySecondPlugin provides) to first get a list of revisions, and then do a QUERY based on those revisions. For example:
%QUERY{"SomeTopic@(%FOO{...}%)" ... }%
. If FOO returns an empty list, then QUERY will see
@()
, which means
all revisions, where FOO intended
no revisions.
--
MichaelTempest - 08 Dec 2010
(1) Sloppy of me not to state here that a revision is identified by a positive non-zero integer. I thought this was stated elsewhere, but it's only in code so I'm adding it to the doc.
(2) Indeed. I chose the semantics of
@()
based on the requirement to return all revisions. The original concept, the
@*
operator, was not workable due to the limitations of the parser (no postfix operators). If you can propose an alternative syntax, we can discuss.
--
CrawfordCurrie - 08 Dec 2010
This was discussed at length on
IRC. To summarise:
- CDot pointed out that in order to implement
@
, he had to formalise the concept of "lists" in the query language, which has had knock-on effects on other operators. That needs further discussion.
- The query language does not support lists of lists:
((1,2),3) == (1,2,3) == (1,(2,3))
- There was lively discussion about the nature of revision numbers, which did not reach a clear consensus. (This probably needs further discussion.)
- If we plan to use operators to refer to revisions, then we can defer specification of the syntax for specific revisions. In the meantime, we could specify that these operators return "nonsense that is understood by @()" My discussion about
Web.Topic--123
forms is my attempt to resolve this. -- PaulHarvey - 08 Dec 2010
- It should be possible to specify revision number(s) by referring to a field in (another) topic.
- Several operators were proposed - which should be discussed in (separate?) feature proposals:
- There is definitely a need for a
revat()
operator which gives the revision at a specific date (and time). Lavr described how this operator would be useful in other contexts than @()
.
- There is a need to be able to refer to specific revisions, such as the first revision and the current revision.
- MichaelDaum proposed JQuery-style
:first
, :n-th
, :last
etc similar to jquery's selectors. This notation was well-received.
- There is a need to be able to specify an empty list of revisions.
- One problem with using
@()
to mean "all revisions" is that it becomes hard to return an empty list of revisions from a macro or field. A blank field or macro result would somewhat-counter-intuitively mean "all revisions", and there isn't an easy way to say "%FOO%, or ':none' if %FOO% is blank" in the query language.
- CDot noted that
X/Y
is probably synonymous with X@(:latest)[Y]
, but said "not sure about the precedence rules there. That needs to be checked."
Crawford - you mentioned that
()
is a list-with-no-members. Would
@( () )
be viable syntax for an empty list of revisions?
Otherwise, I suggest that
@()
should be the empty list of revisions, and that we should use something like
@(:all)
to refer to all revisions. I think that will be less surprising than using
@()
to mean all revisions.
--
MichaelTempest - 08 Dec 2010
What about this...
All topics not created by
PaulHarvey:
Could not perform search. Error was: Syntax error in '@versions(:first)[info.author!='PaulHarvey']' at '@versions(:first)[info.author!='PaulHarvey']'
All topics which ever had
CurrentState='CommunityVote'
:
Could not perform search. Error was: Syntax error in '@versions(:all)[CurrentState='CommunityVote']' at '@versions(:all)[CurrentState='CommunityVote']'
All topics which had
CurrentState='CommunityVote'
in December: (ISO 8601 range)
Could not perform search. Error was: Syntax error in '@date(2010-12-01/2010-12-31)[CurrentState='CommunityVote']' at '@date(2010-12-01/2010-12-31)[CurrentState='CommunityVote']'
See also: more about topic+rev identifiers/syntax/linking/etc at
LoadDifferentTopicVersions
--
PaulHarvey - 08 Dec 2010
We've got a bit of a functionality split here; the @ operator in the Query, and the Content Access Syntax for getting at versions through URLs and in-topic specifiers. I'm committed to the first but can't commit to the second, so I cloned this last comment to
LoadDifferentTopicVersions.
Let's keep this topic focused on %QUERY.
--
CrawfordCurrie - 09 Dec 2010
Crawford pointed out on IRC that [if
FreezeRevisionNumberingScheme is accepted] then
:first
is not needed because all revisions start at 1.
--
MichaelTempest - 09 Dec 2010
Having fiddled with it some, writing queries, I'm coming round to the view that the
versions
syntax is actually more natural than the @ operator.
Consider some examples:
authors of revs matching 'CurrentState'='CommunityVote'
%QUERY{"@()[CurrentState='CommunityVote'].info.author"}% OR
%QUERY{"versions[CurrentState='CommunityVote'].info.author"}%
%QUERY{"fieldname@[info.date < d2n('2010-12-08') AND info.date > d2n('2010-12-05')].info.version"}% OR
%QUERY{"fieldname/versions[info.date < d2n('2010-12-08') AND info.date > d2n('2010-12-05')].info.version"}%
%QUERY{"versions[1].info.author"}% - who created the topic
%QUERY{"versions[1,3,5].info.author"}% - ditto for the first three revs
%QUERY{"versions.info.author"}% - array of authors of all revisions
It's a (little bit) more work for the code, but it would mean we don't need to add the
@
operator, and it just
feels right.
Note that this doesn't affect the need to have version "identifiers" such as
:last
--
CrawfordCurrie - 09 Dec 2010
I think you're right. Does this mean we have to reset the clock again?
--
PaulHarvey - 09 Dec 2010
I was not planning to interfere in your discussions, though I have been following them closely, because they are a bit too technical, in particular regarding the parser's constraints, which I am not familiar with at all.
As a user,
@()
does not mean nothing to me. As a programmer it could mean too many things depending on your programming background, and I don't like implicit parameters either, that makes the syntax more error prone. Using
versions
is better for a human.
About "version". In my current installation of Foswiki
System.Macros#QUERY_45_45_get_the_value_of_meta_45data
is
already documenting that %QUERY% can take a parameter
rev="version"
that makes the query "operate on the given version of the current topic.". Is there a reason why we do not extend that? Will this be deprecated?
As you know
rev
is also already used in URL construction.
Also we already use in URLs
?rev1=3;rev2=2
when we display differences between two revisions. For consistency sake, wouldn't that be reasonable to consider
rev1
and
rev2
instead of creating yet another way to designate a range of revisions?
The value of
rev
could be an actual revision number (also "first" and "last"), or a date and
time when you use
revat
(plus
revat1
and
revat2
for ranges)
For me, it does take a tremendous effort to find out what
%QUERY{"fieldname/versions[info.date < d2n('2010-12-08') AND info.date > d2n('2010-12-05')].info.version"}%
actually
looks for and returns. There are slashes, parenthesis, square brackets, dots, ' and ". That is a lot. I'm not quite sure I understand it.
I feel that is a bit far from what
System.Macros says "Macros are text strings - %MACRONAME% or %MACRONAME{ parameter="value" }% ".
Wouldn't that be more in the spirit of that statement:
%QUERY{"info.rev" revat1="2010-12-08" revat2="2010-12-05"}%
Also, my feeling is that with such complex syntax, people will only be able to cut and paste snippets and examples blindly, and that before they can actually construct their own queries they will have to dedicate a lot of time and efforts (which ultimately they won't do).
I am not raising a concern for that (so you do whatever you think appropriate) because you are all here more knowledgeable about the parser and syntax constraints, but since we are rethinking the syntax I felt I had to say this.
--
RaulFRodriguez - 09 Dec 2010
I fear that you are probably right. You probably do have to have programming experience to 'get'
QuerySearch syntax: the main thing for me was realising that some parts of the statement are an array of topics, some filter that array, and there's the
TopicObjectModel 'map' for accessing bits of topic (or set of topics). But I personally think we need to pursue the syntax that we've discussed here, partially because this is also the syntax that
%SEARCH{type="query"
will use and I'm not sure it's a good idea to add even more parameters to that thing (but OTOH I could be convinced that it is good idea).
QuerySearch is more of a reference topic than anything else, which is good, but I'd like to spice up the cookbook a bit, work on a new "System.QuerySearch for humans" topic as well, which uses animations like we did in System.Macros. I hope this can mitigate some of the barriers to using
QuerySearch syntax.
Maybe we can make a wrapper macro (you could probably even build a series of them with
EasyMacroPlugin) for those that "just want to do X" but can't stomach
QuerySearch syntax.
--
PaulHarvey - 09 Dec 2010
I understand where Raul is comming from. And yes, the Query syntax in its full flour is a 5 to 7 range on the
NerdoMeter.
But it is also a very powerful feature and very advanced feature. The syntax would probably be different if we had started from scratch now. But it is not a bad syntax and the important thing is that it has a simple syntax for the most used cases. In 99% of my query searches I just look for form field values and then I only have to write the name of the field.
When you take into consideration that the syntax is used also in SEARCH and SEARCH already has quite many parameters, then I would not want to add more parameters to it.
But in future we will probably see some new macros that will target specific queries with a simplified syntax.
But in the context of this proposal, I do not see any way that will get the
NerdoMeter score significantly down.
But as also Paul says - we need to add a lot more to the cookbook. We could use 3 times as many examples in it. There is nothing like good examples that show the most typical use cases.
--
KennethLavrsen - 09 Dec 2010
I was going to say pretty much the same. A cookbook is
absolutely essential especially now the syntax has grown.
The trade-offs you make when designing such a language are between compactness, expressiveness, readibility and familiarity. For anyone already
au fait with CSS selectors, then they will find familiar constructs in %QUERY. Similarly for anyone familiar with XPath, XQuery, and even SQL. Unfortunately, for anyone unfamiliar with any of those, there
is a steep learning curve.
I have started a new feature request,
WriteAQueryCookbook, to try and address this.
--
CrawfordCurrie - 10 Dec 2010
Excellent progress. I guess we have separated the first of
MichaelTempest's concerns into
LoadDifferentTopicVersions, so we just have to fix the fact that
%QUERY{"versions[].info.version"}%
Result:
... currently gives a list of all versions, instead of none. I suspect this is actually standard behaviour on all array "things" not just
versions
, so I'm not quite sure what should happen next.
--
PaulHarvey - 11 Dec 2010
That's quite logical.
versions
is an array of all versions. Remember, '[]' means 'where' in SQL-speak, so
SELECT info.version FROM topics WHERE
is what you wrote. It's consistent with all other usages of [].
Aside: Note that the default evaluation of versions
is inefficient compared to the @
operator, because it always loads all versions, whereas @
only loaded the revisions returned by the RHS. However it is more general, and I think it should be left up to stores to optimise it)
Of more concern (to me) are the semantics of an integer index.
versions
is an array of the N versions of the topic, with the most recent version occupying position 0 in the array, and the oldest occupying position N-1. So if I have a topic with 3 versions, the array will get [3,2,1] and
versions[1].info.version
will
not return 1 as you might expect - it will return 2, because that's what's at index 1 in the array. To get version 1, you have to write
versions[info.version=1]
(or
versions[-1]
). This is totally consistent with all other uses of the [] operator, but is potentially confusing for users.
The reason I like the array populated youngest-first is that
versions[0]
is always the latest revision of the topic. There's also the fact that arrays are 0-based, while revision numbers are 1-based (we
could reverse the array, but then revision 1 would be at index 0, revision N at index N-1).
MichaelTempest, are you still concerned?
Later: after writing a bunch of doc, I think the integer index is explainable. see
http://trunk.foswiki.org/System/QuerySearch and
WriteAQueryCookbook for more.
--
CrawfordCurrie - 12 Dec 2010
While farting about with this, I realised there's a strong case for an operator to uniqify a list. For example, let's say I want to create a list of all versions that were authored by Fred but always want the
latest rev, even if it wasn't authored by Fred. If I write:
versions[0,info.author='Fred']
I will get that, but if Fred authored the latest rev, it will appear in the list
twice.
A
uniq
operator would allow us to remove duplicates from the list:
uniq versions[0,info.author='Fred']
--
CrawfordCurrie - 13 Dec 2010
Both SQL and MongoDB queries use 'distinct' - I guess uniq makes sense for users coming from SSP?
Also: I'd like to use 'uniq' on any old list like
%QUERY{"unique('1,2,3,4')"}%
so would it be better as a function? Or does that make it more difficult to optimize queries?
BTW, we are already using length() and + operators in one of our wiki apps, very cool not having to use SSP
--
PaulHarvey - 13 Dec 2010
unary function or operator, it doesn't make any difference (a function is a unary operator that takes a list as it's only argument).
What's more important is how
uniq
maps to an underlying DB. I had in mind that a DISTINCT would be the optimisation target, but that doesn't quite work. In fact, I'm having serious misgivings about mixing integer indices with query expressions in lists, for the simple reason that mixing them is an implicit JOIN, which makes mapping hard.
I really wish someone would with a bit more SQL-fu would help with this. I'm a bit lost.
--
CrawfordCurrie - 14 Dec 2010
I am not convinced that it is a good idea to use/permit both query expressions and integer indices to "index"
versions[]
.
Suppose I write a wiki app where the query is built up using info collected from a user via a form, which has "by me" and "recent" checkboxes. That could give me a query expression like
versions[info.date > d2n('2010-12-08') AND info.author='MichaelTempest']
. I should be able to replace either or both of those conditionals with
1
if the checkbox is not checked, giving something like this:
versions[1 AND info.author='MichaelTempest']
. Or even like this:
versions[1 AND 1]
.
However, I cannot do that with the simpler expression
versions[info.author='MichaelTempest']
because
versions[1]
means something else. Sure, I could write
versions[1 AND info.author='MichaelTempest']
, which is reduceable to
versions[1 AND 1]
, but I do think this could bite.
We had
@(1, 7)
meaning revisions 1 and 7. One way to express that with
versions
would be
versions[info.version=1 OR info.version=7]
. That is a mouthful. Suppose we had a
revs
operator that takes a list of revision numbers and selects for those revisions. My example would then look like this:
versions[revs(1, 7)]
. We could then define
revs()
to match no revisions, which would then address my concern.
We could also eliminate integer indices by adding an
index
operator, that takes a list of indices as an argument. Then you could say
versions[index(0)]
to refer to the latest revision and it would also open up the door for things like
versions[index(0, 1, 2, -1)]
. The way we define
index
would be important. I have in mind that the arguments to
index
"let through" the specified elements of the
version
array. Thus,
index(n, m)
is identical to
index(n) OR index(m)
. This also eliminates the need for
uniq
.
I make no claim to having any SQL-fu, and I suspect there are good reasons not to do as I have suggested (e.g. this might be really clumsy in practice), but here is the suggestion anyway, in the hope that it gives someone else a better idea
--
MichaelTempest - 14 Dec 2010
We have always supported integer indexes in [] - there is nothing new in that. The only thing I have done is to extend it that so that the comma operator can be used to access a range of items from the array on the LHS. Also this relates to the comma operator, not the
and
operator, so I'm a bit confused by your examples.
In the final analysis the main reason for wanting to support integer indices is to be able to use
[0]
and [-1] to access the first and last items in the list, though other use cases - such as "the 3 most recent revisions" are also present. However I am also uncomfortable with mixing the two index types. But it's kinda unavoidable; maybe we just don't shout about it.
BTW all this is implemented, so if you are in any doubt about how it works, please do build examples on t.f.o to illustrate your points. They will never be wasted, as the working ones can be drawn into
WriteAQueryCookbook
--
CrawfordCurrie - 15 Dec 2010
From reading the docs in QuerySearch and from looking at the code, it appears that there is no way to index an array with a computed expression that involves lookups in topics.
There is definitely value in doing that. We are often asked if it is possible to show only the "approved" version of a topic to WikiGuest, or to people not in some group. I have not ever seen an answer explaining how it could be done, but I think it could be done with view templates and/or tweaking skin templates, something like this (I say "something like" that because
%QUERY{"LastApprovedVersion"}%
would return blank if there is no
LastApprovedVersion
field or if that field is blank, which would result in
all versions of the topic being show, concatenated together. Instead, it would be better to show nothing. That problem could be solved but it would obfuscate my example):
%IF{
"$ WIKINAME='WikiGuest' "
then="%QUERY{"versions[ %QUERY{"LastApprovedVersion"}% ].text"}%"
else="%QUERY{"versions[0].text"}%"
}%
However, if we had an
index
operator (which would only make sense in the context of the
[]
operator), then we could write the
then
clause as:
then="%QUERY{"versions[ index( versions[0].LastApprovedVersion ) ].text"}%"
<!-- Note: We want the text of the version identified by the value of the LastApprovedVersion field in the latest version of the topic -->
which would eliminate the need for nested queries in this particular case as well as giving a more intuitive result if the version number came from a preference setting e.g.
"versions[ index( %APPROVEDVERSION%) ].text"
.
If this idea is feasible e.g. it could be translated to SQL, then I can remove my concern. I am not asking for this
index
operator to be implemented immediately, I would just like to know if it is feasible.
--
MichaelTempest - 16 Dec 2010
This seems focused on querying a particular topic. That needs to be expanded into querying multiple topics simultaneously. Consider the following scenario:
A foswiki is used to maintain the documentation, design documents, and source code for a product that goes through
MANY releases. A customer finds a problem in an old release of the product. The customer would like to check the documentation of the product
AS OF the time of the particular release they are using. More than that, the internal customer support and development teams need to consult the design documents and source code for the particular release of the product in order to assure that they are addressing the problem in the particular release and are not fixing problems in later releases.
In foswiki, this means:
- We need to be able to easily refer to particular versions of a topic.
- We need to be able to refer to a particular version of a topic by a label (or date).
- We need to be able to pass in a label to use in fetching all topics in the foswiki.
- Different users should be able to pass in different labels so they have different views of the foswiki.
--
DavidMasterson - 02 May 2011
I think we need to change
the versions
array in a QUERY is an array of versions indexed most-recent first - as that does not provide a lasting way for an application to query for a specific revision (and I think its hella confusing to pretty much anyone that uses both the revision numbers they see in the UI (that are static) and then tries to query for a specific revision.
for example, if a user stores an versioning ID (one would expect rev, but atm, thats useless) and then in a viewtemplate tries to retreive a foeld value from that specific revision, they have to do something counter intuitive - like
versions[info.version=StoredRev].Fieldvalue
- whereas most will expect to be able to do
versions[StoredRev].Fieldvalue
.
additionally, we need user friendly ways to say first¤t and 'the one before' etc - that don't also make it impossible to insta-convert these queries to non-perl sources - and index of
-1
is very perlish - but imo would be useful for retreving in the style that is currently positive..
--
SvenDowideit - 03 May 2011
- Testing: ParkedProposal,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderConstruction,UnderConstruction,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,CommunityVote,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation,UnderInvestigation
- Testing:
- Testing: ,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,Main.MichaelTempest,,,,
- Testing:
--
PaulHarvey - 03 May 2011
While I don't have a problem with the
index
operator and I agree that labelling topics with a release date would be good practice, as would short-cutting
info.versions
, these are all extensions to the general "select a version based on a query" principle I outlined in the proposal. I have come round to Sven's view that the
versions
array needs to be a 1-based index of revision numbers, with -1 indexing the most recent revision.
Moving this to community vote didn't work, and there is an open concern, so I can't just move this to accepted :-(, but I'd like to get it in 1.2 if possible - even given the performance issues.
MichaelTempest can you please clarify your concern with implementing the proposal as written at the top of this topic?
--
CrawfordCurrie - 21 Feb 2012
It has been a while, so I've had to re-familiarise myself with all this. I can see that I got sidetracked in the discussion - sorry about that. As I understand it:
versions[ ]
is an array, indexable by a list of integers, and the proposal is extending the syntax to permit query expressions instead of a list of array indexes. There is a problem of ambiguity. Is
0
an (invalid) integer index or a query that means "no versions"? Is
1
an integer index that refers to the first revision, or is it a query that means "all versions" (i.e. the same as
versions[ ]
)? My concern now is that the proposal does not address the ambiguity. This might be covered in QuerySearch on t.f.o, but for reasons unknown I cannot log in on t.f.o. I'll have to check that later.
--
MichaelTempest - 21 Feb 2012
I handled it using a special interpretation of a constant index in the
[]
operator. If the content of the
[]
evaluates to a constant (the evaluator does not have to go back to the DB to get a value), it is treated as an index rather than a query. Of course this means you can't use a query - such as a query for a field containing an index - as an index. You can of course use
versions[info.version='4']
.
Any operator that has exclusively constant operands (or more precisely does not have to refer back to the DB for any operand) is also a constant index. This
[1+2]
is a constant index but
[1+info.version]
is not.
I did not cover this in
QuerySearch (I thought it was too nerdy), but can do so if you think it's needed.
--
CrawfordCurrie - 21 Feb 2012
I think it is necessary to cover this in the documentation because it has the potential to surprise. I know there is a tradeoff between wtf's when reading over-nerdy documentation and wtf's when figuring out under-documented stuff. In this case, I think it needs to be documented.
--
MichaelTempest - 21 Feb 2012
Thanks, Crawford - I'm removing my concern.
--
MichaelTempest - 22 Feb 2012
I thought I could deliver this for 1.2, but the discussion today on IRC has disabused me. A lot more thought - and hopefully a few more inputs - are required. Taking off the list for 1.2.
http://irclogs.foswiki.org/bin/irclogger_log/foswiki?date=2012-02-24,Fri&sel=271#l267
Resetting to
UnderInvestigation. If you have anything to say, for the love of god
say it - please don't leave me to implement something you are unhappy with.
--
CrawfordCurrie - 24 Feb 2012
Some thoughts to provoke you, they are incomplete but ...
I am
not suggesting any syntax in the following. I have only thrown together some code to explore the problem. The solution still eludes me let alone the syntax.
I started out by considering something like:
- All('Main.WebHome'/author = 'Fred') would be true if every single version has 'Fred' as the same author
- Any('Main.WebHome'/author = 'Fred') would be true if only one version has 'Fred' as the same author
Now the true/false is necessary to decide whether 'Main.WebHome' should be included in the result set. However, there is nothing in the above syntax to state that I only want the current 'Main.WebHome' or every version (or matching version) to be included in the result set. After all, if creating a table out of the result set you may wish every $topic to include the $rev and buld one row per $topic$rev. Or you may wish to create one row per topic with
optionally some of the cells populated with some of the revision details.
In attempting to come up with something generic I ended up with:
versions[when -1 AND -2 OR any(-3..-5) choose -1, -2;
when -6 choose -6]
Why you would need something as complex I am not sure, but I cannot escape the idea that all(), any() etc are effectively point solutions, although they may still be useful as syntactic sugar (once you work out how extend the syntax to define the content returned by these queries).
Alternative options are groupby(version), where multiple hits on the same topic can be flattened into one, However, how does that work neatly in conjunction with versions[] in the query? In addition, how do we express the format syntax to repeat the output (by $topic$rev) should we wish to do so.
And to complicate things even more, if the query returns multiple versions in the result set, do you need the content access during formatting to relate to the actual set of returned versions rather than the set in the main FW database. For example the DB contains r1..r10 and the query returns r3, r5, r8 in the result set. During formatting you do not know the version numbers that will be in the resultset. Therefore, you need to refer to them using something like [-1, -2, -3] except that refers to r8, r5 and r3 from the original database respectively.
What about being able to say 'Main.WebHome'[info.author[-1] = info.author[1]], i.e Current author is also the original author. Or even more powerfully, 'Main.WebHome'[info.author[*] = info.author[*-1]], would be true for any consecutive topics with the same author.
I saw this line in the description above:
- Note that if 'RoaldDahl' authored version 9, then you will get that version twice in the result.
Will that happen naturally from an SQL store? I would have thought you would get that database row only once.
Another possibility is to consider a query to run against all topic revisions not against all topics. i.e. the following 'topics' are examined one by one against selection criteria and either match or not. If they match then they are included in the result set or are discarded if not.
- Main.WebHome@1
- Main.WebHome@2
- Main.WebHome@3
- Main.WebHome@4
- Main.WebHome@5 (also -1 and current)
Of course there would need to be a default version[-1]. This would then necessarily need groupby and having clauses (or something better) but may well result in a much cleaner interface, as the responsibilities are better defined. The downside might be slightly less power, but only for extreme cases.
I know I need to think about this a lot more.
HTH
--
JulianLevens - 12 Mar 2012
Multiple hits on the same version ought to be flattened into one hit. My prior comment was reflection the implementation I had at that time.
The problem here really is just one of syntax. The strategy I had take was to say "topic.versions" i.e. each topic has a list of versions associated with it, each of which is a pointer to a very similar object, but populated with the data relative to that version. The cons were proposing instead to express the versions of an object using an operator. This lead on to questions such as "how do I express a search over all versions
of all topics ".
It's a fine point, but an important one. I decided to back down from the implementation I had done because my head had started to explode.
--
CrawfordCurrie - 12 Mar 2012
A fine point I have not yet grokked. Why is "topic.versions" not a case of topic with the ".versions" operator?
I may well be going beyond not merely syntax but semantically what we want to be possible. When this topic came up earlier I did not feel qualified (or had enough time) to analyse/comment. I'm not sure I'm qualified even now. Of course my somewhat dumb outsider view may help to re-conceive this (or not).
You say:
Multiple hits on the same version ought to be flattened into one hit
Why ought? Now while I actually like the fact we gain some simplicity by flattening, why should this be the only option imposed on the user?
Primarily, while on one hand I cannot imagine many use-cases for complex version searches. I also see that the criteria used to match a topic (with optional version(s) criteria) is orthogonal to the selection of the specific revision(s) to see in the result set. (Let's not concern ourselves yet with any correlations between the two).
- Flattening matching is a way to define these rules, while pushing selection of version(s) in the result set to the formatting phase.
- Allowing Topic@1 Topic@2 etc during the matching process complicates the matching process (albeit it's still well defined), while pushing a flattening option (groupby or somesuch) to the formatting phase.
I have no strong sense that either of the above is the better choice (and there may be other choices).
Why was flattening during matching felt to be superior?
I just have that gut feeling that we may end up painting ourselves into a corner. That feeling may in part be driven by the fact that I don't quite get it yet, but there again ...
--
JulianLevens - 12 Mar 2012
The trick is to try to think of the kind of queries people are likely to make, and organise to make their queries as easy as possible. For example,
- "Show me all versions of all topics which I authored"
- "Search all topics as they were at time T"
Then create a syntax which supports ll those types of query while still being consistent with the query syntax we already use.
--
CrawfordCurrie - 13 Mar 2012
I have a use case for you but first ...
If we try to think of the kind of queries people are
likely to make and cater for those. We are in danger of creating some point solutions that do not cover other possibilities.
And now that use case:
Imagine an app using data-forms that tracks a process. As part of the tracking one of the fields contains a state, something like: New, Being-Worked-On, For-Approval, Rework, ...
Now Rework should never happen and you want a topic to show any process that went awry — to spot any patterns for process improvement. That is show any process-topic that ever had (in its history) some Rework. Furthermore, the required table to display should also show the current version to check it's really OK now. There should be one table row per revision sorted with the most recent first. i.e.
While I am not sure that this will be a common requirement, it's seems quite a reasonable one to me. I also see no reason why other more involved version queries
and selections should not be possible.
How does the proposed method allow us to create this sort of result?
The more I think about this, it seems clearer to me that we need something to provide distinct revision matching and revision selection (into the result set). Getting this right will cost more time now, but save much grief later with a solution that proves too narrow.
This topic is QueryAcrossTopicRevisions. I believe there needs to be (at least logically) a SelectAcrossTopicRevisions and they need to work neatly together.
I appreciate that this is an involved and as you say head exploding topic, and I hope this is not a wafer thin mint too far
In my experience when my head is about to explode with ideas, it's often an indication that I haven't thought everything through yet.
--
JulianLevens - 14 Mar 2012
I too would really love to be able to get a resultset iterator which returned
Web.Topic@123
hits rather than just
Web.Topic
; and whilst in itself this doesn't seem (to me) to be hugely difficult, the impact on the rest of core is quite high - all the way up to thinking very hard about how to make SEARCH do something useful with such a thing (Eg.
multiple="revisions"
?
revisionseparator/header/footer
? Etc.).
So, rather than attempt to solve all that up-front, let's try to stick to a full spec on the
query side of things, while at the same time trying to anticipate all the needs of the
selection concerns. I think a separate SelectAcrossTopicRevisions proposal would be a great start; example use-cases are excellent for seeding discussion.
All that said, we can (albeit inefficiently) get all the revisions using existing TML & proposed
QueryAcrossTopicRevisions syntax, Eg.
%SEARCH{
"versions[State='Rework']"
type="query"
format=" * [[$web.$topic]] had these revisions in =State='Rework'=:
$percntINCLUDE{\"%TOPIC%\" section=\"display\" topic=\"$web.$topic\"}$percnt"
}%
<verbatim class="foswikiHidden">%STARTSECTION{"display"}%%FORMAT{
"%QUERY{'%topic%'/versions[State='Rework'].address}%"
format=" * [[$web.$topic@$rev]] - $formfield(ReworkNotes)"
}%%ENDSECTION{"display"}%</verbatim>
Assumptions:
- There's a topic attribute called 'address' (in addition to
web
and name
) which would be the full Web.Topic@123
tuple (I'm not proposing this as a great idea, although we are doing that in MongoDBPlugin)
- FORMAT understands
Web.Topic@123
address notation
--
PaulHarvey - 14 Mar 2012