Feature Proposal: ENCODE newlines in a string to BR

Motivation

A commonly encountered problem is the case where you have a search over formfields, and you want to present the results in a TML table, but one of the form field values has a newline in it. This requires transliteration of the newline to an HTML break. A similar transliteration is required for the vertical bar (|), which otherwise "breaks" the TML table it is embedded in.

There are a number of other cases where transliteration of tokens is needed. These cases can currently be addressed using SpreadSheetPlugin (SSP) %CALC, but that solution has a number of problems of its own:
  • %CALC is not a "standard" macro, and behaves differently (for example, it can't be used to process the results of a FORMQUERY or DBQUERY because of the eval order)
  • %CALC doesn't support newlines in strings
(An example use case comes from a tracker app I have, where a ClearQuest query returns a datum that contains a bug identifier string. This string contains a short substring that must be replaced to make a topic name for autolinking. This could not be done with %CALC, and required a plugin 'fix'.)

I'm not aware of any clean solution to this. Partial solutions exist:
  • the %SEARCH newline parameter is barely adequate, but is clunky and inflexible - you can't control which search hit to format this way, nor can you use it to substitute a range of characters. Also, this parameter is not supported by other macros, like DBQUERY. Rather than post-fixing those macros, it is cleaner to have a functional solution to this encoding.
  • SpreadSh*tPlugin $SUBSTITUTE should be able to do this, but is too stupid to replace a newline (and other problems).
  • FilterPlugin had so much promise, but just doesn't deliver on this simple requirement:-(

ENCODE is the macro that I'd expect to support this sort of encoding, but the closest it gets is type="html" which unfortunately encodes too much:

Example <b>showing</b> why it <i>fails</i>

There might be other solutions from plugins that I haven't thought of - I'd be interested to hear.

Description and Documentation

The proposal is to extend ENCODE with two parameters, old and new. Where an ENCODE macro is seen that does not have a type parameter, but does have old and new parameters, it replaces all the characters listed in old with the corresponding character in new.

For example, %ENCODE{"value" old="X,$n,$comma" new="Y,$ltbr$gt,;"}% will replace all occurrences of X with Y, the newline character ($n) with <br>, and the comma ($comma) with a semicolon.

New parameters:
  • old - comma-separated list of characters to replace. The standard formatting tokens (e.g. $n, $percent, $amp) may be used in this list, plus the additional formatting token $comma to represent a comma (,)
  • new - comma-separated list of replacements. The elements in this list match 1:1 with the elements in the old list. Again, the standard formatting tokens (plus $comma) may be used.
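To make the list syntax concrete, here is a minimal Python sketch of how the old and new lists might be parsed. It assumes the raw parameter is split on literal commas first and each element is then expanded, so that $comma survives as a literal comma instead of acting as a separator. The token table is a hypothetical subset of the standard formatting tokens, not Foswiki's actual implementation.

```python
import re

# Hypothetical subset of the standard formatting tokens, plus the proposed $comma.
TOKENS = {
    "percent": "%",
    "percnt": "%",
    "dollar": "$",
    "comma": ",",
    "quot": '"',
    "amp": "&",
    "nop": "",
    "lt": "<",
    "gt": ">",
    "n": "\n",
}

# Longest names are tried first, so "$nop" is not read as "$n" followed by "op".
# An optional "()" suffix is accepted, e.g. "$n()".
TOKEN_RE = re.compile(
    r"\$(" + "|".join(sorted(TOKENS, key=len, reverse=True)) + r")(?:\(\))?"
)


def expand_tokens(text):
    """Replace each formatting token with its literal value in a single pass."""
    return TOKEN_RE.sub(lambda m: TOKENS[m.group(1)], text)


def parse_token_list(raw):
    """Split a tokenlist on literal commas, then expand tokens in each element."""
    return [expand_tokens(item) for item in raw.split(",")]


old = parse_token_list("X,$n,$comma")
new = parse_token_list("Y,$ltbr$gt,;")
print(old)  # ['X', '\n', ',']
print(new)  # ['Y', '<br>', ';']
```

Splitting before expanding is the design choice that makes $comma useful: if tokens were expanded first, the comma produced by $comma would be indistinguishable from a separator.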

Note that all the other encodings (html, url, entity etc) can be seen as special cases of this more general encoding support.
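One way to read that claim: the single-character predefined encodings are just fixed old/new tables. The Python sketch below builds those tables from the special-character list given for the predefined types and applies them with str.translate, which maps each input character exactly once. It is illustrative only; the zero-padded form follows the &#034; example in the type="entity" description, and the real Foswiki encoder may format entities differently.

```python
# Characters treated as special by the predefined types (see "Predefined encodings").
CONTROL = {chr(c) for c in range(32)} - {"\n", "\r"}
SPECIAL = CONTROL | set("<>&'\"") | set("%[]@_*=|")


def entity(ch):
    """Decimal character reference, zero-padded to match the &#034; example."""
    return f"&#{ord(ch):03d};"


def char_table(chars):
    """Build an old->new table from a set of characters, usable with str.translate."""
    return str.maketrans({ch: entity(ch) for ch in chars})


TYPES = {
    "entity": char_table(SPECIAL),
    "html": char_table(SPECIAL | {"\n"}),  # as entity, but newline is encoded too
    "safe": char_table(set("'\"<>%")),
}


def encode(text, type_):
    # str.translate looks at each input character once, so output is never re-encoded.
    return text.translate(TYPES[type_])


print(encode('say "hi" | 50%', "entity"))  # say &#034;hi&#034; &#124; 50&#037;
print(encode("a_b <c>", "safe"))            # a_b &#060;c&#062;
```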

I'd be keen to get this simple change into 1.1, as it addresses a long-standing requirement, and is highly compatible with other macros such as FOREACH and QUERY.

The rewritten VarENCODE would be as follows (ALERT! the examples will show the wrong expansions, because the implementation isn't checked in)

ENCODE{"string"} -- encodes a string

  • Encode character sequences in "string", by mapping characters (or sequences of characters) to an alternative character (or sequence of characters). This macro can be used to encode strings for use in URLs, to encode to HTML entities, to protect quotes, and for as many other uses as you can imagine.
  • Syntax: %ENCODE{"string"}%
  • Parameters:
    • "string": String to encode. Default: "" (empty string)
    • type="encodingname": Use a predefined encoding (see below). May not be used if old or new are given. Default: "url"
    • old="tokenlist": Comma-separated list of tokens to replace. Tokens are normally single characters, but can also be sequences of characters. The standard format tokens may be used in this list, plus the additional formatting token $comma to represent a comma. Each token must be unique - you cannot list the same token twice. May not be used with type; required if new is used.
    • new="tokenlist": Comma-separated list of replacement tokens. The elements in this list match 1:1 with the elements in the old list. Again, the standard format tokens (plus $comma) may be used. An empty element in the new list will result in the corresponding token in the old list being deleted from the string. If the new list is shorter than the old list, it will be extended to the same length using the empty element. Tokens do not have to be unique. May not be used with type; required if old is used.
      ALERT! When using old and new, be aware that the results of applying earlier tokens are not processed again using later tokens (see examples below).
  • If ENCODE is called with no optional parameters (e.g. %ENCODE{"string"}%) then the default type="url" encoding will be used.
  • Predefined encodings.
    • Unless otherwise specified, the type parameter encodes the following "special characters"
      • all non-printable ASCII characters below space, except newline ("\n") and carriage return ("\r")
      • HTML special characters "<", ">", "&", single quote (') and double quote (")
      • TML special characters "%", "[", "]", "@", "_", "*", "=" and "|"
    • type="entity" Encode special characters into HTML entities, like a double quote into &#034;. Does not encode \n (newline).
    • type="html" As type="entity" except it also encodes \n (newline)
    • type="safe" Encode just the characters '"<>% into HTML entities.
    • type="quotes" Escapes double quotes with backslashes (\"), does not change any other characters
    • type="url" Encode special characters for use in URL parameters, like a double quote into %22
  • Examples
    • %ENCODE{"spaced name"}% expands to spaced%20name
    • %ENCODE{"| Blah |
      | More blah |" old="|,$n" new="&vbar;,<br />"}%
      expands to &vbar; Blah &vbar;<br />&vbar; More blah &vbar;
      - this encoding is useful to protect special TML characters in tables.
    • %ENCODE{"10xx1x01x" old="1,x,0" new="A,,B"}% expands to ABABA
    • %ENCODE{"1,2" old="$comma" new=";"}% expands to 1;2
    • IDEA! Values for HTML input fields must be entity encoded.
      Example: <input type="text" name="address" value="%ENCODE{ "any text" type="entity" }%" />
    • IDEA! ENCODE can be used to filter user input from URL parameters and similar to help protect against cross-site scripting. The safest approach is to use type="entity". This can however prevent an application from fully working. You can alternatively use type="safe", which encodes only the characters '"<>% into HTML entities. When ENCODE is passing a string inside another macro, always use type="quotes" so that embedded double quotes are escaped. For maximum protection against cross-site scripting you are advised to install the Foswiki:Extensions.SafeWikiPlugin.
    • IDEA! Double quotes in strings must be escaped when passed into other macros.
      Example: %SEARCH{ "%ENCODE{ "string with "quotes"" type="quotes" }%" noheader="on" }%
  • ALERT! when using old and new, be aware that the results of applying earlier tokens are not processed again using later tokens. For example,
    • %ENCODE{"A" old="A,B" new="B,C"}% will result in 'B' (not 'C'),
    • %ENCODE{"asd" old="as,d" new="d,f"}% will yield 'df',
    • %ENCODE{"A" old="A,AA" new="AA,B"}% will give 'AA', and
    • %ENCODE{"asdf" old="a,asdf" new="a,2"}% will give 'asdf'.
  • Related: URLPARAM
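The old/new semantics described above (one left-to-right pass, earlier tokens in the list taking precedence at a given position, and replacement text never re-scanned) can be sketched in Python with a regular-expression alternation, since Python tries alternatives in the order they are written. This illustrates the documented behaviour; it is not the actual Foswiki code, and skipping empty old tokens is an assumption the spec does not cover.

```python
import re


def encode_replace(text, old, new):
    """Sketch of the proposed old/new mode of ENCODE."""
    if len(set(old)) != len(old):
        raise ValueError("each token in old must be unique")
    # A new list shorter than old is padded with empty elements, so unmatched tokens are deleted.
    padded = list(new) + [""] * (len(old) - len(new))
    # Empty old tokens are skipped (an assumption; the spec does not cover them).
    mapping = {tok: rep for tok, rep in zip(old, padded) if tok}
    if not mapping:
        return text
    # Python tries alternatives left to right, so an earlier token wins at a given position.
    pattern = re.compile("|".join(re.escape(tok) for tok in mapping))
    # One scan over the input; replacement text is never matched again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(encode_replace("10xx1x01x", ["1", "x", "0"], ["A", "", "B"]))  # ABABA
print(encode_replace("A", ["A", "B"], ["B", "C"]))                    # B
print(encode_replace("asdf", ["a", "asdf"], ["a", "2"]))              # asdf
```

The last example shows why "asdf" is unchanged: "a" is listed before "asdf", so it matches first at position 0 and the longer token never gets a chance.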

-- Contributors: CrawfordCurrie - 27 Feb 2010

Discussion

Formatting search results using a TML table is far too fragile, a known issue. Using TML tables for any layout related formatting is not advisable either.

The alternative is to use a HTML table. Only drawback: the resulting table can't be sorted by TablePlugin. Why not extend TablePlugin to be able to parse HTML tables? We need a centralized first-class table parser anyway.

That aside, %SEARCH + %TABLE are not able to cope with large data sets anyway. Better to use JQueryPlugin's grid widget, which comes with the required ajax features to sort and paginate as needed.

Why doesn't FilterPlugin work out for you?

I am not adding myself to ConcernRaisedBy. I just wonder what's going on here.

-- MichaelDaum - 27 Feb 2010

I agree that formatting using TML tables is too fragile; and I usually use HTML tables and a Javascript table sorter (I never use TablePlugin if I can possibly avoid it). However I often work with other people's tables.

"Why not extend TablePlugin to be able to parse HTML tables?" Have you looked at the code? :-(

I do use FilterPlugin sometimes, but a lot of the time I am working on wikis that don't have it installed, and installing it is not an option. Also, it has a lot in it; I really don't want to install a non-core plugin for this tiny feature and get a bunch of features I will never use, and don't want people on the wiki to use (such as SUBST, which duplicates INCLUDE, START(EXTRACT|SUBST), which duplicates sections, and MAKEINDEX, which is just way out of place in that plugin).

-- CrawfordCurrie - 28 Feb 2010

TablePlugin code is a PITA. You have been working on other plugins, adding more sophisticated table parsers. This is all a side show, but a pending job: rip out all the table parsers from TablePlugin, EditTablePlugin and SpreadShitPlugin that are not worth keeping, and replace them with a centralized, capable table-parsing service in the core.

Do I get this right: FilterPlugin would have delivered, but you would rather upgrade the site to Foswiki 1.1 to get this new ENCODE feature? Sounds odd.

My advice: either rewrite their TML tables as HTML tables, or install FilterPlugin. That is a lot less of a risk than upgrading to a Foswiki 1.1 whose release date is not yet clear.

-- MichaelDaum - 28 Feb 2010

The TML tables are there because they are simple to use. Same with all the TML syntax. "Normal" people do not know HTML. The strength of Foswiki is exactly that anyone with a few hours of reading and some working documented examples can build something simple and make it work.

It is not correct to expect normal users to be able to construct an HTML table including column widths and nice formatting.

What normal application builders are trying to do is something like

%SEARCH{blablablablabla format="| $topic | $formfield(headline) | $formfield(description) |"}%

and this actually partly works, except when someone puts a '|' in the description text. If you use something other than $formfield, quotes and newlines also destroy the tables.

When I built the MultiTopicSavePlugin I had the goal of making a plugin with which people - that are not total geeks - can create an application where they make a SEARCH that creates a table which again contains HTML input fields.

And I was fighting the lack of a proper way to encode values for tables, and had to show my examples in the plugin documentation using HTML tables.

Until a later version where I decided to build in the proper encoding in my plugin so you could still create tables using the much simpler TML table syntax. A terrible hack and something that made MultiTopicSavePlugin more geeky to use.

And it is not the first time I have had this problem with SEARCH and TML tables, and I do not find it nice or acceptable that I had to go for HTML tables. It means that no one else touches those applications or improves them. It all becomes too geeky. And I do not want to install FilterPlugin, for many good reasons, most of which Crawford has mentioned.

Crawford hits a sweet spot with his proposal. It enables people to put an $percentENCODE{blabla old= new= }$percent and get a working table without finding their local Asperger Syndrome geek to create the application for them.

We already have an ENCODE Macro in core. It just simply lacks an encode mode to be used to protect in TML tables. Crawford's proposal is on top of it so generic that it can also be used in other situations like comma separated lists.

I support this proposal.

-- KennethLavrsen - 28 Feb 2010

Michael, you have a fair point regarding the backward compatibility. I have already rewritten existing apps to use HTML tables; I am not looking backwards, I am looking forwards.

-- CrawfordCurrie - 28 Feb 2010

The real solution is to specify result sets in all their depth. While cooking food for my kids I had time to rethink the recent discussions on a series of topics here. Conclusion is that we really need an abstract ResultSet class that TML can interact with in various ways, last but not least is formatting the results. From that POV, I must consider the proposed extension to ENCODE as adding more cruft to Foswiki just working around proper result sets.

-- MichaelDaum - 28 Feb 2010

I must have missed the backwards compatibility problem. I do not see any. The ENCODE mode remains unchanged unless you use the new feature.

And I have no clue what the ResultSet means to me as a user. That is too abstract to absorb.

I do know that this simple small enhancement of the existing ENCODE would be very useful to end users in many small simple applications. SEARCH is just one of them.

I guess we will need to arrange a vote on this one when the 14 days have passed.

-- KennethLavrsen - 28 Feb 2010

How about discussing things first to get a clearer picture before trying to squeeze opinions out of us all? This proposal is ... one day old?

-- MichaelDaum - 28 Feb 2010

I did not say I would arrange a vote now. I said after the 14-day period has expired. And naturally, if a discussion is going on where something new is brought in, then we wait. But I do want a vote if this ends up hanging with your concern blocking it.

-- KennethLavrsen - 28 Feb 2010

It's important to note that we already appear to have code to do the mangling of newlines in formfields via Foswiki::Render::protectFormFieldValue(), apparently for the benefit of formatted searches and to the detriment of being able to render newline-sensitive TML anywhere else, e.g. Tasks.Item5489.

Agree that result sets would be very important for 1.1 (in fact this is yet another feature that would merit calling it 2.0)

-- PaulHarvey - 28 Feb 2010

Result sets have nothing to do with this. ENCODE is a generic macro we already have, and it already has a number of modes.

What is missing is a mode that can be used to encode ANY data that the user needs to display in a TML table.

The problem is that the newline and vertical bar characters kill tables. And inside searches, double quotes cause trouble. We lack a mode that can be used by a normal end user, and not always in a SEARCH.

Crawford has proposed a generic enhancement to ENCODE so you can encode anything to anything once and for all.

If this is undesired or too geeky, then I would at least propose that we add a new fixed mode that encodes | and \n: | to an HTML entity and \n to an HTML br tag. That is what we really need. Suggesting installing plugins to do this makes no sense. It is shooting a fly with a cannon. ENCODE is there already and lacks this mode.

-- KennethLavrsen - 09 Mar 2010

Don't have time to rephrase my concerns yet again. Although I remove my concerns from this one I really must say that Foswiki tends to accumulate more and more cruft with no halt. There are so many cranky wrinkles here and there ... unbelievable. So it really doesn't matter anymore adding yet another one.

-- MichaelDaum - 09 Mar 2010

Michael, thanks for removing the concern. Now let us get to a solution that most of us will like.

Michael - if you had to choose between the universal but more geeky solution with old and new, and a simple extension where the existing type gets an additional mode called "table" that translates \n to html br and | to html entity, what would be your preference?

I ask because I am personally not sure. The universal ensures that in future we do not need to add more modes. But it also makes the now simple ENCODE macro more complex with two different parameters that interact.

Crawford what is your view on just doing a type="table" extension instead of the universal model? Your original headline of this was to enable encoding of \n to BR. And I add the vertical | as a requirement based on my experience of characters that have caused trouble in tables in real life.

As I said, I am personally not sure.

Now that concern is removed the proposal can be declared passed by consensus when we get to the 13th of March unless new concern is raised. So let us think about the best spec the last 3 days.

-- KennethLavrsen - 09 Mar 2010

the proposal essentially creates a new ENCODE type - replace - so can we also add the explicit type="replace" ?

-- SvenDowideit - 10 Mar 2010

That is a very good idea Sven. Even if it seems more complex on the surface it is more logical for the enduser and more clean as a feature.

type="replace" would be the only type using the additional old and new parameters. Then we would need to define the behaviour of type="replace" when old and new are not defined. Most logical is to default them to ''. I.e. if old is not defined, ENCODE does nothing. If new is not defined, the old characters get deleted. And if there is a mismatch between the number of old and new characters, then the extra old characters without a match get replaced by ''. If there are extra new characters, they are ignored. Would that be the complete spec?

-- KennethLavrsen - 10 Mar 2010

Note that ENCODE is a mapping tool. It fulfils the role of tr/// but does not attempt to fulfil the role of s/// (we have SpreadS**t and FilterPlugin for that). The goal here is to generalise that mapping, to make it useful beyond our limited imaginations. Just off the top of my head I can think of other uses for it that users might find, for example %ENCODE{"string" old="oh shit,damn,bugger it" new="oh dear,whoops,golly gosh"}%

I explored various syntaxes - after all, this is modeled closely on perl tr/// - but ended up with old and new because I felt it was the least nerdy. Because the 'tr' mode is the default mode, it relegates the type parameter into the role of a "special case" i.e. a definition of type says that old and new need not be given. As such I don't see the replace value as useful, though I don't really object to it either; it's just noise.

I don't have a problem with adding new type names, for example table would be:
%ENCODE{"string" type="table"}%
instead of
%ENCODE{"string" old="|,$n" new="&vbar;,<br />"}%
i.e. significantly less geeky. However it is not in any sense a viable alternative to old/new, and frankly I'd rather not hardcode another type because it's so easy to:
   * Set TABLEENCODE = old="|,$n" new="&vbar;,<br />"
%ENCODE{"string" %TABLEENCODE%}%
I'd rather keep the type names for long encodings, such as entities.

-- CrawfordCurrie - 10 Mar 2010

The type="table" was only in case the more generic proposal could not be agreed.

But if we do the generic version - and I hope we do - then I would not also add more types. Instead I would document the old="|,$n" new="&vbar;,<br />" example in the VarENCODE topic.

The reason for the type="replace" (or type="transliterate", which may be more accurate but is also a word no one knows unless they are a programmer) would be better clarity. The current ENCODE has a default type if it is not specified, which is "url". But when you add old and new, it becomes "replace" or "transliterate".

I think the end user will have an easier time understanding the magic if there is a type that transliterates, which then takes two extra parameters that the other modes ignore.

I can live with the old proposal but I think Sven had a good point. Either as human "replace" or high society "transliterate".

-- KennethLavrsen - 10 Mar 2010

When old and new are defined, type is ignored, so it can be anything you like. Despite Micha's comments, this is a sensible extension, not a cranky wrinkle (a cranky wrinkle would be a new type). Consensus is reached, so going for it.

Addition to the spec; I'm adding $comma as a standard formatting token. I started adding it just for this proposal but realised that it is useful elsewhere for dealing with lists, and we want to maintain consistency wherever possible. I appreciate this isn't part of the original specification, but I didn't realise until I started coding, and I can't see that it's going to give anyone any issues. If it does, shout!

-- CrawfordCurrie - 11 Mar 2010

The 14 days are not over yet. You cannot declare consensus before the 14-days have passed. We discussed this just one week ago. We had an agreement. I documented the process.

This can become consensus 14 days after the date of commitment, which is the 12th of March. The FeatureProposals topic even calculated the days for you.

On top of that, you changed the proposal the same day.

And you totally dismiss Sven and me as if our opinions do not matter.

I raise a concern. And I will revert check-ins on this.

-- KennethLavrsen - 11 Mar 2010

Sorry, my mistake; I forgot that February only has 28 days :-( I thought we had reached a consensus: Sven hadn't commented further, and you seemed to accept. Happy to flip back to under investigation, though, if you think it needs more discussion.

You can add whatever value of type you want, it will purely be a documentation convention. I really don't want to have to specify a redundant type when using the macro with old and new, however; it's just extra typing as long as old and new are specified. If you insist that specification of a type parameter override old and new, however, then type="replace" becomes a requirement. I just don't think it adds any value.

The extension to $comma to make it a standard formatting token; I think it needs another feature request - SupportCommaFormattingToken

-- CrawfordCurrie - 11 Mar 2010

i do agree with Kenneth, silence probably shouldn't be seen as acceptance - when there is 24 hours between posts....

yes, i expected that you lazy typers would want type="replace" to be 'just documentation', and am willing to live with that until a problem is found.

i prefer to have a means of being explicit, even if it's not mandatory - it gives automated tools a chance to group parameters, and a route to validate things.

I don't entirely grok the need for it, but am way too distracted to spend the time to see if there is a super nifty way to get there - so I reckon

%!ENCODE{"go for it" old="g,o,f,o,r,i,t" new="w,h,a,t,t,h,e" type="replace"}%

yes, I expect to see something as wrong as the above in the unit tests :-)


Later, after discussing that example on irc: I reject your proposal and would replace it with my own, except I think Crawford already has a spec change pending.

  <CDot> SvenDowideit: what's wrong with the example you gave?
[20:48]  <CDot> seems kosher to me
[20:48]  <SvenDowideit> go on, its quite naf
[20:49]  * CDot gets "wh ahe he"
[20:50]  <SvenDowideit> yup
[20:50]  <SvenDowideit> ie, if a user repeats a letter on either side, they're clearly deserving a red syntax error
[20:51]  <CDot> erm, no. Because the replace is *not* a tr
[20:51]  <CDot> example: %ENCODE{"%" old="%,X" new="X,Y"}% expands to Y
[20:52]  * SvenDowideit wants to see how you explain that in the docco - not with an eg, but as an explaination and why
[20:52]  <SvenDowideit> cos i'm not sure that is a particualrly clever idea
[20:53]  <CDot> all the docco is in the proposal topic
[20:53]  <SvenDowideit> it sounds more like a hehaw
[20:53]  <CDot> it has pros and cons, I know
[20:53]  <SvenDowideit> it has surprises, and that is something that i find concerning
[20:53]  <CDot> implementing a full tr// is quite tricky
[20:54]  <SvenDowideit> farcing such a thing on users is a much worse kind of trick
[20:54]  <CDot> probably. I wasn't sure, myself, so the feedback is constructive.
[20:54]  <SvenDowideit> sounds like i would be thinking of adding a concern of the 'we don't have time to think about this propoerly' type
[20:55]  * SvenDowideit cannot find the docco for why a user would like to have that king of multi-evaluated encode
[20:55]  <SvenDowideit> in that if they get the order wrong, they will get into a non-idemot*&^*&^*itc loop
[20:56]  <CDot> no worries, I wasn't happy myself, so a tr/// is fine by me.
[20:56]  <SvenDowideit> which was the fuckup we both made of the cuid's
[20:56]  <CDot> true
[20:57]  <SvenDowideit> in fact, when you expend the thing further, its even further a bad idea
[20:57]  <SvenDowideit>  %!ENCODE{"%" old="%,X,NA" new="XABNA,Y,%"}%
[20:59]  <SvenDowideit> should that go round forever, or, well, basically, its an unclear 'command' that only the high priest will read correctly, most code _readers_ would probly give up and suggest that writing in obfuscated perl is more sensible
[20:59]  <SvenDowideit> especially if they've been used to how the other ENCODE types work
[20:59]  * SvenDowideit pokes a stick in the flames and cries for argentina
[21:01]  <CDot> ok "wh aht he" it is
[21:01]  <SvenDowideit> no, i think that is a bad idea too
[21:02]  <CDot> you do? make up your mind! ;-)
[21:02]  <SvenDowideit> i think the entire proposal is not good atm
[21:02]  <CDot> oh, the repeated same token
[21:02]  <CDot> *sigh* - you have an alternative?
[21:02]  <SvenDowideit> if you repeat on either side, or as the further eg make more mess, you are making a big stuffup.
[21:02]  <CDot> FilterPlugin, perhaps? Just accept it into the core, don;t ask questions?
[21:03]  <SvenDowideit> no, that beast isn't the answer either
[21:03]  <SvenDowideit> especially at this late date
[21:03]  <SvenDowideit> the other plugins that have been cored still have not been done

%!ENCODE{"%" old="%,X,NA" new="XABNA,Y,%"}%

My first worry is that this reads like a feature that makes obfuscated Perl look easy, and the second is that I can't help thinking that we should do more towards fixing the root of the problem - which is that TML tables are weak at dwim.

for eg: can you think of any time that a user would actually want the following format to break the table? (if not, we could add magic..)

%SEARCH{
   ...
      format="| $topic | $text | $formfield(description) |"
}%

or even more specifically, if there's also a footer element?

in fact, I've often wondered why we would want stray |'s to cause one or the other row of an otherwise good table to be the wrong width..

-- SvenDowideit - 11 Mar 2010

I hear you, but this is pretty much the same approach as protectFormFieldValue, which has been causing so much grief recently.

I changed the spec here (and implementation and unit tests) to take account of your observations re repeated tokens and tr/// compatibility.
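The difference between the original sequential behaviour and the revised tr///-like behaviour shows up clearly when Sven's examples are run through both. In this Python sketch, sequential_replace applies each old/new pair to the whole string in turn, which reproduces CDot's "wh ahe he", while single_pass_replace scans once and never re-examines its own output, giving "wh aht he". Letting the first occurrence of a repeated token win is an assumption made only so the example runs; the revised spec forbids repeated tokens.

```python
import re


def sequential_replace(text, old, new):
    """Original behaviour: each old/new pair is applied to the whole string in turn,
    so later pairs see (and may rewrite) the output of earlier ones."""
    for tok, rep in zip(old, new):
        text = text.replace(tok, rep)
    return text


def single_pass_replace(text, old, new):
    """Revised tr///-like behaviour: one left-to-right scan; replacements are never re-scanned."""
    mapping = {}
    for tok, rep in zip(old, new):
        mapping.setdefault(tok, rep)  # assumption: first occurrence of a repeated token wins
    if not mapping:
        return text
    pattern = re.compile("|".join(re.escape(tok) for tok in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


old = "g,o,f,o,r,i,t".split(",")
new = "w,h,a,t,t,h,e".split(",")
print(sequential_replace("go for it", old, new))   # wh ahe he
print(single_pass_replace("go for it", old, new))  # wh aht he
```

With the sequential version, Sven's %!ENCODE{"%" old="%,X,NA" new="XABNA,Y,%"}% yields "YAB%" because each step rewrites the previous step's output; the single-pass version simply yields "XABNA".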

-- CrawfordCurrie - 11 Mar 2010

added another eg, and moved the IDEA! where it has a slightly better chance of being seen by those that need it most

-- SvenDowideit - 11 Mar 2010

How could the instance_num parameter of Excel's string substitution function
  =SUBSTITUTE(text;old_text;new_text;instance_num)
be implemented by the proposed ENCODE enhancement? ;-)

-- FranzJosefGigler - 11 Mar 2010

It couldn't. Use SpreadLegsPlugin (or FilterPlugin, maybe) for that.

-- CrawfordCurrie - 11 Mar 2010

The very first sentence of this proposal reads:

A commonly encountered problem is the case where you have a search over formfields, and you want to present the results in a TML table, but one of the form field values has a newline in it.

Crawford, protectFormFieldValue() only causes us grief because it is always called when rendering any formfield via renderForDisplay(). Which ruins newline sensitive TML when displaying these formfields via META (which uses renderForDisplay(), whereas FORMFIELD does not - just dumps formfield $value) in the standard view template.

That, and it doesn't appear to work anyway. I must be going a little crazy; $formfield() expansions do get <br/> in place of newlines, so TML tables don't break...

I think Sven's example search above is exactly what 99% of users (including me) want to "just work".

Expecting users to revert to crafting HTML that likely won't validate is insane.

Expecting them to embark on an adventure in escaping a nested ENCODE macro and learning the parameters that would be required is insane.

I want:
  • The caller of renderForDisplay() to first call protectFormFieldValue() if they want that to happen. So it no longer mangles newlines for everybody calling renderForDisplay(). Nobody has a problem with that, and it's easy to do.
  • Think about fixing protectFormFieldValue() so it may actually work as originally intended for $formfield().
  • Convince me why
    $formfield(Foo) (and a corresponding $formfield(Foo, noprotect) )
    is a worse idea than
    $percntENCODE{\"$formfield(Foo)\" old=\"\r\n,\r,\n,|\" new=\"<br/>,<br/>,<br/>,&vbar;\"}$percnt
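Because earlier tokens win at a given position, the order of the old list matters in the ENCODE version above: \r\n has to be listed before \r and \n, otherwise a Windows line ending turns into two breaks. A small Python sketch of the single-pass replacement (an illustration, not Foswiki code) shows the difference:

```python
import re


def encode_replace(text, old, new):
    """Single-pass replacement; at any position the first matching token in `old` wins."""
    if not old:
        return text
    mapping = dict(zip(old, new))
    pattern = re.compile("|".join(re.escape(tok) for tok in old))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


value = "line one\r\nline two | x"
new = ["<br/>", "<br/>", "<br/>", "&vbar;"]

# \r\n listed first: a Windows line ending becomes a single break.
print(encode_replace(value, ["\r\n", "\r", "\n", "|"], new))
# line one<br/>line two &vbar; x

# \r listed first: the \r and the \n are replaced separately, giving two breaks.
print(encode_replace(value, ["\r", "\n", "\r\n", "|"], new))
# line one<br/><br/>line two &vbar; x
```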

Finally, the fact that newlines in metadata break TML tables built around $formfield() should be considered a bug, because (ignoring the fact that it doesn't work) we already have a feature (however deficient) that is supposed to solve that problem:

The protected value is determined from the value of the field after:
  • newlines are replaced with <br> or the value of $attrs->{newline}
  • processing through breakName if $attrs->{break} is defined
  • escaping of $vars if $attrs->{protectdollar} is defined
  • | is replaced with &#124; or the value of $attrs->{bar} if defined
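For comparison, the four steps listed above might look roughly like this in Python. The function name mirrors Foswiki::Render::protectFormFieldValue, but this is a hypothetical sketch rather than the real code: the breakName step is omitted, treating \r\n as a single newline is an assumption, and escaping $ as $dollar is only a guess at what the protectdollar step does.

```python
import re


def protect_form_field_value(value, attrs=None):
    """Hypothetical sketch of the protection steps listed above."""
    attrs = attrs or {}
    # 1. Newlines become <br> unless attrs["newline"] says otherwise.
    #    Treating \r\n as one newline is an assumption of this sketch.
    newline = attrs.get("newline", "<br>")
    value = re.sub(r"\r?\n", lambda m: newline, value)
    # 2. The breakName step (attrs["break"]) is omitted here.
    # 3. Guess at protectdollar: neutralise "$" so later format expansion leaves it alone.
    if attrs.get("protectdollar"):
        value = value.replace("$", "$dollar")
    # 4. Vertical bars become &#124; unless attrs["bar"] says otherwise.
    return value.replace("|", attrs.get("bar", "&#124;"))


print(protect_form_field_value("a|b\nc"))  # a&#124;b<br>c
```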

I have no strong opinions on the detail of this proposal, but it abandons the task it set for itself in the very first sentence.

-- PaulHarvey - 12 Mar 2010

Given this search:
%SEARCH{"Foo~'*'" header="| *Thing* | *Bar* |" format="| $formfield(Foo) | Test! |" }%

And this in the formfield value
   * a
   * b
   * c
   * d

| *1* | *2* | *3* |
| a | b | c |

asdf
... should have had this contain another SEARCH that built another TML table ...

And this quick & dirty hack, which destroys half of Foswiki, based on MartinCleaver's notes in Tasks.Item5489

Index: Form/FieldDefinition.pm
===================================================================
--- Form/FieldDefinition.pm   (revision 6724)
+++ Form/FieldDefinition.pm   (working copy)
@@ -333,8 +333,10 @@
         }
     }
 
-    require Foswiki::Render;
-    $value = Foswiki::Render::protectFormFieldValue( $value, $attrs );
+    my $topicObject = Foswiki::Meta->new( $this->{session},
+        $this->{session}->{web}, $this->{session}->{topic} );
+    $value = $this->{session}->renderer->getRenderedVersion($value, $topicObject);
+    $value =~ s/\r?\n//g;
 
     $format =~ s/\$title/$this->{title}/g;
     $format =~ s/\$value/$value/g;

Before: RenderFormFieldsBrokenly.png   After: RenderFormfieldsProperly.png

-- PaulHarvey - 12 Mar 2010

That is indeed the first line of the proposal. But you are presupposing that said "search over formfields" was a %SEARCH. In my case, it wasn't, it was a FORMQUERY. And that was only the seed; shortly thereafter I encountered a second problem, described in the use cases, which fell to the same solution. I'm not shooting down $formfield() with parameters (though I do think the risk of it seeding a new macro language is high), I'm proposing a solution that works in a different space, and addresses other problems besides the narrow newline-in SEARCH issue.

-- CrawfordCurrie - 12 Mar 2010

During the development of MultiTopicSavePlugin, these were the observations I made.

In a SEARCH the $formfield today is partly encoded for protection of tables. But it misses a few cases.

In my plugin I had to give up using $formfield and had to add a $value that uses this encoding:

In view mode where the formfield is just displayed: replace \n by <br />, replace | by &#124;

In edit mode where I need the value shown in an input field I do not do any encoding.

So no matter what $formfield was either encoding too much or too little.

I tried using a combination of %FORMFIELD and %ENCODE. FORMFIELD returns the content raw and unencoded. But ENCODE lacks the feature that this feature proposal started with.

To maintain reasonable compatibility we cannot stop having $formfield encode newlines. But I doubt we break anything by having $formfield also encode the | as an HTML entity. That would probably solve the remaining issue for the common case of a SEARCH returning TML table rows. It would be a "do not need to think" solution.

But we actually need a way to let $formfield return the raw content, and then the problems start. $formfield already has 2 additional parameters, so an encoding mode would be a 4th parameter, which on top of that would rarely be used together with the 2nd and 3rd parameters.

Maybe it is best that, for 99% of cases, $formfield continues to encode as it does now, plus encoding the |.

And %FORMFIELD should continue to return raw content for the 1% of use cases where the encoding done by $formfield is a problem. MultiTopicSavePlugin has its own mode now, so it is not an issue there.

But that is SEARCH.

ENCODE is relevant in MANY other situations, including encoding data received via URLPARAM, $pattern in searches, INCLUDEs etc. So Crawford's extension to ENCODE creates a feature which today can only be achieved with CALC, and the way CALC works makes that an absolute pain in most cases.

-- KennethLavrsen - 12 Mar 2010

Kenneth, if you were coding MultiTopicSavePlugin to the Foswiki::Form "api" as MichaelDaum did with FlexFormPlugin, you would have the option of telling renderForDisplay() to not encode newlines and vbars. And anyway, you could use renderForEdit() which should behave properly in the first place (and would automatically know about 3rd party field types).

VarSEARCH's $formfield() should already encode | to &#124; (unless the Search.pm code overrides the defaults).

Crawford, is there a reason FormQueryPlugin couldn't be trivially adjusted to make its version of $formfield() do what you need?
  • See answer below

I doubt this is extending our macro language; we already have terribly inconsistent header/format/footer strings across Foswiki.

Crawford, am I right in thinking you would be able to solve most of your problems if you enhanced the $formfield() token in the format string implementations of the plugins you work with?
  • Yes, but I really don't want to have to do this every time I encounter an encoding problem (and this is only one of them).

Perhaps we aren't ready for a $formfield() solution until we centralise all the parsers/formatters to a common API?

As for the ENCODE enhancement, I have no problems with that technically speaking.

I will drop my concern, if we acknowledge that $percntENCODE.... is not a sensible solution for the problem we are trying to solve here.

It's hard to get outsiders excited about Foswiki when they see such horrible string soup, for apparently boring technical/legacy reasons.

-- PaulHarvey - 12 Mar 2010

Ah, interlinked concepts keep coming back.

<rant>

String soup is a problem, no doubt about it - just look at any perl program that manipulates strings. But there's a price to be paid for any functionality, in terms of usability - more to learn, more to remember.

In the past (T|Fos)Wiki has suffered from people adding in "simple fixes" or "trivial extensions" that either close a door for a more generic approach (witness the escapes in strings) or open the floodgates for other stupidness (e.g. the more esoteric plugins handlers). I am genuinely concerned that $formfield() is the thin end of an ill-considered wedge that ends with a brand new macro language, as we find new requirements for format. $include has already been proposed. How long before $formfield() is too limiting and you need $query()? What about the string soup that results when you use a URLPARAM in a format string? Wouldn't $urlparam be cleaner? Oh, and $if - surely we need $if? Will the gods forgive me for suggesting $calc? And how do these macros, with their new calling conventions, stay in step with their %BIGBROTHER (%FORMFIELD)?

</rant>

$formfield(Blah) is just shorthand for (a subset of) $percntFORMFIELD{$quotBlah$quot}$percnt (I think that's why the names are the same). If $formfield suddenly behaves differently to %FORMFIELD, then that connection is broken, and hey presto, we have a new macro language. If only $formfield() really did behave like %FORMFIELD%, we wouldn't have the problem stated in the first sentence. -- PH

So I see $formfield as just another (perfectly valid) alternative syntax for UseSyntaxToChangeEvaluationOrder: it's a means to call %FORMFIELD inside a macro. And I don't believe %FORMFIELD with a filtering step is an alternative to ENCODE, which offers far more, as shown in the examples above. But at the same time, there's no reason the existence of ENCODE should stop you considering a useful extension to %FORMFIELD, either. It's just not an alternative to this particular proposal.

If you want to discuss $formfield as an alternative syntax for late eval, please contribute to UseSyntaxToChangeEvaluationOrder.

-- CrawfordCurrie - 13 Mar 2010
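The "string soup" in this exchange comes from escapes such as $percnt and $quot, which roughly let a macro call sit inside a format string and only be expanded after the outer macro (SEARCH, FORMQUERY, ...) has produced its output. A minimal sketch of that expansion step follows. It assumes a small hypothetical subset of the escapes and is not Foswiki's implementation, whose list and ordering rules may differ.

```python
import re

# Hypothetical subset of Foswiki-style format escapes, for illustration only;
# Foswiki's own list and expansion rules may differ.
ESCAPES = {"percnt": "%", "quot": '"', "dollar": "$", "nop": "", "n": "\n"}

# "nop" is tried before "n" so "$nop" is not read as "$n" + "op"; a bare "$n"
# must not be followed by a word character.
_ESCAPE_RE = re.compile(r"\$(percnt|dollar|quot|nop|n(?!\w))")


def expand_escapes(fmt: str) -> str:
    """Expand all escapes in a single pass.

    Because re.sub never re-scans replacement text, "$dollarquot" becomes the
    literal "$quot" rather than a double quote.
    """
    return _ESCAPE_RE.sub(lambda m: ESCAPES[m.group(1)], fmt)


# What a format string has to contain to call FORMFIELD "late":
print(expand_escapes("$percntFORMFIELD{$quotBlah$quot}$percnt"))
# prints: %FORMFIELD{"Blah"}%
```

Reading the escaped form next to its expansion shows why Paul calls it string soup, and why Crawford prefers to treat $formfield() as a shorthand for exactly this late call.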

SEARCH is the first thing users go to for extracting information out of their topics. $formfield() is the best practice that we continuously drive in all the examples for extracting data from forms.

And it's broken.

Personally I see IF, URLPARAM and QUERY as a fairly long way down the slippery slope, given that $formfield() is such a core part of what users are told to use, and do use (as best as I can tell). If there are other "best practices" out there, the documentation is hard to find and the examples are out of date.

As recently as July last year, when I started with Foswiki, the BugsContrib in the Extensions web still used a type=regex SEARCH instead of a query SEARCH, literally hunting for META:FIELD{.... (edit: and the current version in svn is still this way, and uses $formfield())

My concern is that $formfield() does not behave like FORMFIELD, which should be a bug that we fix.

Instead we are talking about solving (that bug) with a new feature.

I have now removed my concern. I'm yet to be convinced that we should leave $formfield() behaving differently to FORMFIELD.

But when considered on its own the new feature is useful. So I hope we can come up with something at UseSyntaxToChangeEvaluationOrder

-- PaulHarvey - 13 Mar 2010

$formfield does not behave like FORMFIELD. Yes. That is how it should stay. $formfield is made to work in a SEARCH. And 90%+ of searches result in a table where the encoding of the text from fields is vital.

FORMFIELD does not encode which enables us to use it with or without ENCODE around it.

So do not try to resolve some "bug" in $formfield. I myself was part of getting $formfield to work as it does now and it is not a bug. It is a feature.

The problem with FORMFIELD is that ENCODE does not have a feature that can encode its output for use in a table, and this very proposal was originally raised to address exactly that. And even in its more advanced transliteration form, it would still do this job.

-- KennethLavrsen - 13 Mar 2010

Did you see the patch I made, and the before/after screenshots?

It's as if protectFormFieldValue() tries to partially render the content. Or deliberately mangle it.

Instead of this, why not find a way to render it fully in the TML table, and do away with protectFormFieldValue() entirely, as MartinCleaver originally pointed out and as I've demonstrated in the patch?

This is the difference between META and FORMFIELD. $formfield() is really %META%, in this respect.

-- PaulHarvey - 14 Mar 2010

It is hard to discuss a feature on the basis of the code that lies beneath it.

Today a SEARCH ... format="| $formfield(SOMEFIELD) | $formfield(OTHERFIELD) |" works. Even if SOMEFIELD or OTHERFIELD contains newlines, vertical bars, double quotes etc., it all works. And this is how it has to continue to work. That is what I see as important.

If you want the content of a formfield inside a textarea input field, $formfield is no good, for the same reason that it is good in normal cases. This is where FORMFIELD, which returns the raw content of a form field, comes into play.

If $formfield can be made to render the content even more nicely inside a TML table, then that is fine, and I have the feeling this may be what Paul means.

-- KennethLavrsen - 14 Mar 2010
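The SEARCH format string quoted above only works because a TML table row has to fit on one line and its cells are delimited by vertical bars. The following deliberately simplified row reader (a hypothetical `tml_rows` helper, much cruder than Foswiki's real table rendering) shows what a raw field value does to such a row, compared with the encoded form discussed earlier.

```python
def tml_rows(text: str) -> list[list[str]]:
    """Hypothetical, much simplified TML table reader.

    A line that starts and ends with "|" is one row; its cells are the
    pieces between the bars. Anything else is treated as ordinary text.
    """
    rows = []
    for line in text.split("\n"):
        line = line.strip()
        if len(line) > 1 and line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
    return rows


row = "| {} | x |"  # a two-column row: field value, then "x"

print(tml_rows(row.format("a | b")))              # [['a', 'b', 'x']]: 3 cells, not 2
print(tml_rows(row.format("a\nb")))               # []: the row falls apart
print(tml_rows(row.format("a &#124; b<br />c")))  # [['a &#124; b<br />c', 'x']]
```

With the encoded value the row keeps its two cells, and the browser later displays &#124; and <br /> as a bar and a line break.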

I have voiced my view on the spec I raised a concern against, and the extra change to the spec was moved to a new feature proposal.

I lift my concern and declare consensus on the original spec.

I do not think the syntax for ENCODE is big enough to have a community vote on.

I leave it to Crawford to document his feature so that end users have a clear view of how things work.

-- KennethLavrsen - 14 Mar 2010

Yes, making $formfield render content properly inside TML table is what I meant.

I think the users of ENCODE will be able to understand that the old/new params imply the type of transliteration. A bit late now, but it would be nice if an error were emitted instead of silently ignoring a conflicting type param.

-- PaulHarvey - 14 Mar 2010

An error is emitted.

-- CrawfordCurrie - 14 Mar 2010
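For readers following the old/new exchange above, here is a minimal sketch of how such a transliteration could behave. It is an illustration only, not Foswiki's ENCODE implementation: the function `encode`, its `type_` argument and the escape subset are hypothetical. It assumes that old and new are comma-separated token lists of equal length, and, as the thread says, that combining them with a type is an error.

```python
import re
from typing import Optional

# Hypothetical escapes allowed inside old/new tokens (a small subset).
_TOKEN_ESCAPES = {"n": "\n", "comma": ",", "quot": '"', "percnt": "%", "dollar": "$"}
_TOKEN_RE = re.compile(r"\$(comma|dollar|percnt|quot|n(?!\w))")


def _expand(token: str) -> str:
    """Turn escapes such as $n or $comma inside one token into characters."""
    return _TOKEN_RE.sub(lambda m: _TOKEN_ESCAPES[m.group(1)], token)


def encode(text: str, old: Optional[str] = None, new: Optional[str] = None,
           type_: Optional[str] = None) -> str:
    """Sketch of an ENCODE-like old/new transliteration (not Foswiki's code)."""
    if old is None and new is None:
        raise NotImplementedError("type-based encodings are out of scope here")
    if old is None or new is None:
        raise ValueError("old and new must be given together")
    if type_ is not None:
        raise ValueError("type cannot be combined with old/new")
    olds = [_expand(t) for t in old.split(",")]
    news = [_expand(t) for t in new.split(",")]
    if len(olds) != len(news):
        raise ValueError("old and new must have the same number of tokens")
    if "" in olds:
        raise ValueError("empty token in old")
    mapping = dict(zip(olds, news))
    # One pass, longest tokens first, so replacement text is never re-scanned.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in alternatives))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


print(encode("a|b\nc", old="$n,|", new="<br />,&#124;"))
# prints: a&#124;b<br />c
```

The single-pass substitution is the main design choice here: with old="a,b" new="b,c", replacing tokens one after another would turn "ab" into "cc", while one pass gives "bc".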
Topic revision: r40 - 05 Jul 2015, GeorgeClark