You are here: Foswiki>Tasks Web>Item11316 (22 Dec 2014, GeorgeClark)Edit Attach

Item11316: Foswiki::Render inappropriately (ab)uses empty <p> tags

pencil
Priority: Normal
Current State: Confirmed
Released In: n/a
Target Release: major
Applies To: Engine
Component: FoswikiRender
Branches: Release01x01 trunk
Reported By: MyqLarson
Waiting For: MyqLarson, PaulHarvey
Last Change By: GeorgeClark

Background discussion

Foswiki produces invalid HTML. For example, from foswiki.org/System:

<p></p>
This is where Foswiki keeps system documentation, including documentation that is included with optional extensions.

This can be also be seen by using an HTML validator.

-- MyqLarson - 01 Dec 2011

I've passed the site through the HTML validator as you request. Maybe I'm not using the right options, but it does not seem to be complaining about the empty paragraphs. Our nightly unit tests also pass most of the foswiki scripts through HTML::Tidy for some basic sanity checks on the generated HTML.

update PaulHarvey pointed out on IRC that on Foswiki.org, the validation issues are probably due to the inclusion of the 3rd party Piwik Web Analytics code. This is not included in the Foswiki distribution code. PWik inserts some javascript, which we insert into the document head, but also provides a <noscript> fallback which is invalid in the head.

-- GeorgeClark - 02 Dec 2011

The link to the HTML validator doesn't complain about it, but look at line 178 and beyond. You'll see

and the actual text outside of the tags. This occurs on my local installation as well.

View the page source for this page and you'll see:
<p></p> I've passed the site through the HTML validator as you request. Maybe I'm not using the right options, but it does not seem to be complaining about the empty paragraphs. Our nightly unit tests also pass most of the foswiki scripts through HTML::Tidy for some basic sanity checks on the generated HTML.
Tags are not surrounding the text.

-- MyqLarson - 07 Dec 2011

The problem seems to be a non-standard thing we use on foswiki.org to track stats, piwiki, which puts some illegal <p></p> in the <head> inside a <noscript>

-- PaulHarvey - 08 Dec 2011

It happens on my localhost installation as well. I just used foswiki to show the example so that I wouldn't have to list all the extensions and such. I don't use an stats or piwiki. you can see the same problem on: Just view the source and look for a paragraph tag.

-- MyqLarson - 08 Dec 2011

W3C validator does not mark that as invalid.

w3 org says "We discourage authors from using empty P elements. User agents should ignore empty P elements."

-- ArthurClemens - 09 Dec 2011

It only marks as valid because all the content is wrapped in a div and because the doctype is set as transitional. As currently rendered, foswiki HTML will fail strict because text inside of blockquote tags must be wrapped in p or other. Current output is:
<blockquote>
<p></p>
Paragraph text.
</blockquote>

Additionally, using the WYSIWYG editor, indenting a paragraph will not survive a save because the indenting is set through an inline style on the p tag. Since the p tags are not rendered correctly, the indent applied in the editor will not appear in the saved page. It will work if something else is first applied which forces the p tags to correctly wrap the text. For example, if I centre a paragraph of text using the WYSIWYG editor and save, it will wrap the text properly in p tags:

<p align="center">lorem ipsum.</p>

Then, if I change the alignment back to left and save, the text will remain correctly wrapped:

<p align="left">lorem ipsum.</p>

Only then can I indent a paragraph in the editor and see that it appears indented upon save:

<p align="left" style="padding-left: 30px;">lorem ipsum.</p>

Surely this is not the intended behaviour. No other elements appear to be incorrectly applied so I'm quite surprised at the defensive replies here. Are people really seeing these <p></p> tags and concluding that it's intended?

-- MyqLarson - 10 Dec 2011

Ah, yes, the empty <p> tag problem. So we've got three issues in this bug report:
  • Re-titled this bug specifically to address empty <p> tags.
  • Raised a separate task Item11348 against PiwikPlugin breaking XHTML validation.
  • Note that WysiwygPlugin's inability to express paragraph indenting is being developed as a new feature in Item2516

Thank you for reminding us to fix this problem. It's been on our minds - had many discussions with MichaelDaum, ArthurClemens and others about the CSS difficulties this causes - the problem is that Foswiki::Render is a scary place that's evolved over a decade, and I guess it's not sufficiently bothered anybody enough to actually go about fixing it properly.

In fact, I wouldn't be surprised if the original render code back in 1999 actually used a <br/>, which would be much more semantically correct than empty <p> tags.

I've lowered to normal priority for three reasons:
  • Even though the markup is wrong, and I'd dearly love to see it fixed, it doesn't actually break XHTML-transitional validation.
  • It doesn't affect the user experience in any measurable way (i.e. Foswiki continues to function normally).
  • Such an ancient "feature" which has affected every release ever of T/Foswiki that I'm aware of, shouldn't block the 1.1.4 release

If you disagree on any of these points, please feel free to raise it back up to urgent again.

If there's any help you can give in getting this fixed (Eg. failing unit tests, patches, etc), I'd be extremely grateful (as would everybody, in fact). If you're so inclined, RequestAccessToSubversion and/or hop on to IRC.

-- PaulHarvey - 10 Dec 2011

Proposed interim patch

I'm not good enough at Perl to do any real work, but I've taken a stab at it.

The problem is that the code in Foswiki::Render that identifies paragraphs simply sets $isList to 0. This can easily be changed to wrap that text in p tags.

Caveats

The problems I found are:
  1. my patch will also wrap inside of multi-line comments due to the way Foswiki::Render::_takeOutProtected works (there's a warning in the comments about it). This is probably more ideal than the current state of affairs, but not perfect.
  2. XHTML validation will now fail because it will wrap the WebLeftBarLogin greeting in p tags, but there's an embedded <ul> list in there by default, and lists can't occur inside of paragraphs. This can be fixed by replacing the %BR% macro to a newline:
    %IF{"context authenticated" then='%MAKETEXT{"Hello [_1]" args="<span class=\"foswikiUserName\">[[%WIKIUSERNAME%][%SPACEOUT{%WIKINAME%}%]]</span>"}%'}%%IF{"$ LOGOUT != ''" then='
    <ul><li class="patternLogOut">%LOGOUT%</li></ul>'}%
  3. There may be other issues as I haven't tested it extensively, but it seems to handle tables, lists, spans, etc well enough.

Patch

This patch is based on http://svn.foswiki.org/trunk/core/lib/Foswiki/Render.pm (Revision 13364)
--- Render_orig.pm
+++ Render_new.pm
@@ -1125,10 +1125,14 @@
     $text =
       $this->_takeOutProtected( $text, qr/<head.*?<\/head>/si, 'head',
         $removed );
-    $text = $this->_takeOutProtected( $text, qr/<textarea\b.*?<\/textarea>/si,
+    $text = 
+      $this->_takeOutProtected( $text, qr/<textarea\b.*?<\/textarea>/si,
         'textarea', $removed );
     $text =
       $this->_takeOutProtected( $text, qr/<script\b.*?<\/script>/si, 'script',
+        $removed );
+    $text =
+      $this->_takeOutProtected( $text, qr/<style\b.*?<\/style>/si, 'style',
         $removed );
 
     # Remove the sticky tags (used in WysiwygPlugin's TML2HTML conversion)
@@ -1291,11 +1295,15 @@
         # Lists and paragraphs
         if ( $line =~ m/^\s*$/ ) {
             unless ( $tableRow || $isFirst ) {
-                $line = '<p></p>';
+                $line = '';
             }
             $isList = 0;
         }
         elsif ( $line =~ m/^\S/ ) {
+      unless ( $line =~ m/^(\s|\t)*(<|<!--|-->)/ || $isFirst || $tableRow ) {
+         push( @result, "\n<p>\n\t$line\n</p>");
+         next;
+      }
             $isList = 0;
         }
         elsif ( $line =~ m/^(\t|   )+\S/ ) {
@@ -1313,7 +1321,7 @@
                 _addListItem( $this, \@result, 'dl', 'dd', '', $1 );
                 $isList = 1;
             }
-            elsif ( $line =~ s/^((\t|   )+)\* /<li> /o ) {
+            elsif ( $line =~ s/^((\t|   )+)\* /\n<li>\n /o ) {
 
                 # Unnumbered list
                 _addListItem( $this, \@result, 'ul', 'li', '', $1 );
@@ -1431,6 +1439,7 @@
         \&verbatimCallBack );
     $text =~ s|\n?<nop>\n$||o;    # clean up clutch
 
+    $this->_putBackProtected( \$text, 'style', $removed );
     $this->_putBackProtected( \$text, 'script', $removed, \&_filterScript );
     Foswiki::putBackBlocks( \$text, $removed, 'literal', '', \&_filterLiteral );
     $this->_putBackProtected( \$text, 'literal', $removed );

-- MyqLarson - 11 Dec 2011

Just a note - the empty paragraph tags came in as a fix to Item9607 - previously it used a self-closing paragraph tag for empty lines <p /> which failed XHTML Transitional validation. That dates back to before TWiki rev 6405 - in 2005. It's in the code in the earliest version I could find.

-- GeorgeClark - 11 Dec 2011

Cool. Did you give the unit tests a spin? What were the results? I see that this has been changed to "Being worked on", but without a committed developer. I'm assuming MyqLarson here. Set back to Confirmed if you think you've done all you can.

-- PaulHarvey - 11 Dec 2011

@GeorgeClark the issue isn't really the format of empty paragraph tags, but rather paragraph tags aren't actually wrapping paragraphs, only empty lines which is sort of useless semantically.

@PaulHarvey Sorry, I'm not sure how to run unit tests. I was just making a stab at it and hoping someone with more knowledge than I could run with it. If I find some time, I'll look into it further.

-- MyqLarson - 11 Dec 2011

Please hop on to IRC so we can help you get going with running unit tests. It's really easy, and very useful smile

-- PaulHarvey - 11 Dec 2011

@PaulHarvey, Thanks for the offer, but that's about all I can do for now. I don't have enough time to do it properly. I hope someone can make use if this incremental improvement.

-- MyqLarson - 15 Dec 2011

Thanks myq - we'll see where we can take it.

-- PaulHarvey - 16 Dec 2011

Another reason to get this fixed: the paragraph-look-a-likes cannot be targeted with the :first-element and :last-element selectors.

For instance:
This is contents

Should actually get rendered like this:

Header

This is contents

Test CSS code:
<style type="text/css" media="all">
.test h2, p {
margin:20px 0;
}
.testBlock {
padding: 10px;
border: 1px solid #ccc;
}
.testBlock h2:first-child,
.testBlock p:first-child {
margin-top: 0;
}
.testBlock h2:last-child,
.testBlock p:last-child {
margin-bottom: 0;
}
</style>

-- ArthurClemens - 07 Jan 2012

We're migrating some content to Foswiki, and this has proved a major blocker to completing that task. So I'm giving this a go.

-- PaulHarvey - 30 Jan 2012

Should this target a patch release? Is there anybody relying on the old/current behaviour somehow?

-- PaulHarvey - 30 Jan 2012

Properly wrapping content into <p> ... </p> is a must for descend typographics. it would also ease inline editing too. Problems that have been facing before where related to garanteeing the renderer still produces well-formed xml, e.g. paragraphs interfering with table or list markup so that a paragraph doesnt wrap only half a table or stuff like that. This seemed to be trickier within the render code than expected. Not sure whether the above patch already copes with this.

-- MichaelDaum - 30 Jan 2012

I wrote the patch as a quick and dirty attempt (I'm not a Perl programmer). In my limited tests, it worked for all elements I checked, with the two caveats noted above regarding comments and some existing templates. I'm sure it's an oversimplified solution, but for me it produces more usable X/HTML than the current renderer.

-- MyqLarson - 15 Feb 2012

I tried to make your patch work, and I tried to start all over again fixing Render.pm a different way, but each attempt resulted in totally broken markup.

Consider:
<div>
stuff

some other stuff
</div>

I couldn't figure out a nice way to avoid things like
<p></div></p>

I've created PluggableRenderers so we can try alternative rendering solutions more easily.

-- PaulHarvey - 15 Feb 2012

Ah, I see. I never added divs to my content, just text with semantic markup. The patch assumes that the only things left after taking out large blocks such as styles and scripts are line-based elements, lists, and tables.

The only way I see to do it is to take out multiline blocks, like with scripts, styles, etc, and process them recursively looking for paragraphs. Do tables embedded in tables work well? If so, then we could use the same logic as marking up (and avoiding overmarking) lists and tables.

-- MyqLarson - 16 Feb 2012

Deferring this to a major release. Render is a difficult area of the code, and we are close enough to the 1.2 release now ... finally ... that opening this can of work would be very disruptive.

-- GeorgeClark - 22 Dec 2014
 

ItemTemplate edit

Summary Foswiki::Render inappropriately (ab)uses empty <p> tags
ReportedBy MyqLarson
Codebase 1.1.4, 1.1.4 RC2, 1.1.4 RC1, 1.1.4 beta2, 1.1.4 beta1, 1.1.3, 1.1.3 RC1, 1.1.3 beta1, 1.1.2, 1.1.1, 1.1.0, 1.1.0 beta1, 1.0.10, 1.0.9, 1.0.8, 1.0.7, 1.0.6, 1.0.5, 1.0.5 beta1, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0, 1.0.0 beta3, 1.0.0 beta2, trunk
SVN Range
AppliesTo Engine
Component FoswikiRender
Priority Normal
CurrentState Confirmed
WaitingFor MyqLarson, PaulHarvey
Checkins distro:237f2e27db3a distro:70181c6dfffc distro:97d6b2112aec
TargetRelease major
ReleasedIn n/a
CheckinsOnBranches Release01x01 trunk
trunkCheckins distro:237f2e27db3a distro:97d6b2112aec
masterCheckins
ItemBranchCheckins
Release01x01Checkins distro:70181c6dfffc
Topic revision: r25 - 22 Dec 2014, GeorgeClark
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy