Item1096: Malformed header anchors if header contains non A-Za-z0-9_ characters - simple solution

Priority: Normal
Current State: Closed
Released In: 1.0.5
Target Release: patch
Applies To: Engine
Component:
Branches:
Reported By: SergeyAfonin
Waiting For:
Last Change By: KennethLavrsen
Similar: see Tasks.Item817

Anchors are broken when headers contain Russian characters. Foswiki-1.0.0, Sun, 01 Feb 2009, build 2411.

The same goes for any characters that are not A-Z.

Example:

test topic 1


1
1
1

тестовый заголовок 2


2
2
2

test topic 3


3
3
3

тестовый заголовок 4


4
4
4

test topic 5


5
5
5

-- SergeyAfonin - 17 Feb 2009

Hm... Example works here.

-- SergeyAfonin - 17 Feb 2009

I think this is a duplicate of Item817.

-- ChristianLudwig - 17 Feb 2009

Probably yes. I apologize for switching to Russian; I am counting on Eugen Mayer. (Originally in Russian:) Something odd has come up with this problem. In the text where I first tried TOC, not a single anchor works. In the simplified example I gave here, the link "тестовый заголовок 2" does not work, but "тестовый заголовок 4" does.

-- SergeyAfonin - 17 Feb 2009

(Originally in Russian:) It gets even more interesting! Everything I said applies to Firefox (3.0.4). I just happened to try Konqueror, and everything works, both in the example and in the original text. Here, on foswiki.org, the example works completely with Firefox as well.

-- SergeyAfonin - 17 Feb 2009

Translation attempt (1st Russian comment):
'This problem reveals some oddity. In the text where I tried TOC the 1st time, not a single anchor worked. In the simplified example above the link of "тестовый заголовок 2" doesn't work, but "тестовый заголовок 4" works.'

Translation attempt (2nd Russian comment):
'It's still more interesting. Everything said relates to Firefox (3.0.4). Now I (incidentally) tried Konqueror - all works, both in the example and in the source text. Here on foswiki.org the example completely works with Firefox, too.'

Do you have UseLocale in your LocalSite.cfg, like
$Foswiki::cfg{UseLocale} = 1;
What is your site charset/locale? (see $Foswiki::cfg{Site}{CharSet} in LocalSite.cfg).

How do the anchors that are not working look in your HTML-source?

To see if this is really a duplicate of Item817, open the Render.pm source file and change the line (around line 461)
      $anchorName =~ s/[^$Foswiki::regex{mixedAlphaNum}]+/_/g;
to
      $anchorName =~ s/[^A-Za-z0-9]+/_/g;
If your anchors/links then work, this is the same problem as described in Item817.

-- ChristianLudwig - 18 Feb 2009

I think your translation is good, thanks.

In my LocalSite.cfg:
$Foswiki::cfg{UseLocale} = 1;
$Foswiki::cfg{Site}{CharSet} = 'utf-8';
HTML-source:
<li> <a href="#test%2520topic%25201"> test topic 1</a>

</li> <li> <a href="#%25d1%2582%25d0%25b5%25d1%2581%25d1%2582%25d0%25be%25d0%25b2%25d1%258b%25d0%25b9%2520%25d0%25b7%25d0%25b0%25d0%25b3%25d0%25be%25d0%25bb%25d0%25be%25d0%25b2%25d0"> &#1090;&#1077;&#1089;&#1090;&#1086;&#1074;&#1099;&#1081; &#1079;&#1072;&#1075;&#1086;&#1083;&#1086;&#1074;&#1086;&#1082; 2</a>
</li> <li> <a href="#test%2520topic%25203"> test topic 3</a>
</li> <li> <a href="#%25d1%2582%25d0%25b5%25d1%2581%25d1%2582%25d0%25_AN1"> &#1090;&#1077;&#1089;&#1090;&#1086;&#1074;&#1099;&#1081; &#1079;&#1072;&#1075;&#1086;&#1083;&#1086;&#1074;&#1086;&#1082; 4</a>
</li> <li> <a href="#test%2520topic%25205"> test topic 5</a>
</li>
I think this is due to the great length of anchors. And apparently, Opera and Konqueror are using only part of an anchor for navigation.

-- SergeyAfonin - 19 Feb 2009
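
The `%2520` and `%25d1%2582` sequences in the HTML source above are what you get when text is percent-encoded twice: the `%` produced by the first pass is itself encoded as `%25` by the second. The following is a minimal Perl sketch of that effect, not the Foswiki code path; `percent_encode` is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;
use utf8;                  # the source below contains a Cyrillic literal
use Encode qw(encode);

# Hypothetical helper (not a Foswiki function): percent-encode every
# byte that is not an ASCII letter, digit, '_', '.', '~' or '-'.
sub percent_encode {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02x', ord($1))/ge;
    return $bytes;
}

for my $header ( 'test topic 1', 'т' ) {
    my $bytes = encode( 'UTF-8', $header );   # work on UTF-8 bytes
    my $once  = percent_encode($bytes);       # e.g. test%20topic%201
    my $twice = percent_encode($once);        # e.g. test%2520topic%25201
    print "$once\n$twice\n";
}
```

The second pass turns a single Cyrillic letter (two UTF-8 bytes) into twelve characters, which would explain why the encoded Russian anchors are so long.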

Oh, utf-8! The percent signs (%) in the anchors are the source of the problem. You can change Render.pm. I think in your case the following workaround in
--- Render.pm   (revision 2531)
+++ Render.pm   (working copy)
@@ -459,6 +459,7 @@
     {
         $anchorName =~ s/[^$Foswiki::regex{mixedAlphaNum}]+/_/g;
     }
+    $anchorName =~ s/[^A-Za-z0-9]+/_/g;
     $anchorName =~ s/__+/_/g;    # remove excessive '_' chars
     if ( !$compatibilityMode ) {
         $anchorName =~ s/^[\s#_]+//;    # no leading space nor '#', '_'
should help; i.e. insert the line
$anchorName =~ s/[^A-Za-z0-9]+/_/g;

This is really only a workaround.

-- ChristianLudwig - 19 Feb 2009
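
To see what the added line does, the sketch below applies it together with the two clean-up substitutions visible in the patch context to the headers from the example. It is an illustration only: `workaround_anchor` is a hypothetical name, the locale-dependent `mixedAlphaNum` step is left out, and the leading-character clean-up is applied unconditionally, whereas Render.pm only runs it outside compatibility mode.

```perl
use strict;
use warnings;
use utf8;    # the header literals below contain Cyrillic text

# Hypothetical wrapper around the substitutions shown in the patch.
sub workaround_anchor {
    my ($anchorName) = @_;
    $anchorName =~ s/[^A-Za-z0-9]+/_/g;    # the line added by the patch
    $anchorName =~ s/__+/_/g;              # remove excessive '_' chars
    $anchorName =~ s/^[\s#_]+//;           # no leading space nor '#', '_'
    return $anchorName;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $header ( 'test topic 1', 'тестовый заголовок 2' ) {
    printf "%s -> %s\n", $header, workaround_anchor($header);
}
```

Under these assumptions the ASCII header becomes `test_topic_1` and the Russian one collapses to `2`. Headers that differ only in non-ASCII characters would collapse to the same string, so the substitution alone cannot guarantee unique anchors.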

> This is really only a workaround.

But it works. Thanks. :-)

> i.e. insert the line

I'm a novice at web technology, but I know diff/patch. ;-)

-- SergeyAfonin - 19 Feb 2009

I have looked at this solution, and it is better than it looks at first glance.

I would say that it does the job in 99% of cases where Latin-script languages are used.

We have an advanced solution tracked in Item817, but this solution is compatible with the TOCs generated since Cairo. It is therefore also compatible with all the external links people have created by right-clicking a TOC entry, copying the URL with its TOC anchor, and pasting it into emails and elsewhere.

What we do now is correct the translation of non A-Za-z0-9_ characters to _ so accented characters do not produce encoded garbage.

You may think this in itself is incompatible but the fact is that these TOC links did not work at all. They are simply not seen by the browsers as anchors. So there is nothing to be compatible with.

This solution goes into 1.0.5 and for 1.1 we use this for the compatibility anchors that you can enable via a configure setting which upgraders can use.

For non-Latin character sets the result becomes simple anchors _AN1 _AN2 _AN3. Not elegant. But they WORK. Unlike today, where the feature does not work at all in non-Latin character sets.

If the result of the conversion is an empty string we simply put an A.

The RFC says the initial char should be A-Z. But all browsers I tested work with anchors that start with a digit.

It is extremely common that people use numbered sections, so going for RFC correctness destroys URL compatibility. So we avoid being religious about this detail.

-- KennethLavrsen - 22 Apr 2009
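
The rules described in this comment (map runs of characters outside A-Za-z0-9_ to `_`, keep a leading digit, fall back to `A` when nothing is left) can be sketched roughly as follows. This is not the code from the checkins listed on this item: `simple_anchor` is a hypothetical name, stripping leading underscores is an assumption made so that the empty-string case can occur, and the _AN1-style numbering is not reproduced.

```perl
use strict;
use warnings;
use utf8;    # one header literal below contains accented characters

# Hypothetical sketch of the simple anchor rules described above.
sub simple_anchor {
    my ($text) = @_;
    $text =~ s/[^A-Za-z0-9_]+/_/g;    # anything else becomes '_'
    $text =~ s/__+/_/g;               # collapse repeated '_'
    $text =~ s/^_+//;                 # assumption: drop leading '_'
    $text = 'A' if $text eq '';       # empty result: put an 'A'
    return $text;
}

print simple_anchor('1.2 Getting started'), "\n";   # leading digit is kept
print simple_anchor('Résumé'),              "\n";   # accents become '_'
print simple_anchor('!!!'),                 "\n";   # nothing left: 'A'
```

A numbered heading such as `1.2 Getting started` keeps its leading digit here, which matches the decision above to favour URL compatibility over strict RFC correctness.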

I set this to Waiting for Release but may reopen it if I find severe UTF-8 trouble.

-- KennethLavrsen - 22 Apr 2009


Summary Malformed header anchors if header contains non A-Za-z0-9_ characters - simple solution
ReportedBy SergeyAfonin
Codebase 1.0.4
SVN Range Foswiki-1.0.0, 01 Feb 2009, build 2411
AppliesTo Engine
Component
Priority Normal
CurrentState Closed
WaitingFor
Checkins distro:95f63a7428ad distro:5f69e96820d4
TargetRelease patch
ReleasedIn 1.0.5
Topic revision: r14 - 25 Apr 2009, KennethLavrsen