Item8098: [PATCH] UTF-8 Site Charset breaks WikiName normalization for umlauts
Priority: Normal
Current State: Closed
Released In:
Target Release:
Hi,
LdapContrib fails to normalize wiki names and login names containing German umlauts if your site runs on UTF-8. This is very bad, as UTF-8 should be the default anyway IMHO.
Attached is a patch that papers over this by adjusting the s/ö/oe/ substitutions when the site charset is UTF-8. A nice side effect of the patch is that it makes the source code ASCII-clean.
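Why the substitutions need adjusting: the byte sequence for ö depends on the charset (0xF6 in ISO-8859-1, 0xC3 0xB6 in UTF-8), so a byte-level s/ö/oe/ written for one charset never matches text in the other. The Python below is a hypothetical illustration of that idea, not the LdapContrib Perl code; the names UMLAUT_MAP, build_byte_table and normalize_bytes are made up for this sketch. It assumes the raw cn bytes and the site charset name are both available, builds the substitution table from the charset, and keeps the source ASCII-clean by spelling the umlauts as \u escapes.

```python
# Hypothetical sketch: charset-aware umlaut transliteration on raw bytes.
# Umlauts are written as \u escapes so this source file stays ASCII-clean.
UMLAUT_MAP = {
    "\u00e4": "ae",  # a-umlaut
    "\u00f6": "oe",  # o-umlaut
    "\u00fc": "ue",  # u-umlaut
    "\u00c4": "Ae",  # A-umlaut
    "\u00d6": "Oe",  # O-umlaut
    "\u00dc": "Ue",  # U-umlaut
    "\u00df": "ss",  # sharp s
}


def build_byte_table(charset: str) -> dict[bytes, bytes]:
    """Encode each umlaut in the given charset so the patterns match input in that charset."""
    return {ch.encode(charset): ascii_.encode("ascii") for ch, ascii_ in UMLAUT_MAP.items()}


def normalize_bytes(raw: bytes, charset: str) -> bytes:
    """Replace every umlaut byte sequence in raw with its ASCII digraph."""
    for needle, replacement in build_byte_table(charset).items():
        raw = raw.replace(needle, replacement)
    return raw


if __name__ == "__main__":
    utf8_name = "Sp\u00f6rlein".encode("utf-8")
    print(normalize_bytes(utf8_name, "utf-8"))       # b'Spoerlein'
    print(normalize_bytes(utf8_name, "iso-8859-1"))  # b'Sp\xc3\xb6rlein' (table built for the wrong charset)
```

The second call shows the failure mode described in the report: a table built for ISO-8859-1 silently leaves UTF-8 input untouched.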
There is still an outstanding issue, though. The normalization concatenates the cn attribute as it comes from LDAP, so my name should come out as UlrichSpörlein before the patch and UlrichSpoerlein after the patch. However, split() does not take the site locale into account, so my wiki name becomes UlrichSpRlein.
I tried adding setlocale(LC_CTYPE, "de_DE.UTF-8") before the split so that the regexp class [:alpha:] would include ö, but nothing changed. I confirmed that the string in question is UTF-8 encoded (ö is represented as the bytes c3 b6) and also tried utf8::upgrade(). This should go into a separate bug report, however.
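The mangled name is consistent with splitting the raw UTF-8 bytes on "non-letter" characters where only ASCII counts as a letter: the two bytes of ö act as a separator, leaving "Sp" and "rlein", and capitalizing each part yields "SpRlein". The Python sketch below reproduces that and contrasts it with decoding to text before splitting; both function names are hypothetical and this is not the LdapContrib implementation. In Perl terms, the decode step would presumably correspond to Encode::decode_utf8 rather than utf8::upgrade, since utf8::upgrade only changes the internal representation and treats each existing byte as a Latin-1 character. That is an untested assumption about where the fix belongs.

```python
import re


def wiki_word_from_bytes(raw: bytes) -> str:
    """Split raw bytes on anything that is not an ASCII letter (reproduces the bug)."""
    parts = [p for p in re.split(rb"[^A-Za-z]+", raw) if p]
    # Capitalize the first byte of each part, like Perl's ucfirst, then join.
    return b"".join(p[:1].upper() + p[1:] for p in parts).decode("ascii")


def wiki_word_from_text(raw: bytes, charset: str) -> str:
    """Decode first, then split on anything that is not a Unicode letter."""
    text = raw.decode(charset)
    # [\W\d_] = non-word characters, digits and underscore, i.e. every non-letter.
    parts = [p for p in re.split(r"[\W\d_]+", text) if p]
    return "".join(p[:1].upper() + p[1:] for p in parts)


if __name__ == "__main__":
    cn = "Ulrich Sp\u00f6rlein".encode("utf-8")
    print(wiki_word_from_bytes(cn))              # UlrichSpRlein
    print(wiki_word_from_text(cn, "utf-8"))      # UlrichSpörlein
```

Once the name is decoded, the umlaut transliteration can run on characters instead of bytes, which would make the per-charset substitution tables unnecessary.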
--
UlrichSpoerlein - 04 Apr 2009