diff --git a/phpBB/docs/coding-guidelines.html b/phpBB/docs/coding-guidelines.html
index aa937d5bfb..dcbbc011a0 100644
--- a/phpBB/docs/coding-guidelines.html
+++ b/phpBB/docs/coding-guidelines.html
@@ -1763,7 +1763,7 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
With phpBB3, the output encoding for the forum is now UTF-8, a Universal Character Encoding by the Unicode Consortium that is by design a superset of US-ASCII and ISO-8859-1. By using one character set which simultaneously supports all scripts which previously would have required different encodings (eg: ISO-8859-1 to ISO-8859-15 (Latin, Greek, Cyrillic, Thai, Hebrew, Arabic); GB2312 (Simplified Chinese); Big5 (Traditional Chinese), EUC-JP (Japanese), EUC-KR (Korean), VISCII (Vietnamese); et cetera), this removes the need to convert between encodings and improves the accessibility of multilingual forums.
- The impact is that the language files for phpBB must now also be encoded as UTF-8, with a caveat that the files must not contain a BOM for compatibility reasons with non-Unicode aware versions of PHP. For those with forums using the Latin character set (ie: most European languages), this change is transparent since UTF-8 is superset to US-ASCII and ISO-8859-1.
+ The impact is that the language files for phpBB must now also be encoded as UTF-8, with a caveat that the files must not contain a BOM for compatibility reasons with non-Unicode aware versions of PHP. For those with forums using the Latin character set (ie: most European languages), this change is transparent since UTF-8 is superset to US-ASCII and ISO-8859-1.
Language Tag:
@@ -1773,8 +1773,8 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
Most language tags consist of a two- or three-letter language subtag (from ISO 639-1/ISO 639-2). Sometimes, this is followed by a two-letter or three-digit region subtag (from ISO 3166-1 alpha-2 or UN M.49). Some examples are:
-
- Language tag examples
+
+ Examples of various possible language tags as described by RFC 4646 and RFC 4647
Language tag |
@@ -1825,8 +1825,8 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
Next is the ISO 15924 language script code and when one should or shouldn't use it. For example, whilst en-Latn
is syntactically correct for describing English written with Latin script, real world English writing is more-or-less exclusively in the Latin script. For languages such as English that are written in a single script, the IANA Language Subtag Registry has a "Suppress-Script" field meaning the script code should be omitted unless a specific language tag requires a specific script code. Some languages are written in more than one script and in such cases, the script code is encouraged since an end-user may be able to read their language in one script, but not the other. Some examples are:
-
- Language subtag + script subtag examples
+
+ Examples of using a language subtag in combination with a script subtag
Language tag |
@@ -1892,8 +1892,8 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
Examples of English using macro-geographical regions:
-
- Coding for English using macro-geographical regions
+
+ Coding for English using macro-geographical regions (examples for English of ISO 3166-1 alpha-2 vs. UN M.49 code)
ISO 639-1/ISO 639-2 + ISO 3166-1 alpha-2 |
@@ -1918,8 +1918,8 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
Examples of Spanish using macro-geographical regions:
-
- Coding for Spanish macro-geographical regions
+
+ Coding for Spanish macro-geographical regions (examples for Spanish of ISO 3166-1 alpha-2 vs. UN M.49 code)
ISO 639-1/ISO 639-2 + ISO 3166-1 alpha-2 |
@@ -1947,7 +1947,7 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
Example of where the ISO 3166-1 alpha-2 is ambiguous and why UN M.49 might be preferred:
-
+
Coding for ambiguous ISO 3166-1 alpha-2 regions
@@ -2003,7 +2003,7 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
RFC 4646 anticipates features which shall be available in (currently draft) ISO 639-3 which aims to provide as complete an enumeration of languages as possible, including living, extinct, ancient and constructed languages, whether major, minor or unwritten. A new feature of ISO 639-3 compared to the previous two revisions is the concept of macrolanguages where Arabic and Chinese are two such examples. In such cases, their respective codes of ar
and zh
are very vague as to which dialect/topolect is used or perhaps some terse classical variant which may be difficult for all but very educated users. For such macrolanguages, it is recommended that the sub-language tag is used as a suffix to the macrolanguage tag, eg:
-
+
Macrolanguage subtag + sub-language subtag examples
@@ -2047,7 +2047,7 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
For phpBB, the language tags are not used in their raw form; instead they are converted to all lower-case and have the hyphen -
replaced with an underscore _
where appropriate, with some examples below:
-
+
Language tag normalisation examples
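
The normalisation rule described in this hunk (lower-case the tag, replace the hyphen with an underscore) can be illustrated with a minimal sketch. This is not phpBB code: the function name `normalise_language_tag` is hypothetical, Python is used purely for illustration, and the sketch assumes that every hyphen in the tag is replaced, which may be broader than the "where appropriate" wording of the guideline.

```python
def normalise_language_tag(tag: str) -> str:
    """Return a language tag in lower-case with hyphens replaced by underscores.

    Assumption: every hyphen is replaced; the guideline only says
    "where appropriate", so real usage may be narrower.
    """
    return tag.lower().replace("-", "_")


if __name__ == "__main__":
    # A few tags of the kinds discussed in the surrounding text.
    for tag in ("en-GB", "zh-Hans", "es-419"):
        print(tag, "->", normalise_language_tag(tag))
```

Tags that contain no hyphen, such as `en`, are only lower-cased by this sketch.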
@@ -2101,7 +2101,7 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
For the English language description, the language name is always first and any additional attributes required to describe the subtags within the language code are then listed in order separated with commas and enclosed within parentheses, eg:
-
+
English language description examples for iso.txt
@@ -2153,7 +2153,7 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
The various Unicode control characters for bi-directional text and their HTML equivalents where appropriate are as follows:
-
+
Unicode bidirectional control characters & HTML elements/entities
@@ -2219,7 +2219,7 @@ if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
For iso.txt
, the directionality of the text can be explicitly set using special Unicode characters via any of the three methods provided by left-to-right/right-to-left markers/embeds/overrides, as without them, the ordering of characters will be incorrect, eg:
-
+
Unicode bidirectional control characters iso.txt
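
As a rough illustration of the three methods mentioned for iso.txt (markers, embeds and overrides), the sketch below wraps a string in the corresponding Unicode control characters. The code points are the standard Unicode ones; the helper names `mark`, `embed` and `override` are hypothetical, Python is used only for illustration, and nothing here reflects how phpBB itself reads or writes iso.txt.

```python
# Unicode bidirectional control characters (standard code points).
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (closes an embed or override)
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE


def mark(text: str, rtl: bool = False) -> str:
    """Append a directional mark (a single invisible character) to text."""
    return text + (RLM if rtl else LRM)


def embed(text: str, rtl: bool = False) -> str:
    """Wrap text in a directional embedding, closed with PDF."""
    return (RLE if rtl else LRE) + text + PDF


def override(text: str, rtl: bool = False) -> str:
    """Wrap text in a directional override, closed with PDF."""
    return (RLO if rtl else LRO) + text + PDF


if __name__ == "__main__":
    # repr() makes the otherwise invisible control characters visible.
    print(repr(embed("Hebrew", rtl=True)))
    print(repr(override("abc")))
    print(repr(mark("abc", rtl=True)))
```

Embeds and overrides are paired constructs, so each opening character in the sketch is matched by a closing PDF; a mark is a single character and needs no closer.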