diff --git a/phpBB/docs/coding-guidelines.html b/phpBB/docs/coding-guidelines.html
index 7e949d2d90..3a92f76353 100644
--- a/phpBB/docs/coding-guidelines.html
+++ b/phpBB/docs/coding-guidelines.html
@@ -61,6 +61,36 @@ h3 {
 	margin-left: 20px;
 }
 
+.paragraph table {
+	font-size: 8pt;
+	border-collapse: collapse;
+	border: 1px solid #006699;
+}
+
+.paragraph table caption {
+	display: none;
+}
+
+.paragraph table thead {
+	background-color: #D1D7DC;
+}
+
+.paragraph table td, .paragraph table th {
+	border: 1px solid #006699;
+	padding: 0.5em;
+}
+
+.paragraph table td dl {
+	margin: 0;
+	padding: 0;
+}
+
+.paragraph table td dl dt {
+	float: left;
+	clear: both;
+	margin-right: 1em;
+}
+
 /* Structure */
 #logo {
 	background: #fff url(header_bg.jpg) repeat-x top right;
@@ -186,6 +216,13 @@ p a {
  • Styling
  • Templating
  • Translation (i18n/L10n) Guidelines
    1. Standardisation
    2. Other considerations
    3. Writing Style
  • Guidelines Changelog
@@ -1505,9 +1542,688 @@ div
    -

    5. Guidelines Changelog

    +

    5. Translation (i18n/L10n) Guidelines

    + + 5.i. Standardisation +

    +
    + +

    Reason:

    + +

    phpBB is one of the most translated open-source projects, with the current stable version available in over 60 localisations. Whilst the ad hoc approach to naming language packs has worked, for phpBB3 and beyond we hope to make this process saner, which will allow for better interoperation with current and future web browsers.

    + +

    Encoding:

    + +

    With phpBB3, the output encoding for the forum is now UTF-8, a universal character encoding defined by the Unicode Consortium that is by design a superset of US-ASCII and whose first 256 code points match ISO-8859-1. By using one character set which simultaneously supports all scripts that previously required different encodings (eg: ISO-8859-1 to ISO-8859-15 (Latin, Greek, Cyrillic, Thai, Hebrew, Arabic); GB2312 (Simplified Chinese); Big5 (Traditional Chinese); EUC-JP (Japanese); EUC-KR (Korean); VISCII (Vietnamese); et cetera), this removes the need to convert between encodings and improves the accessibility of multilingual forums.

    + +

    The impact is that the language files for phpBB must now also be encoded as UTF-8, with the caveat that the files must not contain a BOM, for compatibility with non-Unicode-aware versions of PHP. For language files that contain only US-ASCII characters, this change is transparent, since UTF-8 is a superset of US-ASCII. Characters outside US-ASCII, including the accented letters of ISO-8859-1 used by most European languages, are encoded differently in UTF-8, so such files must be converted.

    + +
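A minimal sketch of how a translator might check a language file for a BOM before shipping it. The function names `has_utf8_bom()` and `strip_utf8_bom()` are hypothetical helpers for illustration and are not part of phpBB.

```php
<?php
// Hypothetical helper: true if the byte string starts with the UTF-8 BOM (EF BB BF).
function has_utf8_bom($bytes)
{
	return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: returns the bytes with a leading BOM removed, if present.
function strip_utf8_bom($bytes)
{
	return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

var_dump(has_utf8_bom("\xEF\xBB\xBF<?php"));           // bool(true)
var_dump(has_utf8_bom("<?php"));                       // bool(false)
var_dump(strip_utf8_bom("\xEF\xBB\xBF<?php"));         // string(5) "<?php"
```

In practice the input would come from something like `file_get_contents()` on the language file; most text editors also offer a "UTF-8 without BOM" save option, which avoids the problem at the source.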

    Language Tag:

    + +

    The IETF recently published RFC 4646 for tags used to identify languages, which in combination with RFC 4647 obsoletes the older RFC 3066 and older-still RFC 1766. RFC 4646 uses ISO 639-1/ISO 639-2, ISO 3166-1 alpha-2, ISO 15924 and UN M.49 to define a language tag. Each complete tag is composed of subtags which are not case sensitive and can also be empty.

    + +

    Ordering of the subtags in the case that they are all non-empty is: language-script-region-variant-extension-privateuse. Should any subtag be empty, its corresponding hyphen is also omitted. Thus, the language tag for English will be en and not en-----.

    + +
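The ordering rule can be sketched in a few lines of PHP. This is only an illustration: `build_language_tag()` and its array keys are hypothetical names, and the function does no validation of the subtags themselves.

```php
<?php
// Hypothetical helper: joins the non-empty subtags in the order
// language-script-region-variant-extension-privateuse.
function build_language_tag(array $subtags)
{
	$order = array('language', 'script', 'region', 'variant', 'extension', 'privateuse');
	$parts = array();

	foreach ($order as $name)
	{
		// An empty or missing subtag contributes neither text nor a hyphen
		if (isset($subtags[$name]) && $subtags[$name] !== '')
		{
			$parts[] = $subtags[$name];
		}
	}

	return implode('-', $parts);
}

echo build_language_tag(array('language' => 'en')), "\n";
// en
echo build_language_tag(array('region' => 'HK', 'language' => 'zh', 'script' => 'Hant')), "\n";
// zh-Hant-HK
```

Because the order comes from the fixed `$order` list rather than from the input array, the caller may supply the subtags in any order.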

    Most language tags consist of a two- or three-letter language subtag (from ISO 639-1/ISO 639-2). Sometimes, this is followed by a two-letter or three-digit region subtag (from ISO 3166-1 alpha-2 or UN M.49). Some examples are:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Language tag examples
    Language tag | Description | Component subtags
    en | English | language
    mas | Masai | language
    fr-CA | French as used in Canada | language+region
    en-833 | English as used in the Isle of Man | language+region
    zh-Hans | Chinese written with Simplified script | language+script
    zh-Hant-HK | Chinese written with Traditional script as used in Hong Kong | language+script+region
    de-AT-1996 | German as used in Austria with 1996 orthography | language+region+variant
    + +

    The ultimate aim of a language tag is to convey the useful distinguishing information needed, whilst keeping it as short as possible. So for example, use en, fr and ja as opposed to en-GB, fr-FR and ja-JP, since we know English, French and Japanese are the native languages of Great Britain, France and Japan respectively.

    + +

    Next is the ISO 15924 script code and when one should or shouldn't use it. For example, whilst en-Latn is syntactically correct for describing English written with Latin script, real-world English writing is more-or-less exclusively in the Latin script. For languages like English that are written in a single script, the IANA Language Subtag Registry has a "Suppress-Script" field, meaning the script code should be omitted unless a specific language tag requires a specific script code. Some languages are written in more than one script and in such cases the script code is encouraged, since an end-user may be able to read their language in one script but not the other. Some examples are:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Language subtag + script subtag examples
    Language tag | Description | Component subtags
    en-Brai | English written in Braille script | language+script
    en-Dsrt | English written in Deseret (Mormon) script | language+script
    sr-Latn | Serbian written in Latin script | language+script
    sr-Cyrl | Serbian written in Cyrillic script | language+script
    mn-Mong | Mongolian written in Mongolian script | language+script
    mn-Cyrl | Mongolian written in Cyrillic script | language+script
    mn-Phag | Mongolian written in Phags-pa script | language+script
    az-Cyrl-AZ | Azerbaijani written in Cyrillic script as used in Azerbaijan | language+script+region
    az-Latn-AZ | Azerbaijani written in Latin script as used in Azerbaijan | language+script+region
    az-Arab-IR | Azerbaijani written in Arabic script as used in Iran | language+script+region
    + +
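The "Suppress-Script" rule could be applied mechanically along the following lines. This is a sketch under stated assumptions: `language_with_script()` is a hypothetical helper, and the `$suppress_script` array is a tiny hand-copied excerpt for illustration, not the full IANA Language Subtag Registry.

```php
<?php
// Assumption: a small excerpt of Suppress-Script values; the real registry is far larger.
$suppress_script = array(
	'en' => 'Latn',
	'de' => 'Latn',
);

// Hypothetical helper: appends the script subtag only when it adds information.
function language_with_script($language, $script, array $suppress_script)
{
	$language = strtolower($language);
	$script = ucfirst(strtolower($script));

	// Drop the script when it is empty or is the language's suppressed default
	if ($script === '' || (isset($suppress_script[$language]) && $suppress_script[$language] === $script))
	{
		return $language;
	}

	return $language . '-' . $script;
}

echo language_with_script('en', 'Latn', $suppress_script), "\n"; // en
echo language_with_script('en', 'Brai', $suppress_script), "\n"; // en-Brai
echo language_with_script('sr', 'Cyrl', $suppress_script), "\n"; // sr-Cyrl
```

Serbian has no entry in the excerpt, so its script subtag is always kept, which matches the guidance for languages written in more than one script.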

    The three-digit UN M.49 code should be used in preference to the two-letter ISO 3166-1 alpha-2 code if a macro-geographical entity is required and/or the ISO 3166-1 alpha-2 code is ambiguous.

    + +

    Examples of English using macro-geographical regions:

    + + + + + + + + + + + + + + + + + + + + + + + +
    Coding for English using macro-geographical regions
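    ISO 639-1/ISO 639-2 + ISO 3166-1 alpha-2 | ISO 639-1/ISO 639-2 + UN M.49 (Example macro regions) | (wider macro region)
    en-AU: English as used in Australia | en-053: English as used in Australia & New Zealand | en-009: English as used in Oceania
    en-NZ: English as used in New Zealand | en-053: English as used in Australia & New Zealand | en-009: English as used in Oceania
    en-FJ: English as used in Fiji | en-054: English as used in Melanesia | en-009: English as used in Oceania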
    + +

    Examples of Spanish using macro-geographical regions:

    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Coding for Spanish macro-geographical regions
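    ISO 639-1/ISO 639-2 + ISO 3166-1 alpha-2 | ISO 639-1/ISO 639-2 + UN M.49 (Example macro regions) | (wider macro region)
    es-PR: Spanish as used in Puerto Rico | es-419: Spanish as used in Latin America & the Caribbean | es-019: Spanish as used in the Americas
    es-HN: Spanish as used in Honduras | es-419: Spanish as used in Latin America & the Caribbean | es-019: Spanish as used in the Americas
    es-AR: Spanish as used in Argentina | es-419: Spanish as used in Latin America & the Caribbean | es-019: Spanish as used in the Americas
    es-US: Spanish as used in United States of America | es-021: Spanish as used in North America | es-019: Spanish as used in the Americas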
    + +

    Example of where the ISO 3166-1 alpha-2 is ambiguous and why UN M.49 might be preferred:

    + + + + + + + + + + + + + + + + + + + + + +
    Coding for ambiguous ISO 3166-1 alpha-2 regions
    CS assignment pre-1994CS assignment post-1994
    +
    +
    CS
    Czechoslovakia (ISO 3166-1)
    +
    200
    Czechoslovakia (UN M.49)
    +
    +
    +
    +
    CS
    Serbia & Montenegro (ISO 3166-1)
    +
    891
    Serbia & Montenegro (UN M.49)
    +
    +
    +
    +
    CZ
    Czech Republic (ISO 3166-1)
    +
    203
    Czech Republic (UN M.49)
    +
    +
    +
    +
    SK
    Slovakia (ISO 3166-1)
    +
    703
    Slovakia (UN M.49)
    +
    +
    +
    +
    RS
    Serbia (ISO 3166-1)
    +
    688
    Serbia (UN M.49)
    +
    +
    +
    +
    ME
    Montenegro (ISO 3166-1)
    +
    499
    Montenegro (UN M.49)
    +
    +
    + +

    Macro-languages & Topolects:

    + +

    RFC 4646 anticipates features which shall be available in (currently draft) ISO 639-3, which aims to provide as complete an enumeration of languages as possible, including living, extinct, ancient and constructed languages, whether major, minor or unwritten. A new feature of ISO 639-3 compared to the previous two revisions is the concept of macrolanguages, of which Arabic and Chinese are two examples. In such cases, their respective codes of ar and zh are very vague as to which dialect/topolect is used, or whether it is perhaps some terse classical variant which may be difficult for all but very educated users. For such macrolanguages, it is recommended that the sub-language tag is used as a suffix to the macrolanguage tag, eg:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Macrolanguage subtag + sub-language subtag examples
    Language tag | Description | Component subtags
    zh-cmn | Mandarin (Putonghua/Guoyu) Chinese | macrolanguage+sublanguage
    zh-yue | Yue (Cantonese) Chinese | macrolanguage+sublanguage
    zh-cmn-Hans | Mandarin (Putonghua/Guoyu) Chinese written in Simplified script | macrolanguage+sublanguage+script
    zh-cmn-Hant | Mandarin (Putonghua/Guoyu) Chinese written in Traditional script | macrolanguage+sublanguage+script
    zh-nan-Latn-TW | Minnan (Hoklo) Chinese written in Latin script (POJ Romanisation) as used in Taiwan | macrolanguage+sublanguage+script+region
    + +
    + Top +

    + + 5.ii. Other considerations +

    +
    + +

    Normalisation of language tags for phpBB:

    + +

    For phpBB, the language tags are not used in their raw form. Instead they are converted to all lower-case and, where appropriate, have the hyphen - replaced with an underscore _, with some examples below:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Language tag normalisation examples
    Raw language tag | Description | Value of USER_LANG in ./common.php | Language pack directory name in /language/
    en | British English | en | en
    de-AT | German as used in Austria | de-at | de_at
    es-419 | Spanish as used in Latin America & Caribbean | es-419 | es_419
    zh-yue-Hant-HK | Cantonese written in Traditional script as used in Hong Kong | zh-yue-hant-hk | zh_yue_hant_hk
    + +
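The two normalised forms in the table above can be derived from a raw tag as in the following sketch. The function names are hypothetical and chosen for illustration; they are not phpBB functions.

```php
<?php
// Hypothetical helper: the lower-cased form, as shown in the USER_LANG column.
function normalise_user_lang($raw_tag)
{
	return strtolower($raw_tag);
}

// Hypothetical helper: the lower-cased form with hyphens replaced by
// underscores, as shown in the language pack directory column.
function normalise_lang_dir($raw_tag)
{
	return str_replace('-', '_', strtolower($raw_tag));
}

echo normalise_user_lang('zh-yue-Hant-HK'), "\n"; // zh-yue-hant-hk
echo normalise_lang_dir('zh-yue-Hant-HK'), "\n";  // zh_yue_hant_hk
echo normalise_lang_dir('de-AT'), "\n";           // de_at
```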

    How to use iso.txt:

    + +

    The iso.txt file is a small UTF-8 encoded plain-text file which consists of three lines:

    + + + +

    Because language tags themselves are meant to be machine read, they can be rather obtuse to humans, which is why descriptive strings as provided by iso.txt are needed. Whilst en-US could be fairly easily deduced to be "English as used in the United States", de-CH is more difficult unless one happens to know that de is from "Deutsch", German for "German", and CH is the abbreviation of the official Latin name for Switzerland, "Confoederatio Helvetica".

    + +

    For the English language description, the language name always comes first. Any additional attributes required to describe the subtags within the language code are then listed in order, separated by commas and enclosed within parentheses, eg:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    English language description examples for iso.txt
    Raw language tag | English description within iso.txt
    en | British English
    en-US | English (United States)
    en-053 | English (Australia & New Zealand)
    de | German
    de-CH-1996 | German (Switzerland, 1996 orthography)
    gsw-1996 | Swiss German (1996 orthography)
    zh-cmn-Hans-CN | Mandarin Chinese (Simplified, Mainland China)
    zh-yue-Hant-HK | Cantonese Chinese (Traditional, Hong Kong)
    + +
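The description format (language name first, then the attributes in subtag order, comma-separated within parentheses) can be sketched as follows. `describe_language()` is a hypothetical helper for illustration, and the caller is assumed to supply the human-readable names itself.

```php
<?php
// Hypothetical helper: builds an English description in the iso.txt style.
function describe_language($language_name, array $attributes = array())
{
	if (empty($attributes))
	{
		return $language_name;
	}

	// Attributes are listed in subtag order, separated by commas, in parentheses
	return $language_name . ' (' . implode(', ', $attributes) . ')';
}

echo describe_language('German'), "\n";
// German
echo describe_language('German', array('Switzerland', '1996 orthography')), "\n";
// German (Switzerland, 1996 orthography)
echo describe_language('Cantonese Chinese', array('Traditional', 'Hong Kong')), "\n";
// Cantonese Chinese (Traditional, Hong Kong)
```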

    For the localised language description, just translate the English version, using whatever punctuation is appropriate and typical for your own locale, assuming the language uses punctuation at all.

    + +

    Unicode bi-directional considerations:

    + +

    Because phpBB is now UTF-8, all translators must take into account that certain strings may be shown when the directionality of the document is either opposite to normal or is ambiguous.

    + +

    The various Unicode control characters for bi-directional text and their HTML equivalents where appropriate are as follows:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Unicode bidirectional control characters & HTML elements/entities
    Unicode character
    abbreviation
    Unicode
    code-point
    Unicode character
    name
    Equivalent HTML
    markup/entity
    Raw character
    (enclosed between '')
    LRMU+200ELeft-to-Right Mark‎'‎'
    RLMU+200FRight-to-Left Mark‏'‏'
    LREU+202ALeft-to-Right Embeddingdir="ltr"'‪'
    RLEU+202BRight-to-Left Embeddingdir="rtl"'‫'
    PDFU+202CPop Directional Formatting</bdo>'‬'
    LROU+202DLeft-to-Right Override<bdo dir="ltr">'‭'
    RLOU+202ERight-to-Left Override<bdo dir="rtl">'‮'
    + +

    For iso.txt, the directionality of the text can be set explicitly with any of three kinds of Unicode control character: left-to-right/right-to-left marks, embeddings or overrides. Without them, the ordering of characters may be incorrect when the description is shown in a document of the opposite directionality, eg:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Unicode bidirectional control characters iso.txt
    DirectionalityRaw character viewDisplay of localised
    description in iso.txt
    Ordering
    dir="ltr"English (Australia & New Zealand)English (Australia & New Zealand)Correct
    dir="rtl"English (Australia & New Zealand)English (Australia & New Zealand)Incorrect
    dir="rtl" with LRMEnglish (Australia & New Zealand)U+200EEnglish (Australia & New Zealand)‎Correct
    dir="rtl" with LRE & PDFU+202AEnglish (Australia & New Zealand)U+202C‪English (Australia & New Zealand)‬Correct
    dir="rtl" with LRO & PDFU+202DEnglish (Australia & New Zealand)U+202C‭English (Australia & New Zealand)‬Correct
    + +

    In choosing which of the three methods to use, an LRM or RLM is sufficient in the majority of cases: it places a "strong" character next to an ambiguous punctuation character, so that the punctuation is fully enclosed by strong characters and inherits the correct directionality.

    +

    In some cases, there may be mixed scripts of a left-to-right and right-to-left direction, so using LRE & RLE with PDF may be more appropriate. Lastly, in very rare instances where directionality must be forced, use LRO & RLO with PDF.

    +
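A minimal sketch of how the mark and embedding approaches look at the byte level in a UTF-8 string. The constant and function names are hypothetical; the byte sequences are the UTF-8 encodings of the code points listed in the table of control characters above.

```php
<?php
// UTF-8 byte sequences for three of the bidirectional control characters
define('UTF8_LRM', "\xE2\x80\x8E"); // U+200E Left-to-Right Mark
define('UTF8_LRE', "\xE2\x80\xAA"); // U+202A Left-to-Right Embedding
define('UTF8_PDF', "\xE2\x80\xAC"); // U+202C Pop Directional Formatting

// Hypothetical helper: a trailing LRM gives the closing parenthesis
// a strong left-to-right neighbour.
function with_trailing_lrm($text)
{
	return $text . UTF8_LRM;
}

// Hypothetical helper: wraps the whole string in an LRE ... PDF pair.
function with_ltr_embedding($text)
{
	return UTF8_LRE . $text . UTF8_PDF;
}

$description = 'English (Australia & New Zealand)';

echo bin2hex(substr(with_trailing_lrm($description), -3)), "\n";    // e2808e
echo bin2hex(substr(with_ltr_embedding($description), 0, 3)), "\n"; // e280aa
echo bin2hex(substr(with_ltr_embedding($description), -3)), "\n";   // e280ac
```

The output is shown as hexadecimal because the control characters themselves are invisible. In iso.txt the characters would be typed or pasted directly into the file rather than generated by code.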

    For further information on authoring techniques of bi-directional text, please see the W3C tutorial on authoring techniques for XHTML pages with bi-directional text.

    + +

    Working with placeholders:

    + +

    As phpBB is translated into languages whose word-ordering rules differ from those of English, placeholders allow specific values to be shown in whatever order is deemed appropriate. Take for example the extremely simple "Page X of Y". Whilst in English this could just be coded as:

    + +
    +	...
    +'PAGE_OF'	=>	'Page %s of %s',
    +		/* Just grabbing the replacements as they
    +		come and hope they are in the right order */
    +	...
    +	
    + +

    … a clearer way to show explicit replacement ordering is to do:

    + +
    +	...
    +'PAGE_OF'	=>	'Page %1$s of %2$s',
    +		/* Explicit ordering of the replacements,
    +		even if they are the same order as English */
    +	...
    +	
    + +

    Why bother at all? Because in some languages, the string translated back into English might read something like:

    + +
    +	...
    +'PAGE_OF'	=>	'Total of %2$s pages, currently on page %1$s',
    +		/* Explicit ordering of the replacements,
    +		reversed compared to English as the total comes first */
    +	...
    +	
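To see the effect of explicit ordering, the following sketch passes the same two values, in the same order, to both strings using PHP's `sprintf()`. The array keys `PAGE_OF_EN` and `PAGE_OF_REVERSED` are made up for this illustration so that both variants can sit side by side; a real language pack would only define `PAGE_OF`.

```php
<?php
// Two hypothetical translations of the same string
$lang = array(
	'PAGE_OF_EN'		=> 'Page %1$s of %2$s',
	'PAGE_OF_REVERSED'	=> 'Total of %2$s pages, currently on page %1$s',
);

$current_page = 3;
$total_pages = 10;

// The calling code never changes: current page first, total second
echo sprintf($lang['PAGE_OF_EN'], $current_page, $total_pages), "\n";
// Page 3 of 10
echo sprintf($lang['PAGE_OF_REVERSED'], $current_page, $total_pages), "\n";
// Total of 10 pages, currently on page 3
```

Note that the strings use single quotes, so PHP does not try to interpret `$s` as a variable.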
    + +
    + Top +

    + + 5.iii. Writing Style +

    +
    + +

    Miscellaneous tips & hints:

    + +

    As the language files are PHP files, where the various strings for phpBB are stored within an array which in turn are used for display within an HTML page, rules of syntax for both must be considered. Potentially problematic characters are: ' (straight quote/apostrophe), " (straight double quote), < (less-than sign), > (greater-than sign) and & (ampersand).

    + +

    // Bad - The unescaped straight quote/apostrophe will throw a PHP parse error + +

    +	...
    +'CONV_ERROR_NO_AVATAR_PATH'
    +	=>	'Note to developer: you must specify $convertor['avatar_path'] to use %s.',
    +	...
    +	
    + +

    // Good - Literal straight quotes should be escaped with a backslash, ie: \ + +

    +	...
    +'CONV_ERROR_NO_AVATAR_PATH'
    +	=>	'Note to developer: you must specify $convertor[\'avatar_path\'] to use %s.',
    +	...
    +	
    + +

    However, because phpBB3 now uses UTF-8 as its sole encoding, we can actually use this to our advantage and not have to remember to escape a straight quote when we don't have to:

    + +

    // Bad - The unescaped straight quote/apostrophe will throw a PHP parse error + +

    +	...
    +'USE_PERMISSIONS'	=>	'Test out user's permissions',
    +	...
    +	
    + +

    // Okay - However, non-programmers wouldn't type "user\'s" automatically + +

    +	...
    +'USE_PERMISSIONS'	=>	'Test out user\'s permissions',
    +	...
    +	
    + +

    // Best - Use the Unicode Right-Single-Quotation-Mark character + +

    +	...
    +'USE_PERMISSIONS'	=>	'Test out user’s permissions',
    +	...
    +	
    + +

    The " (straight double quote), < (less-than sign) and > (greater-than sign) characters can all be used as displayed glyphs or as part of HTML markup, for example:

    + +

    // Bad - Invalid HTML, as segments not part of elements are not entitised + +

    +	...
    +'FOO_BAR'	=>	'PHP version < 4.3.3.<br />
    +	Visit "Downloads" at <a href="http://www.php.net/">www.php.net</a>.',
    +	...
    +	
    + +

    // Okay - No more invalid HTML, but "&quot;" is rather clumsy + +

    +	...
    +'FOO_BAR'	=>	'PHP version &lt; 4.3.3.<br />
    +	Visit &quot;Downloads&quot; at <a href="http://www.php.net/">www.php.net</a>.',
    +	...
    +	
    + +

    // Best - No more invalid HTML, and usage of correct typographical quotation marks + +

    +	...
    +'FOO_BAR'	=>	'PHP version &lt; 4.3.3.<br />
    +	Visit “Downloads” at <a href="http://www.php.net/">www.php.net</a>.',
    +	...
    +	
    + +

    Lastly, the & (ampersand) must always be entitised regardless of where it is used:

    + +

    // Bad - Invalid HTML, none of the ampersands are entitised + +

    +	...
    +'FOO_BAR'	=>	'<a href="http://somedomain.tld/?foo=1&bar=2">Foo & Bar</a>.',
    +	...
    +	
    + +

    // Good - Valid HTML, ampersands are correctly entitised in all cases + +

    +	...
    +'FOO_BAR'	=>	'<a href="http://somedomain.tld/?foo=1&amp;bar=2">Foo &amp; Bar</a>.',
    +	...
    +	
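When in doubt about what the entitised form of a piece of text should be, PHP's own `htmlspecialchars()` can be used to check it. This is only a sketch for verifying strings by hand, with made-up variable names; language file entries themselves are written as literal strings, as in the examples above.

```php
<?php
// Text that is meant to be displayed, not treated as markup
$label = 'Foo & Bar';
$url = 'http://somedomain.tld/?foo=1&bar=2';

// Entitise the attribute value and the link text separately,
// then assemble the markup around them
$html = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
	. htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';

echo $html, "\n";
// <a href="http://somedomain.tld/?foo=1&amp;bar=2">Foo &amp; Bar</a>

echo htmlspecialchars('PHP version < 4.3.3', ENT_QUOTES, 'UTF-8'), "\n";
// PHP version &lt; 4.3.3
```

Only the displayed text and attribute values are passed through `htmlspecialchars()`; passing the whole string, tags included, would entitise the markup as well.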
    + +

    How these characters are entered depends very much on the choice of operating system, the current language locale/keyboard configuration and the native abilities of the text editor used to edit phpBB language files. Please see http://en.wikipedia.org/wiki/Unicode#Input_methods for more information.

    + +

    Spelling, punctuation, grammar, et cetera:

    + +

    The default language pack bundled with phpBB is British English using Cambridge University Press spelling and is assigned the language code en. The style and tone of writing tends towards formal and translations should emulate this style, at least for the variant using the most compact language code. Less formal translations, or those with colloquialisms, must be denoted as such via either an extension or privateuse subtag within their language code.

    + +
    +Top +

    + +
    + +

    6. Guidelines Changelog

    +

    Revision 1.16

    + + + +

    Revision 1.11-1.15

    + + +

    Revision 1.9-1.10