diff --git a/phpBB/docs/coding-guidelines.html b/phpBB/docs/coding-guidelines.html
index 14deabf135..d7d40d926e 100644
--- a/phpBB/docs/coding-guidelines.html
+++ b/phpBB/docs/coding-guidelines.html
@@ -3,7 +3,7 @@ Coding Guidelines - +
@@ -215,6 +215,7 @@ p a {
  • Styling
  • Templating
  • +
  • Character Sets and Encodings
  • Translation (i18n/L10n) Guidelines
    1. Standardisation
    2. @@ -1558,9 +1559,83 @@ div
      -

      5. Translation (i18n/L10n) Guidelines

      +

      5. Character Sets and Encodings

      - 5.i. Standardisation +
      + +

      What are Unicode, UCS and UTF-8?

      +

      The Universal Character Set (UCS) described in ISO/IEC 10646 consists of a large number of characters. Each of them has a unique name and a code point, which is an integer. Unicode, an industry standard, complements the Universal Character Set with further information about the characters' properties and with alternative character encodings. More information on Unicode can be found on the Unicode Consortium's website. One of the Unicode encodings is the 8-bit Unicode Transformation Format (UTF-8). It encodes each character with one to four bytes and aims for maximum compatibility with the American Standard Code for Information Interchange (ASCII), a 7-bit encoding of a relatively small subset of the UCS: every ASCII string is also a valid UTF-8 string with the same meaning.
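The following standalone sketch makes the "one to four bytes" rule concrete. It does not use any phpBB code. It assumes PHP 7 or later for the `"\u{...}"` string escape, which produces the UTF-8 bytes of a code point. Only the built-in functions strlen() and bin2hex() are used.

```php
<?php
// Requires PHP 7+ for the "\u{...}" escape, which yields the UTF-8 bytes of a code point
$samples = array(
	'U+0041'  => "\u{41}",    // LATIN CAPITAL LETTER A (ASCII, unchanged in UTF-8)
	'U+00E4'  => "\u{E4}",    // LATIN SMALL LETTER A WITH DIAERESIS
	'U+20AC'  => "\u{20AC}",  // EURO SIGN
	'U+1F600' => "\u{1F600}", // GRINNING FACE (outside the Basic Multilingual Plane)
);

foreach ($samples as $code_point => $char)
{
	// strlen() counts bytes, so it returns the length of the UTF-8 encoding
	echo $code_point . ': ' . strlen($char) . ' byte(s), hex ' . bin2hex($char) . "\n";
}
```

The output lists 1, 2, 3 and 4 bytes respectively. The ASCII letter keeps its single-byte value 0x41.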

      + +

      phpBB's use of Unicode

      +

      Unfortunately, PHP does not facilitate the use of Unicode prior to version 6. Most functions simply treat strings as sequences of bytes, assuming that each character takes up exactly one byte. This behaviour still allows UTF-8 encoded text to be stored in PHP strings, but many string operations give unexpected results; strlen(), for example, returns the number of bytes rather than the number of characters. To circumvent this problem we have created alternatives to PHP's native string functions which operate on code points instead of bytes. These functions can be found in /includes/utf/utf_tools.php. They are also covered in the phpBB3 Sourcecode Documentation. Many native PHP functions still work with UTF-8 as long as you stick to certain restrictions. For example, explode still works as long as the first and the last character of the delimiter string are ASCII characters.
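The standalone sketch below illustrates this byte-oriented behaviour in plain PHP 7+, without the phpBB utility functions. It counts code points with PCRE's `/u` modifier. That is one common technique and not necessarily how utf_tools.php does it.

```php
<?php
// "Käse" written with an escape so the example does not depend on the file's encoding
$word = "K\u{E4}se";

// strlen() counts bytes: "ä" takes two bytes in UTF-8, so this is 5, not 4
$byte_length = strlen($word);

// Count code points instead: /u makes PCRE read the subject as UTF-8, /s lets "." match newlines
$char_length = preg_match_all('/./su', $word);

// substr() cuts after the second byte, splitting "ä" and leaving an invalid UTF-8 sequence
$broken = substr($word, 0, 2);
$is_valid = (bool) preg_match('//u', $broken); // preg_match() returns false on invalid UTF-8

// explode() is safe here because the delimiter "," is an ASCII character
$parts = explode(',', "K\u{E4}se,Brot");
```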

      + +

      phpBB only uses the ASCII and the UTF-8 character encodings. Since ASCII is a subset of UTF-8, all strings are in fact UTF-8 encoded. The only exceptions to this rule are code sections which deal with external systems that use other encodings and character sets. Such external data should be converted to UTF-8 using the utf8_recode() function supplied with phpBB. It supports a variety of other character sets and encodings; a full list can be found below.
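To show what recoding involves in the simplest case, here is a hedged sketch for ISO-8859-1 (Latin-1) input. The helper latin1_to_utf8() is hypothetical and not part of phpBB. In phpBB code you would call utf8_recode() instead. The sketch relies only on the fact that Latin-1 bytes 0x00-0xFF map directly to code points U+0000-U+00FF.

```php
<?php
/**
 * Hypothetical helper (not part of phpBB): recode an ISO-8859-1 string to UTF-8.
 * Each Latin-1 byte is its own code point, so bytes below 0x80 stay as they are
 * and all other bytes become a two-byte UTF-8 sequence.
 */
function latin1_to_utf8($string)
{
	$result = '';
	$length = strlen($string);

	for ($i = 0; $i < $length; $i++)
	{
		$byte = ord($string[$i]);

		if ($byte < 0x80)
		{
			// ASCII is a subset of UTF-8 and is copied unchanged
			$result .= $string[$i];
		}
		else
		{
			// Two-byte form 110xxxxx 10xxxxxx: top 2 bits, then low 6 bits
			$result .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
		}
	}

	return $result;
}
```

For example, the Latin-1 byte string `"K\xE4se"` becomes the UTF-8 string "Käse".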

      + +

      With request_var() you can either allow all UCS characters in user input or restrict user input to ASCII characters. This behaviour is controlled by the function's third parameter, $multibyte. You should allow multibyte characters in posts, PMs, topic titles, forum names, etc., but this is not necessary for internal values such as a $mode variable, which should only hold one of a predefined set of ASCII strings anyway.

      + +
      +// an input string containing a multibyte character
      +$_REQUEST['multibyte_string'] = 'Käse';
      +
      +// print request variable as a UTF-8 string allowing multibyte characters
      +echo request_var('multibyte_string', '', true);
      +// print request variable as ASCII string
      +echo request_var('multibyte_string', '');
      +
      + +

      This code snippet will generate the following output:

      + +
      +Käse
      +K??se
      +
      + +

      Unicode Normalization

      + +

      If you retrieve user input with multibyte characters, you should additionally normalize the string using utf8_normalize_nfc() before you work with it. This ensures that canonically equivalent characters always end up with the same binary representation. For example, the character Å can be represented either as U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE) or as U+212B (ANGSTROM SIGN). phpBB uses Normalization Form Canonical Composition (NFC) for all text. So the correct version of the above example would look like this:

      + +
      +$_REQUEST['multibyte_string'] = 'Käse';
      +
      +echo utf8_normalize_nfc(request_var('multibyte_string', '', true));
      +echo request_var('multibyte_string', '');
      +
      + +
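The following standalone sketch shows why this step matters. It does not call phpBB's utf8_normalize_nfc(). Instead, it assumes PHP's optional intl extension, whose Normalizer class applies the same NFC form, and it skips that part if the extension is not installed.

```php
<?php
// Three byte sequences a user might submit for "Å"
$precomposed = "\u{C5}";   // U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE
$angstrom    = "\u{212B}"; // U+212B ANGSTROM SIGN
$decomposed  = "A\u{30A}"; // U+0041 followed by U+030A COMBINING RING ABOVE

// Compared byte by byte, the three strings differ
var_dump($precomposed === $angstrom, $precomposed === $decomposed);

// With the intl extension, NFC maps both variants to the precomposed U+00C5
if (class_exists('Normalizer'))
{
	foreach (array($angstrom, $decomposed) as $variant)
	{
		var_dump(Normalizer::normalize($variant, Normalizer::FORM_C) === $precomposed);
	}
}
```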

      Case Folding

      + +

      Case-insensitive comparison of strings is no longer possible with strtolower or strtoupper, as some characters have multiple lower case or multiple upper case forms depending on their position in a word. Instead, you should use case folding, which converts a string into a case-insensitive form that can be compared directly. An NFC normalized string can be case folded using utf8_case_fold_nfc().

      + +

      // Bad - The strings might only differ in case even if the strtolower() results differ

      + +
      +if (strtolower($string1) == strtolower($string2))
      +{
      +	echo '$string1 and $string2 are equal or differ in case';
      +}
      +
      + +

      // Good - Case folding is really case insensitive

      + +
      +if (utf8_case_fold_nfc($string1) == utf8_case_fold_nfc($string2))
      +{
      +	echo '$string1 and $string2 are equal or differ in case';
      +}
      +
      + +

      Confusables Detection

      + +

      phpBB offers a special method, utf8_clean_string(), which can be used to make sure string identifiers are unique. This method uses Normalization Form Compatibility Composition (NFKC) instead of NFC and replaces similar-looking characters with a particular representative of their equivalence class. It is currently used for usernames and group names to avoid confusion between similar-looking names.

      + +
      +Top +

      + +
      + +

      6. Translation (i18n/L10n) Guidelines

      + + 6.i. Standardisation

      @@ -1854,7 +1929,7 @@ div Top

      - 5.ii. Other considerations + 6.ii. Other considerations

      @@ -2118,7 +2193,7 @@ div Top

      - 5.iii. Writing Style + 6.iii. Writing Style

      @@ -2229,13 +2304,19 @@ div
      -

      6. Guidelines Changelog

      +

      7. Guidelines Changelog

      +

      Revision 1.24

      + + +

      Revision 1.16

      Revision 1.11-1.15

diff --git a/phpBB/includes/message_parser.php b/phpBB/includes/message_parser.php
index fda31a20e3..c2d40670af 100644
--- a/phpBB/includes/message_parser.php
+++ b/phpBB/includes/message_parser.php
@@ -1371,7 +1371,7 @@ class parse_message extends bbcode_firstpass
 			include_once($phpbb_root_path . 'includes/functions_admin.' . $phpEx);
 
 			$index = array_keys(request_var('delete_file', array(0 => 0)));
-			$index = (!empty($index[0])) ? $index[0] : false;
+			$index = (!empty($index)) ? $index[0] : false;
 
 			if ($index !== false && !empty($this->attachment_data[$index]))
 			{
diff --git a/phpBB/includes/utf/utf_tools.php b/phpBB/includes/utf/utf_tools.php
index f8156fb8d2..a3499062fe 100644
--- a/phpBB/includes/utf/utf_tools.php
+++ b/phpBB/includes/utf/utf_tools.php
@@ -908,8 +908,8 @@ function utf8_recode($string, $encoding)
 	}
 
 	// Trigger an error?! Fow now just give bad data :-(
-	//trigger_error('Unknown encoding: ' . $encoding, E_USER_ERROR);
-	return $string;
+	trigger_error('Unknown encoding: ' . $encoding, E_USER_ERROR);
+	//return $string; // use utf_normalizer::cleanup() ?
 }
 
 /**