# php-intl

> Use when working with multibyte strings, UTF-8 encoding, character encoding conversion, locale-aware formatting (dates, numbers, currencies), transliteration, pluralization, or internationalization. Covers mb_strlen, mb_substr, mb_strtolower, mb_strtoupper, mb_detect_encoding, mb_convert_encoding, iconv, grapheme functions, emoji handling, unicode processing, locale-aware sorting, ICU library, intl extension (Collator, NumberFormatter, IntlDateFormatter, MessageFormatter, Transliterator, Normalizer), currency formatting, date formatting, transliteration, string normalization (NFC, NFD, NFKC, NFKD), and the UTF-8 input-to-output pipeline.

- Author: peixotorms
- Repository: peixotorms/odinlayer-skills
- Version: 20260202221752
- Last Updated: 2026-02-07
- Source: https://github.com/peixotorms/odinlayer-skills
- Web: https://mule.run/skillshub/@@peixotorms/odinlayer-skills~php-intl:20260202221752

---

# PHP Internationalization & Encoding

## UTF-8 Everywhere

```php
// Set early: default encoding for every mb_* call that omits the encoding argument
mb_internal_encoding('UTF-8');
header('Content-Type: text/html; charset=UTF-8');

// Use mb_* for all string operations on user text
$len   = mb_strlen($str);          // NOT strlen(): that counts bytes, not characters
$upper = mb_strtoupper($str);      // NOT strtoupper()
$sub   = mb_substr($str, 0, 10);   // NOT substr(): byte offsets can split a character
$pos   = mb_strpos($str, $needle); // offset in characters, not bytes

// PHP 8.4+: multibyte-aware trim
$clean = mb_trim($str);
$clean = mb_ltrim($str);
$clean = mb_rtrim($str);
```

| Rule | Detail |
|------|--------|
| `strlen()` counts bytes | `"\xF0\x9F\x9A\x80"` (U+1F680) is 4 bytes but 1 character; use `mb_strlen()` |
| `substr()` corrupts multibyte text | A byte offset can land mid-character; use `mb_substr()` for slicing |
| `strtoupper()` / `strtolower()` | Work byte by byte and only convert ASCII letters; use `mb_strtoupper()` / `mb_strtolower()` |
| Set `default_charset = UTF-8` | In php.ini (the default since PHP 5.6); supersedes the deprecated `mbstring.internal_encoding` |
| MySQL needs `utf8mb4` | MySQL's `utf8` (alias of `utf8mb3`) stores at most 3 bytes per character, so it cannot hold emoji or other characters outside the BMP |

## Encoding Detection & Conversion

```php
// Convert from a known source encoding (here ISO-8859-1) to UTF-8
$utf8 = mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');

// Detect encoding (unreliable; prefer explicit metadata such as HTTP headers)
$enc = mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1', 'Windows-1252'], true);

// Validate UTF-8: mb_check_encoding() returns false for invalid byte sequences
if (!mb_check_encoding($str, 'UTF-8')) {
    $str = mb_convert_encoding($str, 'UTF-8', 'Windows-1252');
}
```

| Gotcha | Detail |
|--------|--------|
| `mb_detect_encoding()` only guesses | The same bytes can be valid in several encodings (any byte string is valid ISO-8859-1) |
| Always declare the encoding | In HTTP headers, the HTML `<meta charset>`, and the DB connection charset |
| `iconv()` varies by system | Results depend on the iconv library PHP was built against; GNU libiconv is more reliable than glibc's implementation |
| Regex needs the `/u` modifier | Without it, PCRE treats pattern and subject as single-byte strings |

## Database & JSON Encoding

```php
// MySQL: always use utf8mb4
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', $user, $pass);

// PostgreSQL: set the client encoding through libpq connection options
$pdo = new
    PDO("pgsql:host=localhost;dbname=app;options='-c client_encoding=UTF8'", $user, $pass);

// JSON: all strings must be valid UTF-8, or json_encode() fails
$json = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);

// Handle broken UTF-8 in JSON (PHP 7.2+): invalid bytes become U+FFFD
$json = json_encode($data, JSON_INVALID_UTF8_SUBSTITUTE);
```

## intl Extension (ICU)

```php
// Locale-aware string comparison
$coll = Collator::create('de_DE');
$coll->compare('ä', 'z'); // -1: German sorts 'ä' with 'a', before 'z'
$coll->sort($array);      // sorts $array in place

// Number formatting
$fmt = NumberFormatter::create('de_DE', NumberFormatter::DECIMAL);
echo $fmt->format(1234.56); // "1.234,56"

// formatCurrency() needs a CURRENCY-style formatter to print the symbol
$cur = NumberFormatter::create('de_DE', NumberFormatter::CURRENCY);
echo $cur->formatCurrency(99.99, 'EUR'); // "99,99 €" (the space is a non-breaking U+00A0)

// Date formatting
$fmt = IntlDateFormatter::create('fr_FR', IntlDateFormatter::LONG, IntlDateFormatter::SHORT);
echo $fmt->format(time()); // e.g. "1 février 2026 à 14:30"

// Pluralization
$msg = '{count, plural, =0{no items} one{# item} other{# items}}';
echo MessageFormatter::formatMessage('en_US', $msg, ['count' => 5]); // "5 items"

// Transliteration
$t = Transliterator::create('Any-Latin; Latin-ASCII');
echo $t->transliterate('Привет мир'); // "Privet mir"
```

| Class | Use for |
|-------|---------|
| `Collator` | Locale-aware sorting and comparison |
| `NumberFormatter` | Numbers, currencies, percentages |
| `IntlDateFormatter` | Dates/times per locale |
| `MessageFormatter` | ICU messages with plurals/gender |
| `Transliterator` | Script conversion (Cyrillic to Latin, etc.) |
| `Normalizer` | Unicode normalization (NFC, NFD, NFKC, NFKD) |

## HTTP Charset Pipeline

```
Input -> Convert to UTF-8 -> Process with mb_* -> Output with charset header -> Store as utf8mb4
```

| Stage | Action |
|-------|--------|
| **Input** | Detect/convert: `mb_convert_encoding($input, 'UTF-8', $detected)` |
| **Processing** | Use `mb_*` functions; regex with `/u` modifier |
| **HTML output** | `header('Content-Type: text/html; charset=UTF-8')` |
| **JSON output** | `json_encode($data, JSON_UNESCAPED_UNICODE)` |
| **Database** | `charset=utf8mb4` in DSN; `SET NAMES utf8mb4` |
| **Files** | Validate the encoding on read; strip a leading UTF-8 BOM (`\xEF\xBB\xBF`) if present |
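The pipeline table above maps onto a few lines of plain mbstring and json code. This is a minimal sketch under stated assumptions, not the skill author's implementation: `to_utf8()` and `describe()` are hypothetical helper names, the declared charset is assumed to come from explicit metadata such as an HTTP `Content-Type` header, and falling back to Windows-1252 for invalid UTF-8 is an assumption that fits Western European legacy data. It needs PHP 8.0+ (for `str_starts_with()`) and the mbstring extension.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of the original skill: turn raw input bytes
// into a UTF-8 string. $declaredCharset is assumed to come from explicit
// metadata such as an HTTP Content-Type header; null means "unknown".
function to_utf8(string $raw, ?string $declaredCharset = null): string
{
    // Files stage: a leading UTF-8 BOM counts as proof of UTF-8 and overrides
    // the declared charset (browsers give the BOM the same precedence)
    if (str_starts_with($raw, "\xEF\xBB\xBF")) {
        $raw = substr($raw, 3);
        $declaredCharset = 'UTF-8';
    }

    // Input stage: explicit metadata wins over guessing
    if ($declaredCharset !== null && strcasecmp($declaredCharset, 'UTF-8') !== 0) {
        return mb_convert_encoding($raw, 'UTF-8', $declaredCharset);
    }

    // Declared (or unknown) UTF-8 that validates: keep the bytes unchanged
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    // Assumption: invalid "UTF-8" here is legacy Windows-1252 text
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}

// Hypothetical helper for the processing stage: byte count vs character count
function describe(string $utf8): array
{
    return [
        'text'  => $utf8,
        'bytes' => strlen($utf8),             // raw byte length
        'chars' => mb_strlen($utf8, 'UTF-8'), // characters (code points)
    ];
}

// "café" encoded as ISO-8859-1, where é is the single byte 0xE9
$text = to_utf8("caf\xE9", 'ISO-8859-1');

// JSON output stage: keep non-ASCII readable, fail loudly on bad input
echo json_encode(describe($text), JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR), "\n";
// {"text":"café","bytes":5,"chars":4}
```

The BOM check runs first because a BOM is stronger evidence than any declared header. `mb_detect_encoding()` is deliberately left out: explicit metadata plus `mb_check_encoding()` validation is more predictable than guessing.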