Multibyte encodings: why we need functions with the mb_ prefix in PHP
When developing a web application or site, you often have to work with text. Every text has an encoding, so it is important to use functions that handle that encoding correctly. Today the most popular encoding is UTF-8, which is a multibyte encoding.
What does multibyte encoding mean? It means that a single character may occupy more than one byte. All text is stored as bytes, and one byte can distinguish only 256 values, which is not enough for the characters of all languages. In UTF-8, ASCII characters take one byte, while other letters and symbols, such as Cyrillic letters or the euro sign, take from two to four bytes. PHP's standard string functions (strlen, substr and so on) work with bytes, so they can give wrong results on such text. This is why PHP provides separate support for multibyte encodings.
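The difference between bytes and characters can be seen in a short sketch. It assumes that the mbstring extension is enabled and that the source file itself is saved as UTF-8; the variable names are only for illustration.

```php
<?php
// Six Cyrillic letters; each one takes two bytes in UTF-8.
$word = "Привет";

// strlen() counts bytes, not characters.
$byteCount = strlen($word);

// mb_strlen() counts characters in the given encoding.
$charCount = mb_strlen($word, 'UTF-8');

echo $byteCount, "\n"; // 12
echo $charCount, "\n"; // 6
```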
PHP has a set of functions that begin with the mb_ prefix (mb means multibyte). They are provided by the mbstring extension and are designed specifically for working with multibyte text. Some of them can detect the encoding of a text on their own. Most of them also accept the desired encoding as an optional parameter; if it is omitted, the internal encoding is used.
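As a rough illustration of passing an encoding explicitly, relying on the internal encoding, and detecting an encoding, here is a minimal sketch. It assumes mbstring is enabled and the file is saved as UTF-8; the sample string and the candidate list given to mb_detect_encoding are arbitrary choices for this example.

```php
<?php
// "Grüße" in UTF-8: ü and ß take two bytes each, so 5 characters are 7 bytes.
$utf8 = "Grüße";

// The same text converted to a single-byte encoding.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Encoding passed explicitly as the last argument.
echo mb_strlen($utf8, 'UTF-8'), "\n";         // 5
echo mb_strlen($latin1, 'ISO-8859-1'), "\n";  // 5
echo strlen($utf8), "\n";                     // 7 (bytes)
echo strlen($latin1), "\n";                   // 5 (bytes)

// Encoding omitted: the internal encoding is used instead.
mb_internal_encoding('UTF-8');
echo mb_strlen($utf8), "\n";                  // 5

// Detection checks the string against a list of candidates (strict mode here).
$detected = mb_detect_encoding($utf8, ['ASCII', 'UTF-8'], true);
echo $detected, "\n";                         // UTF-8
```

Detection is only an educated guess based on the bytes of the string, so when the encoding is known in advance it is usually safer to pass it explicitly.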
Let's look at the main mb_ functions in PHP. Only the most commonly used ones are listed below:
- mb_convert_case - changes the case of characters in a string,
- mb_convert_encoding - converts a string from one character encoding to another,
- mb_detect_encoding - detects the character encoding of a string,
- mb_internal_encoding - sets or gets the internal encoding of the script,
- mb_ord - gets the code point of a character,
- mb_split - splits a multibyte string using a regular expression,
- mb_strcut - gets part of a string by byte offsets, without cutting through a multibyte character,
- mb_stripos - finds the position of the first occurrence of one string in another, case-insensitive,
- mb_strlen - gets the length of a string in characters,
- mb_strpos - finds the position of the first occurrence of one string in another,
- mb_strripos - finds the position of the last occurrence of one string in another, case-insensitive,
- mb_strrpos - finds the position of the last occurrence of one string in another,
- mb_strstr - finds the first occurrence of a substring in a string,
- mb_strtolower - converts a string to lower case,
- mb_strtoupper - converts a string to upper case,
- mb_substr - returns part of a string by character offsets.
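A few of the functions from the list can be tried out together. The following is a minimal sketch, assuming mbstring is enabled, the file is saved as UTF-8, and PHP 7.2 or later is used (mb_ord is not available in earlier versions); the sample text and variable names are illustrative only.

```php
<?php
mb_internal_encoding('UTF-8');

// 11 characters, 20 bytes: Cyrillic letters take two bytes each.
$text = "Привет, мир";

// Case conversion works on non-Latin letters.
$upper = mb_strtoupper($text);
echo $upper, "\n"; // ПРИВЕТ, МИР

// Positions are counted in characters by mb_strpos and in bytes by strpos.
$charPos = mb_strpos($text, "мир");
$bytePos = strpos($text, "мир");
echo $charPos, "\n"; // 8
echo $bytePos, "\n"; // 14

// mb_substr takes character offsets: the first six characters.
$firstWord = mb_substr($text, 0, 6);
echo $firstWord, "\n"; // Привет

// mb_strcut takes byte offsets but stops at a character boundary,
// so a 5-byte limit yields two whole letters (4 bytes).
$safeCut = mb_strcut($text, 0, 5);
echo $safeCut, "\n"; // Пр

// Code point of a single character (U+041F).
$codePoint = mb_ord("П");
echo $codePoint, "\n"; // 1055
```

By contrast, substr($text, 0, 5) would return five raw bytes and cut the third letter in half, which leaves an invalid UTF-8 sequence at the end of the result.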
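Thus, when working with text in a multibyte encoding, it is best to use the mb_ functions. They operate on characters rather than on individual bytes, so lengths, positions and case conversions come out correct.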