Every programmer has run into the problem of having to store objects in a database or cache. You always need some kind of encoding and decoding to turn your complex data into a plain string that other systems can interpret. The go-to solution is usually JSON; however, this loses all type information, because decoding gives you back plain arrays or stdClass objects instead of instances of your original classes.
Luckily, PHP has a built-in solution that keeps this information and can easily restore the data into any kind of PHP object. The format is similar to JSON, but with a few notable differences. For a personal project, I needed to dig deeper into how this serialization works and what its format looks like. Let’s dive in!
To serialize data, PHP provides the aptly named function serialize(). You can pass almost any variable or value to this function to have it serialized in PHP’s format. The result looks like the example below.
O:15:"App\Models\User":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}
Scalar values (int, float, string, or bool) are the easiest to serialize, as they always consist of a single value. Each of them is serialized as a prefix denoting the type, followed by a colon, the actual value, and a closing semicolon. Booleans are written as 1 or 0. Strings deviate slightly from this format.
serialize(33); // i:33;
serialize(3.14); // d:3.14;
serialize(true); // b:1;
For strings, the length of the string in bytes is added between the type and the value, and the value itself is wrapped in double quotes.
serialize('lorem ipsum'); // s:11:"lorem ipsum";
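Because the length prefix tells the reader exactly how many bytes to consume, the string content needs no escaping, and the count is in bytes rather than characters. A small sketch to illustrate this; the sample strings are arbitrary, and the second result assumes the source file is saved as UTF-8.

```php
<?php
// The length prefix tells the reader exactly how many bytes to consume,
// so double quotes inside the value do not need to be escaped.
echo serialize('say "hi"'), PHP_EOL; // s:8:"say "hi"";

// The length is counted in bytes, not characters: in UTF-8, "é" takes 2 bytes.
echo serialize('héllo'), PHP_EOL; // s:6:"héllo";
```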
null is not a scalar value, but I include it here because it serializes to an equally simple string:
serialize(null); // N;
Arrays in PHP are more like maps or dictionaries of key-value pairs, and they are serialized as such. A serialized array starts with an a: prefix, followed by the number of elements and a list of all key-value pairs wrapped in curly braces. This list is simply a sequence of serialized values that alternates between key and value. So an array with a single string element first contains the key 0, followed by the string value.
serialize(['foo']); // a:1:{i:0;s:3:"foo";}
If this array contained more elements, they would simply be appended after the foo string: first the key (1 for the next list item), followed by the value for that key.
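The same rules extend to string keys and nested arrays. The sketch below uses arbitrary sample keys and values; note that a nested array is written inline and, like a top-level array, ends with a closing brace and no semicolon.

```php
<?php
// String keys are serialized like any other string value.
echo serialize(['foo', 'bar' => 42]), PHP_EOL;
// a:2:{i:0;s:3:"foo";s:3:"bar";i:42;}

// A nested array appears inline as the value of its key.
echo serialize(['list' => [1, 2]]), PHP_EOL;
// a:1:{s:4:"list";a:2:{i:0;i:1;i:1;i:2;}}
```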
Objects are serialized much like arrays, where each key is a string containing a property name. In addition, the prefix contains the length of the fully qualified class name, the name itself, and the number of properties the object holds. This count includes properties inherited from parent classes, but not the properties of nested objects, which carry their own count in their own prefix. Putting this all together, the serialized string for a simple object looks as follows.
class User
{
    public string $username = 'Jerodev';
    public int $age = 33;
}
serialize(new User()); // O:4:"User":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}
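To see how a nested object is written, here is a sketch with two hypothetical classes, Person and Address (the city value is arbitrary). Each object in the string carries its own property count: Person reports 2 (name and address), while the nested Address reports its own 1.

```php
<?php
class Address
{
    public string $city = 'Ghent';
}

class Person
{
    public string $name = 'Jerodev';
    public Address $address;

    public function __construct()
    {
        $this->address = new Address();
    }
}

// The nested object is serialized inline, with its own class name and count.
echo serialize(new Person()), PHP_EOL;
// O:6:"Person":2:{s:4:"name";s:7:"Jerodev";s:7:"address";O:7:"Address":1:{s:4:"city";s:5:"Ghent";}}
```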
The unserialize() function does the exact opposite of serialize(), so it returns a value equivalent to the one you serialized. Because the serialized string contains the fully qualified class name, this also works perfectly with objects and namespaces, as long as the class is available at that point.
> unserialize('O:4:"User":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}')
= User {#5896}
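Equivalent here means equal, not identical: unserialize() always builds a new object. A quick sketch using the User class from above:

```php
<?php
class User
{
    public string $username = 'Jerodev';
    public int $age = 33;
}

$original = new User();
$copy = unserialize(serialize($original));

var_dump($copy == $original);  // bool(true): same class, same property values
var_dump($copy === $original); // bool(false): a new, distinct instance
```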
While unserializing, the function also performs a number of checks, such as verifying that the number of properties in an object is correct and that array and string lengths match the data. If any of these checks fail, unserialize() returns false and issues an E_WARNING (since PHP 8.3; earlier versions issue an E_NOTICE).
> unserialize('O:4:"User":3:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}')
WARNING unserialize(): Unexpected end of serialized data.
WARNING unserialize(): Error at offset 58 of 59 bytes.
= false
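Because false can itself be serialized (as b:0;), a false return value is ambiguous. A common workaround is to compare the input with serialize(false). Below is a minimal sketch; the helper name tryUnserialize is hypothetical, and it uses the @ operator to silence the warning so that failure is reported through the return value instead.

```php
<?php
/**
 * Hypothetical helper: returns true on success and stores the result in $value.
 * A false result from unserialize() only means failure if the input was not
 * itself a serialized false ("b:0;").
 */
function tryUnserialize(string $data, mixed &$value): bool
{
    // @ silences the warning; failure is reported via the return value.
    $value = @unserialize($data);

    return $value !== false || $data === serialize(false);
}

$ok = tryUnserialize('b:0;', $value);
var_dump($ok, $value); // bool(true), then bool(false): a genuine false

// The string claims 5 bytes but only "abc" follows, so parsing fails.
var_dump(tryUnserialize('s:5:"abc";', $broken)); // bool(false)
```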
If you pass a serialized object whose class does not exist in the current project, unserialize() returns an object of the type __PHP_Incomplete_Class that holds the properties defined in the serialized string.
> unserialize('O:7:"Unknown":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}')
= __PHP_Incomplete_Class(Unknown) {#5893
+username: "Jerodev",
+age: 33,
}
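Such an incomplete object can still be detected and inspected. In the sketch below, casting it to an array exposes the stored properties, together with the original class name, which PHP keeps in a special __PHP_Incomplete_Class_Name entry.

```php
<?php
$object = unserialize('O:7:"Unknown":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}');

var_dump($object instanceof __PHP_Incomplete_Class); // bool(true)

// Casting to an array exposes the stored data, including the original class name.
$data = (array) $object;
echo $data['__PHP_Incomplete_Class_Name'], PHP_EOL; // Unknown
echo $data['username'], PHP_EOL;                    // Jerodev
```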
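A concern when unserializing objects is that a bad actor could inject a different class into the serialized string, one whose magic methods perform malicious actions once the object is created or destroyed. To make sure the unserialized object is of an expected type, you can give unserialize() a list of classes that may be instantiated through the allowed_classes option. Whenever you unserialize objects, this is recommended. Note that the PHP manual goes even further and advises against passing untrusted user input to unserialize() at all, regardless of allowed_classes.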
\unserialize('O:15:"App\Models\User":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}', [
    'allowed_classes' => [
        App\Models\User::class,
    ],
]);
If the serialized string contains a class that is not part of allowed_classes, that object is returned as a __PHP_Incomplete_Class instead.
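Since a disallowed class does not cause an error, it can be useful to check the result explicitly. Here is a sketch of a hypothetical helper, unserializeAs, that allows exactly one class and throws when the result is anything else (including false for malformed input). Setting allowed_classes to false, which refuses every class, is shown at the end.

```php
<?php
class User
{
    public string $username = 'Jerodev';
    public int $age = 33;
}

/**
 * Hypothetical helper: unserializes $data, allowing only $class,
 * and throws when the result is not an instance of that class.
 */
function unserializeAs(string $data, string $class): object
{
    $value = unserialize($data, ['allowed_classes' => [$class]]);

    if (!$value instanceof $class) {
        throw new UnexpectedValueException("Expected an instance of $class");
    }

    return $value;
}

$payload = serialize(new User());
$user = unserializeAs($payload, User::class); // a User instance

// 'allowed_classes' => false refuses every class, yielding an incomplete object.
var_dump(unserialize($payload, ['allowed_classes' => false]) instanceof __PHP_Incomplete_Class); // bool(true)
```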
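PHP supports a few magic methods that can be added to a class to customize how its objects are serialized. The two classic ones are __sleep and __wakeup; PHP 7.4 added __serialize and __unserialize, which take precedence over them when defined.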
If the __sleep method is defined, it is called before the object is serialized. It can be used to clean up the object first and must return an array of strings: the names of the properties that should be serialized. Any property not listed in the array is omitted from the serialized string.
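A minimal sketch of __sleep, using a hypothetical Connection class where a string stands in for state that should not be stored, such as an open handle:

```php
<?php
class Connection
{
    public string $dsn = 'sqlite::memory:';
    // Stand-in for state that should not be stored, such as an open handle.
    public ?string $handle = 'open-handle';

    public function __sleep(): array
    {
        // Only the listed properties end up in the serialized string.
        return ['dsn'];
    }
}

echo serialize(new Connection()), PHP_EOL;
// O:10:"Connection":1:{s:3:"dsn";s:15:"sqlite::memory:";}
```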
You can also define a __wakeup method, which is called on the unserialized object directly after it is created. It can be used to run any initialization the object needs, such as re-establishing a database connection, before it is used further in the code.
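The two methods are often used as a pair. In this sketch, a hypothetical TagList class leaves a derived lookup table out of the payload with __sleep and rebuilds it in __wakeup. That rebuild is necessary because unserialize() does not call the constructor.

```php
<?php
class TagList
{
    /** @var string[] */
    public array $tags = ['php', 'go'];
    /** Derived lookup table (tag => position); not worth storing. */
    public array $index = [];

    public function __construct()
    {
        $this->buildIndex();
    }

    public function __sleep(): array
    {
        return ['tags']; // leave the derived index out of the payload
    }

    public function __wakeup(): void
    {
        // unserialize() does not call the constructor, so rebuild here.
        $this->buildIndex();
    }

    private function buildIndex(): void
    {
        $this->index = array_flip($this->tags);
    }
}

$copy = unserialize(serialize(new TagList()));
// $copy->index is ['php' => 0, 'go' => 1] again
```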
While serializing PHP objects is a great way of storing and restoring data, its biggest drawback is that the format is specific to PHP. Strings in this format cannot be read by any other programming language without writing a dedicated parser.
On the other hand, this is currently the best way to store PHP objects without the need for a third-party package.
The rule of thumb seems to be to use this only if you are 100% certain that no other programming language will ever need the data generated by the serialize function. And this is where I was wrong, so now I am creating a parser for serialized PHP objects in Go. 😉
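To give an idea of what such a parser involves, here is a minimal recursive-descent sketch. It is written in PHP to stay consistent with the rest of this post and is not the Go parser mentioned above. The class name MiniParser is hypothetical, and the sketch only covers the null, scalar, and array types described earlier. Objects, references, special floats such as INF, and malformed input are deliberately out of scope.

```php
<?php
/**
 * Minimal sketch of a parser for a subset of PHP's serialize() format:
 * N (null), b (bool), i (int), d (float), s (string) and a (array).
 * Objects (O), references (r/R), custom payloads (C) and special floats
 * such as INF or NAN are NOT handled; input is assumed to be well-formed.
 */
final class MiniParser
{
    private int $pos = 0;

    public function __construct(private string $data)
    {
    }

    public function parse(): mixed
    {
        if ($this->pos >= strlen($this->data)) {
            throw new UnexpectedValueException('Unexpected end of data');
        }

        $type = $this->data[$this->pos];
        // Skip the type letter and the character after it (":" or, for N, ";").
        $this->pos += 2;

        switch ($type) {
            case 'N':
                return null;
            case 'b':
                return $this->readUntil(';') === '1';
            case 'i':
                return (int) $this->readUntil(';');
            case 'd':
                return (float) $this->readUntil(';');
            case 's':
                $length = (int) $this->readUntil(':');
                // Skip the opening quote, then take exactly $length bytes.
                $value = substr($this->data, $this->pos + 1, $length);
                // Move past: opening quote + value + closing quote + ";".
                $this->pos += $length + 3;
                return $value;
            case 'a':
                $count = (int) $this->readUntil(':');
                $this->pos++; // skip "{"
                $result = [];
                for ($i = 0; $i < $count; $i++) {
                    $key = $this->parse(); // int or string key
                    $result[$key] = $this->parse();
                }
                $this->pos++; // skip "}"
                return $result;
            default:
                throw new UnexpectedValueException("Unsupported type '$type'");
        }
    }

    /** Returns the text up to $delimiter and moves the cursor past it. */
    private function readUntil(string $delimiter): string
    {
        $end = strpos($this->data, $delimiter, $this->pos);
        if ($end === false) {
            throw new UnexpectedValueException('Unexpected end of data');
        }
        $value = substr($this->data, $this->pos, $end - $this->pos);
        $this->pos = $end + 1;
        return $value;
    }
}

$parser = new MiniParser('a:2:{i:0;s:3:"foo";s:3:"bar";d:3.14;}');
var_dump($parser->parse()); // same structure unserialize() returns for this input
```

The length prefixes are what keep this simple: a string never has to be scanned for its closing quote, so quotes inside the value cannot confuse the parser.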