Thinking about key names in PHP

By Mark 22nd May, 2015

PHP, Software

I recently had to write some code which built quite a large structured array in PHP.

I know that PHP performs some internal variable optimisation. For example, copying a variable by value won’t duplicate that variable’s contents in memory until one of the copies is modified (copy-on-write), which keeps the memory footprint down in a lot of common situations.
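To make that copy-on-write behaviour concrete, here is a minimal sketch rather than anything taken from my test script. The variable names are purely illustrative, and the exact byte counts will vary between PHP versions and builds, so it prints the deltas instead of promising specific figures.

```php
<?php
// Build a large string at runtime so that any duplication is easy to spot.
$original = str_repeat('x', 1000000);

$before = memory_get_usage();

// Plain assignment: both variables are expected to share one string buffer.
$copy = $original;
$afterAssign = memory_get_usage();

// First write to the copy: PHP now has to give it a buffer of its own.
$copy[0] = 'y';
$afterWrite = memory_get_usage();

printf("Growth after assignment: %s bytes\n", number_format($afterAssign - $before));
printf("Growth after first write: %s bytes\n", number_format($afterWrite - $afterAssign));
```

The first figure should be negligible and the second roughly the size of the string, which is the behaviour described above.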

I was wondering if there was a similar optimisation for array keys, whereby the key name was stored in only one place in memory, and internally the array index was a fixed-size pointer to this key. If this were the case, the choice of key names would make no difference to memory usage, however large the array grew.

So, I wrote a quick script (code below) to test this. It compares the memory usage of an array which re-uses the same key thousands of times – once with a very short key name and once with a very long key name. I also added a baseline version where no key name was specified (i.e. numeric indexes were used).

Here are the results (memory usage in bytes):

Start Usage:           363,824
After no keys:       2,909,624
Array reset:           363,904
After short keys:    2,909,616
Array reset:           363,904
After long keys:     3,229,608

From this, it appears that each instance of the key is being stored separately: the long-key run used roughly 320,000 bytes more than the short-key run, while short keys cost about the same as having no keys at all. Therefore, when dealing with large, structured arrays, short key names will be preferable if memory usage is a concern. This may appear obvious, and in some senses it is, but PHP is not necessarily an obvious language, particularly when it comes to the most efficient way of representing data.

This is a bit of a blow, as I strongly feel that maintainability is the most important factor in software design and really don’t want to encourage the use of cryptic short names for array keys ($arr['Children'] is a lot clearer than $arr['c'], for example). However, for very large arrays it may be that this becomes a practical necessity.

Important Note: I don’t recommend switching to short array keys as standard practice, and in fact strongly encourage meaningful identifiers in all cases. The memory saving will be small and the detrimental effect on maintainability will be big. This advice should only be followed for very large structured arrays, and even then you should profile to see how much difference it makes (e.g. using a variant of the script below) before committing to cryptifying your code.
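If you want to profile this against your own setup, something along these lines may be a reasonable starting point. It is only a sketch, not the script that produced the figures above: the helper name measureKeyUsage(), the row count and the example key names are all assumptions made for illustration, and the numbers it prints will depend on your PHP version and build.

```php
<?php
/**
 * Hypothetical helper: build $rows single-element arrays, each using $key
 * as its key (or a numeric index when $key is null), and return the number
 * of bytes the whole structure consumed.
 */
function measureKeyUsage($key, $rows)
{
    $before = memory_get_usage();

    $data = array();
    for ($i = 0; $i < $rows; $i++) {
        if ($key === null) {
            $data[] = array($i);          // numeric index, no key name
        } else {
            $data[] = array($key => $i);  // same key name re-used in every row
        }
    }

    $used = memory_get_usage() - $before;
    unset($data);                          // release the structure before the next run

    return $used;
}

$rows = 10000; // illustrative size; use something close to your real data

$cases = array(
    'No keys'    => null,
    'Short keys' => 'c',
    'Long keys'  => 'ChildrenOfThisParticularNodeInTheTree',
);

foreach ($cases as $label => $key) {
    printf("%-12s %s bytes\n", $label . ':', number_format(measureKeyUsage($key, $rows)));
}
```

Each case is measured inside its own function call so that the structure from one run is freed before the next one starts. Swap in your real key names and a realistic row count before drawing any conclusions.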

These tests were carried out on PHP 5.3.8. It will be interesting to see if the memory requirements are improved by later PHP versions.

Source Code