PHP references: the footgun that ships faster than you think

#php #memory #debugging #references #performance #footguns

PHP references are one of the few language features where the PHP manual explicitly warns you not to use them unnecessarily. The warning is warranted. I have debugged three separate production incidents caused by references, and in two of them the original developer was not aware they had introduced a reference at all.

This is not an article about why & is a code smell. It is about understanding exactly what it does — because you will encounter it in legacy codebases, you will occasionally need it, and you will definitely debug a bug caused by it at some point.

What a reference actually is

PHP's default behaviour is copy-on-write: when you assign one variable to another, they initially share the same underlying value in memory. The copy only happens when one of them is modified. This is already quite efficient for reading data.

A reference bypasses copy-on-write entirely. Two variables that are references to the same value share memory regardless of modification. Modifying either one modifies the underlying value that both point to.

$a = 'original';
$b = $a;   // copy-on-write: $b points to the same memory, but...
$b = 'modified';  // ...the copy happens here. $a is still 'original'.
var_dump($a);     // string(8) "original"

$a = 'original';
$b = &$a;  // reference: $b is an alias for the same memory location as $a
$b = 'modified';  // no copy — modifies the underlying value directly
var_dump($a);     // string(8) "modified"  ← $a changed too: $a and $b name the same value

The distinction matters because PHP's reference behaviour is not always obvious when you are reading code. References do not look different from regular variables after they are assigned — $b looks the same in both cases. You have to track back to where the & was introduced.
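
One way to check at runtime whether two variables are aliases is to write a unique sentinel through one and see whether it shows up in the other. The sketch below assumes PHP 8.0+ (for the mixed type). isAlias is a hypothetical helper name, not a built-in, and it is meant for debugging sessions rather than production code.

```php
<?php
// Hypothetical debugging helper: returns true if $x and $y are
// reference-bound to the same underlying value.
function isAlias(mixed &$x, mixed &$y): bool
{
    $saved = $x;
    $x = new stdClass();      // a sentinel no other variable can already hold
    $aliased = ($y === $x);   // identical object only if $y sees the write
    $x = $saved;              // restore the original value
    return $aliased;
}

$a = 'original';
$b = &$a;   // alias
$c = $a;    // plain copy-on-write assignment

var_dump(isAlias($a, $b));  // bool(true)
var_dump(isAlias($a, $c));  // bool(false)
```

The helper restores the original value before returning, so calling it should leave both variables as it found them.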

Incident 1: the foreach that corrupted the array

This is the most common reference bug I have seen in production codebases. It usually lives in older code that was never refactored, but the behaviour is the same in PHP 7 and PHP 8:

$prices = [100, 200, 300, 400, 500];

// Apply a 10% discount to each price
foreach ($prices as &$price) {
    $price = $price * 0.9;
}
// After the loop: $prices = [90, 180, 270, 360, 450] ✓

// Some other code, three lines later, iterates the same array:
foreach ($prices as $price) {
    echo $price . "\n";
}

Expected output: 90, 180, 270, 360, 450. Actual output: 90, 180, 270, 360, 360.

The last element is wrong. Here is why: after the first foreach, $price is still a reference to the last element of $prices, the value at index 4. The second foreach assigns each element to $price in turn, and every one of those assignments writes through the reference into $prices[4]: first 90, then 180, then 270, then 360. By the time the loop reaches the fifth element, $prices[4] already holds 360, so it reads 360 instead of 450.
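
To watch the overwrite happen, the following sketch records $prices[4] on every pass of the second loop. It starts from already-discounted integers to keep the output free of float formatting; the $trace variable is an addition for illustration only.

```php
<?php
$prices = [90, 180, 270, 360, 450];

// An empty by-reference loop is enough to leave $price aliased to $prices[4].
foreach ($prices as &$price) {
}

// Record what the last element holds after each by-value assignment.
$trace = [];
foreach ($prices as $price) {
    $trace[] = $prices[4];
}

echo implode(', ', $trace), "\n";  // 90, 180, 270, 360, 360
```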

// The fix
foreach ($prices as &$price) {
    $price = $price * 0.9;
}
unset($price);  // break the reference: $price stays in scope after the loop, so later writes would hit $prices[4]

unset($price) does not destroy the last element of the array. It only breaks the link between $price and $prices[4]. After unset($price), $prices[4] is still 450 (500 × 0.9), and $price is simply undefined until something assigns to it again.

In every codebase where I have seen this bug, the unset($price) was missing. The PHP manual's foreach page explicitly warns about this. It is still missing in codebases today.
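
Where the loop does not need a reference at all, the bug can be avoided rather than patched. The sketch below shows two reference-free ways to apply the same discount; the arrow function assumes PHP 7.4+, and $discounted is a name introduced here for illustration.

```php
<?php
$prices = [100, 200, 300, 400, 500];

// Option 1: build a new array and leave the input untouched.
$discounted = array_map(fn ($price) => $price * 0.9, $prices);

// Option 2: write back through the key instead of through a reference.
foreach ($prices as $i => $price) {
    $prices[$i] = $price * 0.9;
}

// $price is an ordinary variable here, so a later loop cannot corrupt $prices.
foreach ($prices as $price) {
    echo $price . "\n";
}
```

Both options leave no alias behind, so there is nothing to unset().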

Incident 2: the function that silently mutated the caller's data

A data transformation pipeline had a function that was supposed to normalise product data. It was called with large arrays, and someone added & to avoid copying:

// Original: safe, no side effects
function normaliseProduct(array $product): array
{
    $product['title'] = trim(strtolower($product['title']));
    $product['price'] = round($product['price'] * 100) / 100;
    return $product;
}

// "Optimised" version: unsafe
function normaliseProduct(array &$product): void
{
    $product['title'] = trim(strtolower($product['title']));
    $product['price'] = round($product['price'] * 100) / 100;
}

The caller:

$products = $this->repository->findAll();

foreach ($products as $product) {
    $normalised = normaliseProduct($product);
    // In the original: $normalised is a modified copy, $product is unchanged
    // In the "optimised" version: $product is modified in place
    // and $normalised is null, because a void function evaluates to null

    $this->cache->store($product['id'], $normalised);  // now stores null
}

The cached data was null for every product. The reporting system that read the cache showed nothing. Nobody noticed for two days because the primary read path hit the database, not the cache.

The reference "optimisation" saved approximately zero memory. With copy-on-write, the by-value version only duplicates one small product array at the moment it writes the two keys; nothing else is copied. What the change did introduce was a silent mutation, and it removed the return value that callers depended on.
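
The failure mode can be reproduced in miniature. In the sketch below the two variants get distinct, hypothetical names (normaliseByValue and normaliseInPlace) so they can coexist in one file, and the sample product is invented for the demonstration.

```php
<?php
function normaliseByValue(array $product): array
{
    $product['title'] = trim(strtolower($product['title']));
    $product['price'] = round($product['price'] * 100) / 100;
    return $product;
}

function normaliseInPlace(array &$product): void
{
    $product['title'] = trim(strtolower($product['title']));
    $product['price'] = round($product['price'] * 100) / 100;
}

$original = ['id' => 7, 'title' => '  Blue Widget ', 'price' => 19.999];

// By value: the caller's array is untouched and a normalised copy comes back.
$copy = normaliseByValue($original);
var_dump($original['title'] === '  Blue Widget ');  // bool(true)
var_dump($copy['title'] === 'blue widget');         // bool(true)

// By reference: the argument is mutated and the "result" is null.
$input = $original;
$result = normaliseInPlace($input);
var_dump($result);                                  // NULL
var_dump($input['title'] === 'blue widget');        // bool(true)
```

PHP raises no error when the result of a void function is assigned, which is why the null reached the cache unnoticed.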

When references are actually correct

References are appropriate in exactly two situations I have encountered:

1. Large data structures modified in place in a recursive algorithm.

If you are traversing and modifying a deeply nested array (XML parsing, tree manipulation), passing by reference avoids copying the entire structure at each recursion depth. This is a genuine performance concern only at meaningful scale — I would not reach for it under 10MB of data.

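function flattenTree(array &$node, array &$result): void
{
    // $result is the accumulator shared by every level of the recursion
    $result[] = $node['value'];
    foreach ($node['children'] as &$child) {
        flattenTree($child, $result);
    }
    unset($child);  // same rule as in Incident 1: break the loop reference
}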
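
For a case where the nodes themselves are rewritten, here is a minimal sketch of an in-place recursive transform. trimStringsInPlace is a hypothetical function and the sample tree is invented; the point is only that each level writes into the caller's structure instead of returning a rebuilt copy.

```php
<?php
// Hypothetical example: trim every string in a nested array, in place.
function trimStringsInPlace(array &$node): void
{
    foreach ($node as &$value) {
        if (is_array($value)) {
            trimStringsInPlace($value);   // recurse into the same storage
        } elseif (is_string($value)) {
            $value = trim($value);        // writes through to the caller's array
        }
    }
    unset($value);  // break the loop reference before returning
}

$tree = [
    'name' => ' root ',
    'children' => [
        ['name' => ' leaf ', 'children' => []],
    ],
];

trimStringsInPlace($tree);
var_dump($tree['name']);                 // string(4) "root"
var_dump($tree['children'][0]['name']);  // string(4) "leaf"
```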

2. Output parameters in low-level C-extension-style functions.

preg_match() uses a reference for the matches array. fscanf() uses references for parsed values. These are C-heritage interfaces. If you are writing a function that genuinely needs multiple return values and cannot return a structured object (rare in modern PHP), a reference output parameter is one option — though returning a typed DTO or [$value, $error] tuple is usually cleaner.
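
As a sketch of the tuple alternative, the hypothetical parseQuantity function below returns a [$value, $error] pair instead of filling an output parameter. The function name and error message are assumptions made for this example; filter_var() with FILTER_VALIDATE_INT is the standard PHP call.

```php
<?php
/**
 * Hypothetical parser: returns [int|null $value, string|null $error].
 */
function parseQuantity(string $raw): array
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    if ($value === false) {
        return [null, 'not an integer: ' . $raw];
    }
    return [$value, null];
}

// The caller destructures the pair; nothing is mutated behind its back.
[$quantity, $error] = parseQuantity('42');
var_dump($quantity);  // int(42)
var_dump($error);     // NULL

[$quantity2, $error2] = parseQuantity('abc');
var_dump($quantity2); // NULL
var_dump($error2);    // string(19) "not an integer: abc"
```

The call site shows both outputs explicitly, which is the readability gain over a reference parameter.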

The object reference misconception

A very common misunderstanding: objects in PHP are already "passed by reference." They are not. Objects are passed by handle — a pointer to the object, not the object itself. Reassigning the handle inside a function does not affect the caller's handle. Modifying the object through the handle does.

class Counter
{
    public int $count = 0;
}

function increment(Counter $counter): void
{
    $counter->count++;       // modifies the object — caller sees this
    $counter = new Counter;  // reassigns the handle — caller does NOT see this
}

$c = new Counter;
increment($c);
var_dump($c->count);  // int(1) — the increment happened, the reassignment did not

If you need to replace the caller's handle entirely (rare but real), you use &:

function replaceWithFresh(Counter &$counter): void
{
    $counter = new Counter;  // now the caller's $c is replaced
}

I have seen this used intentionally once, in an object pooling implementation. In normal application code, it is a signal to redesign the API.
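
The handle model also explains plain assignment: copying an object variable copies the handle, and only clone produces a second object. A small sketch, using a hypothetical Gauge class so it does not collide with Counter above:

```php
<?php
// Gauge is a hypothetical stand-in class for this demonstration.
class Gauge
{
    public int $level = 0;
}

$first = new Gauge();
$second = $first;        // copies the handle, not the object
$second->level = 5;      // both handles point at the same object

$third = clone $first;   // a separate object with the same property values
$third->level = 9;       // does not affect $first

var_dump($first->level);       // int(5)
var_dump($first === $second);  // bool(true)
var_dump($first === $third);   // bool(false)
```

No & appears anywhere in this snippet, which is why the shared mutation is often mistaken for pass-by-reference.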

What I watch for in code review

When I see & in a function signature or a foreach, I stop and read the surrounding twenty lines carefully. The questions:

  • Is this reference still active after the loop? (unset() check)
  • Does the caller expect this function to have no side effects on the argument?
  • Is the performance justification real, or is it premature optimisation?

A reference in user-land PHP code is a yellow flag. Not because it is always wrong, but because it means the code is relying on aliasing semantics that are non-obvious to the next reader — and "non-obvious to the next reader" is where bugs live.
