Upgrading PHP upgrades

@crell · 2022-12-09 21:52 · PHP

PHP 8.2 was released on 8 December, to much fanfare. And, as always, to much wailing and gnashing of teeth about how the PHP language is evolving too quickly and breaking everyone's code. More specifically, the wailing was about the earlier, twin announcement that PHP 7.4 reached end-of-life on 28 November, which has, somehow, forced everyone to suddenly rewrite their entire code base in a hurry.

And... while I sympathize with some of the complaints, I am once again left wondering "how?"

This question is inspired in part by a recent post by Ed Barnard as part of 24 Days in December, PHP's advent blog series. To be crystal clear: I like Ed, he's a good guy, very skilled, and this is in no way, shape, or form an attack on him. But I am confused by his assertions, which echo those made by many others this time every year.

Some choice quotes from the post:

PHP 8 has, in my view, mandated that the way we design our PHP software must change for the better.

Remember, it’s not the deprecations–it’s that PHP 8 is no longer PHP.

Ed laments that it used to be easy to write code that worked from early PHP 5 through late PHP 7, but somehow that is no longer possible. That's quite simply not true.

He also falls back on the tired trope that PHP 5 and earlier were the tool of casual non-computer scientists, and then the computer scientists came in and ruined everything by making the language... consistent? Less error-prone? This has always been a nonsensical argument, trotted out any time someone doesn't like an improvement in any software. (We heard it a ton when rebuilding Drupal 7 into Drupal 8: all the fancy computer-science types were supposedly forcing PhD-only features down people's throats, like... classes.)

The gist of the article is summed up in this line:

We’re forced into a rewrite, or something very like a rewrite, while at the same time remaining in production and producing new features to deal with rapid growth.

And I have to ask... how?

Before I go further, let's get a few common arguments (in either direction) out of the way:

Open Source is not about you

Free Software and Open Source Software are great. I have built my career on them. But an important aspect of Open Source is that in most cases, you're getting someone else's work for free. You have paid nothing. That means, guess what, you're entitled to nothing. Absolutely nothing. The author(s) of the software you're using don't owe you even a response on GitHub, much less support, unless you have some additional contract with them. If you want some kind of guaranteed support, pony up and pay them.

Neither is there any "moral" expectation of indefinite support. That is an entirely made-up concept, promoted mostly by self-interested companies that want the benefits of someone else's free code without the risk that comes from not having a "throat to choke" vendor they're paying.

The only reason anyone contributing to the PHP engine itself cares what people think about backward-compatibility (BC) breaks is that either 1) they use PHP themselves and don't want to break their own work or 2) they get warm-and-fuzzies from seeing people use their work. (Or often both.) So that's the level and scope we're talking about.

PHP release managers sign on for 3.5 years of volunteer work to fix bugs and security issues for free. (Or at least to coordinate such fixing.) That's a lot of free labor. Asking them to sign on for longer without pay is abusive entitlement.

"Never break anything" is not a strategy

Or, rather, it cannot be a strategy for a tool like PHP. Rust or Go may be able to get away with that, because they were actually designed, with a small footprint and a careful plan for evolution. PHP was none of those things; PHP began life as a series of useful template hacks that outgrew their baby pants and became Turing-complete. Those hacks are almost universally recognized as antithetical to stable software, and in most cases that's been known for 20 years.

PHP has technical debt that it needs to clean up. It is completely unreasonable to expect a project to never address technical debt so that you never have to do work, doubly so when you're not paying for it.

If PHP is going to be used to build reliable software, it needs to be a reliable language. And that means, somehow, removing or addressing the inherently-unreliable bits. The question isn't if, but how.

[Image: xkcd comic]

[via: https://xkcd.com/1172/]

Commercial support is available

If, for whatever reason, your system cannot be upgraded off an old version of PHP... so be it. PHP 7.4 doesn't suddenly become a buggy mess the instant the volunteers stop agreeing to fix any security issues that are found.

And if you really need support for older versions, that's available. Companies like Zend offer paid LTS support. Many Linux distributions will maintain select versions of PHP (whatever they shipped with) for many years past when the PHP team does, backporting security fixes as needed.

Deprecations are not breaks, yet

Most changes to PHP that have BC implications are first deprecated, meaning nothing changes in their behavior but they will trigger a deprecation message. By default, deprecations do not stop execution; they just leave a note in your log. That means in most cases developers have years to update their code before the behavior actually changes.

Some systems, perhaps many, are however configured to treat all deprecations as errors. These systems are Doing It Wrong(tm). Period. They are inventing problems that do not exist, and the fault lies not with the PHP team but with the people who configured those systems wrong.
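A project that wants warnings to fail loudly can still have that without turning deprecations into failures. Below is a minimal sketch, assuming a plain `set_error_handler()` setup; the closure and the `$deprecationLog` array are hypothetical stand-ins for a real application's handler and logger.

```php
<?php
// Report everything; the handler below honors error_reporting(),
// so this keeps deprecations in scope.
error_reporting(E_ALL);

// Hypothetical in-memory log; a real system would write to its logger.
$deprecationLog = [];

set_error_handler(function (int $severity, string $message, string $file = '', int $line = 0) use (&$deprecationLog): bool {
    if (!(error_reporting() & $severity)) {
        // Suppressed (e.g. with @): fall back to PHP's standard handling.
        return false;
    }
    if ($severity === E_DEPRECATED || $severity === E_USER_DEPRECATED) {
        // A deprecation is advance notice, not a failure: note it and continue.
        $deprecationLog[] = $message;
        return true;
    }
    // Warnings and notices become exceptions the application can catch.
    throw new ErrorException($message, 0, $severity, $file, $line);
});
```

Execution continues past the deprecation, and the log becomes a to-do list for the next upgrade rather than an outage.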

PHPUnit also used to treat deprecations as test failures by default. It no longer does. It never should have, but it doesn't anymore, so that's no longer an excuse.

Code is an opex

In accounting, there's the concept of a "capital expenditure" (capex) and an "operating expenditure" (opex). A capex is a one-time cost, like buying a new printer. An opex is an ongoing cost that needs to be budgeted for, like the paper and ink for the printer.

Most code is an opex. Many accountants still like to treat it as a capex. They are wrong. All code entails ongoing maintenance. Companies that forget this suddenly find themselves unable to reboot their systems because they fired their developers (or the developers retired). If you don't schedule time and cost for system maintenance, your system will schedule it for you. That includes routine upgrades for security fixes if nothing else.

Calling yourself a "fast-moving startup" doesn't change that fact, and a manager at a fast-moving startup who doesn't get the need for maintenance will soon find himself a manager at a fast-failing startup. (Or he'll jump ship to another fast-moving startup where he can screw up again, but that's a different blog post.)

Open source developers matter, too

Conversely, though, I will give a lot more leeway to people maintaining Open Source PHP libraries than to companies using PHP commercially. They, too, are largely volunteers; many OSS maintainers lament that they hate releasing code because it becomes a second job. (See the first point.) While the PHP project itself doesn't owe those developers anything, and vice versa, it's still considerate not to make their lives harder unnecessarily. (This applies only to indies, not to companies.) Many of them maintain projects small enough that turning maintenance into a paid service isn't viable, certainly not enough to compete with what their day job pays.

Tests or go home

This is a simple one. It's 2022. Nearly 2023. I don't care how old your code base is. If you don't have a reasonable test suite in place, at least functional tests, then it's not PHP's problem, it's your problem. "This code is too old to have tests" is BS. You've just not prioritized writing tests for the mystery meat code base you have. (And I say this having inherited such minimal-test code bases before.) This is not an argument, it's a lazy copout, and I will not give it any more air.

The release cycle

I often hear people complain that when an old PHP release reaches end-of-life, they "suddenly" have to upgrade everything. This is absolutely false, and disingenuously so.

All PHP releases (under the modern release schedule) get two years of bug and security fixes from the date of their release, plus another year of security-only fixes. That means even if you only run the oldest-supported release (security only), there's a two-year window in which you know what needs to be done for compatibility. From the release of PHP 8.0 to the date free, volunteer support for 7.4 was dropped was about two years. Nothing about what's needed is a surprise if someone is taking the bare minimum responsibility to remain informed. (Just read php.net once or twice a year.)
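The arithmetic behind that window is easy to check. This sketch assumes the policy described above (two years of active support plus one year of security fixes, counted from the x.0 release) and hard-codes the release dates of PHP 7.4.0 (28 November 2019) and PHP 8.0.0 (26 November 2020); the `securityEndOf()` helper is a hypothetical name.

```php
<?php
// Hypothetical helper: end of security support under a
// "2 years active + 1 year security-only" policy.
function securityEndOf(DateTimeImmutable $release): DateTimeImmutable
{
    return $release->modify('+3 years');
}

$php74Release = new DateTimeImmutable('2019-11-28');
$php80Release = new DateTimeImmutable('2020-11-26');

$php74Eol = securityEndOf($php74Release);
$window = $php80Release->diff($php74Eol);

printf("PHP 7.4 security support ends: %s\n", $php74Eol->format('Y-m-d'));
printf("Time between PHP 8.0's release and 7.4's EOL: %d years, %d months, %d days\n",
    $window->y, $window->m, $window->d);
```

It prints an end date of 2022-11-28 and a window of 2 years, 0 months, 2 days, which matches the 28 November end-of-life announcement.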

And PHP has a feature freeze lasting several months before each release, and a development cycle of several more months before that. Most changes are known at least 6 months, sometimes a year, before the actual release.

Anyone who is surprised by some change at the last minute has no one but themselves to blame.

Not all changes are equal

Let's also bear in mind that people tend to be very generic and non-specific in their complaints, but not all changes to the language are equal.

I cannot comprehend how the addition of short lambdas (7.4), constructor property promotion (8.0), enums (8.1), or constants in traits (8.2) could break an existing codebase. If that somehow happens, the code in question was beyond broken already.

Others carry a slight chance of breakage, but the breaks are small and usually easily fixed. If someone had a package that used the Random namespace and had a class named Randomizer, it would break with the new "Random extension" in 8.2. However, anyone doing so would be going against 14 years of well-documented namespace conventions set up to protect them from exactly that situation. I have no sympathy.
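To make the namespace point concrete: a package that puts its classes under its own vendor prefix cannot collide with a class PHP later adds under `\Random`. In this minimal sketch `Acme` is a placeholder vendor name and `pickOne()` is a hypothetical method, not part of PHP's own `Random\Randomizer`.

```php
<?php
// Placeholder vendor prefix "Acme": this class's fully qualified name is
// Acme\Random\Randomizer, which is distinct from PHP 8.2's \Random\Randomizer.
namespace Acme\Random;

final class Randomizer
{
    /** Return one element of a non-empty list, chosen with random_int(). */
    public function pickOne(array $items)
    {
        if ($items === []) {
            throw new \InvalidArgumentException('Cannot pick from an empty list.');
        }
        $values = array_values($items);
        return $values[random_int(0, count($values) - 1)];
    }
}
```

Had the same class been declared directly as `Random\Randomizer`, PHP 8.2 would refuse it because that name is already in use.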

Every new keyword in the language has the potential to conflict with some existing function name or constant. Sometimes there are ways to hack around that, other times not. It's reasonable to ask the PHP team to check before they introduce a new keyword to see how much it would break... and most of the time that is exactly what happens. Not always, perhaps not as much as some would like, but often.

At the same time, though, with billions of lines of PHP code in the world, knowing in advance which keywords might break something is impossible. That's where major projects bear some responsibility to protect themselves by... being engaged and calling out things they know will break, while it's still early enough to fix them. Some projects do a decent job here, others pointedly do not. (Looking at you, WordPress.)

In some cases there's no good workaround but to rename something in user-space:

I know one esports tournament platform that can't upgrade to PHP 8 because they have a class in their codebase called 'Match'.

[cf: https://mastodon.me.uk/@mintopia/109480647003836301]

Which is a fair criticism! But also not a world-breaker. In most cases it's a single IDE "rename" refactor operation.

And then there's the subtle (or not so subtle) behavior changes, which is what people are usually complaining about, but don't bother to separate from everything else. Things like promoting missing variables from a Notice to a Warning (8.0), or converting some resource values to objects (8.0), or adding types on core interfaces like Iterator (8.1), or deprecating dynamic properties (8.2). Depending on your codebase, those could require small to medium work to address. Grumbling about those is, at times, valid.
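For the dynamic-properties deprecation in particular, the fix is usually one of two small changes. The class names below are hypothetical; `#[\AllowDynamicProperties]` is the opt-out attribute PHP 8.2 provides, and like any single-line attribute it is just a comment to PHP 7.

```php
<?php
// Preferred: declare the property the code was creating on the fly.
// Declared properties work on every PHP version and raise no deprecation.
class LegacyUser
{
    public $name;
    public $nickname; // formerly created dynamically via $user->nickname = ...
}

// Opt-out: keep dynamic properties for a class that genuinely needs them.
// PHP 8.2+ honors the attribute; PHP 7.x parses the line as a # comment.
#[\AllowDynamicProperties]
class LegacyBag
{
}

$user = new LegacyUser();
$user->nickname = 'crell';

$bag = new LegacyBag();
$bag->anything = 42; // no deprecation on 8.2+ thanks to the attribute
```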

But what scale?

What is not valid, though, is this:

We’re forced into a rewrite, or something very like a rewrite, while at the same time remaining in production and producing new features to deal with rapid growth.

This is pure hyperbole. While it is certainly true that the optimal, recommended way to write PHP code has changed over the years, most of that change happened before PHP 5.3 in 2009. The language has just gotten better at encouraging you to write that way since then. But, for instance, "arrays as pseudo-objects" has been a known-bad-practice since at least 2007. That's not new, and yet code that does that still works today.
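"Arrays as pseudo-objects" means passing around associative arrays whose keys act as undeclared fields. Here is a minimal sketch of the old pattern next to the class-based alternative; `Point` is a hypothetical example, and its typed properties need PHP 7.4+.

```php
<?php
// Old pattern, still valid today: an associative array posing as an object.
// Its keys are declared nowhere, so a typo like $point['X'] only
// surfaces at runtime, if at all.
$point = ['x' => 1, 'y' => 2];

// Long-recommended alternative: a small class with declared, typed fields
// (typed properties require PHP 7.4+).
final class Point
{
    public int $x;
    public int $y;

    public function __construct(int $x, int $y)
    {
        $this->x = $x;
        $this->y = $y;
    }
}

$typed = new Point(1, 2);
```

Both forms run on current PHP, which is the point: the better style is encouraged, not enforced.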

While newer PHP versions have, in many cases, included changes that necessitated code updates, none of them have required full rewrites. None. And in most cases, well-behaved code didn't even need changes. There's just a lot of not-well-behaved code out there.

As a point of comparison, I did most of the PHP 8.0 compatibility work for TYPO3 v11. TYPO3 is a 20+ year old system. It's one of the very few applications with a continuous history since PHP 3. It still relies very, very heavily on anonymous arrays. It has code debt upon code debt. There are over 800,000 lines of code in the core system and almost 4,000 classes, not including dependencies. It's huge. And I was able to do the PHP 8.0 compatibility upgrades in about 4-5 weeks. Over 85% of the changes were some variation on adding ?? null to some line.

Should the code get refactored and updated? Absolutely. Is it necessary for modern PHP-compatibility? Absolutely not.

But let's look at a few recent changes and consider their impact.

Attributes

Attributes were introduced in 8.0. There was some drama around selecting the syntax, but in the end, the syntax that was chosen has a great extra feature: in earlier versions of PHP, it's a comment, so older versions simply ignore it. That was a major reason the current syntax was chosen, and it makes it easy to add attributes gracefully without any breakage, even in older versions. That helped a great deal to mitigate other changes, as we'll see below.
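Here is a small sketch of why that syntax degrades gracefully. A `#[...]` line on its own is a `#` comment to PHP 7, while PHP 8.0+ stores it as metadata that reflection can read. The `Route` attribute and `BlogController` class are hypothetical; only the reflection calls are PHP's own API, and they require PHP 8.0.

```php
<?php
// Hypothetical attribute class. The #[Attribute] line marks it as usable
// as an attribute on PHP 8.0+; PHP 7 treats the line as a comment.
#[Attribute(Attribute::TARGET_METHOD)]
final class Route
{
    public string $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }
}

final class BlogController
{
    // Keep each attribute on its own line: PHP 7 comments out everything
    // after the #, so code sharing the line would silently disappear there.
    #[Route('/blog')]
    public function index(): string
    {
        return 'blog index';
    }
}

// PHP 8.0+ only: read the attribute back and instantiate it.
$method = new ReflectionMethod(BlogController::class, 'index');
$route = $method->getAttributes(Route::class)[0]->newInstance();
echo $route->path, PHP_EOL; // prints /blog
```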

Undefined variables

PHP 8.0 raised the error level of undefined variables from a notice to a warning, effectively making them "real errors" when they weren't before. This impacts a lot of old code, it's true.

But relying on undefined variables to silently turn to null has been a known bad-practice that introduces security holes since at least 2005. It's been a good-practice recommendation to develop under E_ALL (report all errors, including notices) since at least 2007, probably longer. 17 years is, I would argue, entirely sufficient time to fix such issues, and anyone who has been developing under E_ALL wouldn't even notice this change.

Even if a code base has been sloppy for two decades, as was the case in TYPO3, this change doesn't require a rewrite. I fixed it with a few hundred ?? null or similar additions. Tedious, yes, but not world-breaking.
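The mechanical fix is a one-line change per site. The function names here are hypothetical; what matters is the before/after of the `return` line, the kind of change that made up most of the TYPO3 work.

```php
<?php
// Before: on PHP 8.0+, calling this without a 'timeout' key raises
// "Warning: Undefined array key" (it was only a notice before 8.0).
function timeoutBefore(array $options)
{
    return $options['timeout'];
}

// After: ?? (PHP 7.0+) yields null for a missing key or undefined
// variable without emitting anything, so the old behavior is kept explicitly.
function timeoutAfter(array $options)
{
    return $options['timeout'] ?? null;
}
```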

Refactoring the code to avoid those undefined vars being possible in the first place is wise, but not required. And it's been wise for 17 years at least. It's not some new development.
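The refactor, by contrast, removes the undefined path instead of papering over it. In this hypothetical example the variable gets a value on every branch, so `??` is never needed.

```php
<?php
// Hypothetical legacy shape: $message was only assigned inside the if,
// so the return line read an undefined variable for guests.
//
//     if ($loggedIn) { $message = "Welcome back, {$name}"; }
//     return $message;
//
// Refactored: the variable is defined on every path before it is read.
function greeting(bool $loggedIn, string $name): string
{
    $message = 'Welcome, guest';
    if ($loggedIn) {
        $message = "Welcome back, {$name}";
    }
    return $message;
}
```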

Interface types

PHP 8.1 added parameter and return types to most PHP-provided interfaces, such as Countable or Iterator. Tightening the type declarations on those is unquestionably an improvement, as having those types there helps prevent people from doing something that is broken and may cause unexpected behavior. However, it's also entirely true that it creates a problem for existing classes that have no return types, since that's now a type violation. That requires work to fix.

Work, but not a full rewrite. A full rewrite doesn't even make sense here. And the work needed is mitigated by three factors:

  1. Narrowing the return type is legal, so it's been possible to add return types to those methods since PHP 7.0.
  2. It's a deprecation, not an error. No code actually breaks. See the above point about deprecations.
  3. Even then, there's an opt-in attribute that can be placed on a method, #[\ReturnTypeWillChange], that will suppress the deprecation until PHP 9.0, and the attribute will be ignored on PHP 7 (which didn't have attributes) thanks to it being interpreted as a comment.

So, assuming a worst case scenario, the work involved is "paste #[\ReturnTypeWillChange] a bunch of times around the codebase and get back to it later." Annoying perhaps, but far from forcing a full rewrite.

One point I will raise is that some of the return or parameter types added were mixed, which had only been introduced a year earlier in 8.0. That means adding those type declarations (to avoid the attribute) introduced an incompatibility with 7.4 and earlier; a one-year gap was, frankly, too short a window. It might have been better to wait until 8.2 here to allow a larger number of concurrent versions, although with the attribute such code is valid and error-free from 7.0 until 9.0, so that's still a minor complaint.
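Both approaches can coexist in one class. The sketch below uses a hypothetical `TagList` collection. The methods whose return types older PHP can express get real types. `current()`, whose only accurate type for arbitrary elements is `mixed` (PHP 8.0+), keeps the attribute for now.

```php
<?php
// Hypothetical legacy collection, adapted to PHP 8.1's typed core interfaces.
final class TagList implements Countable, Iterator
{
    private $tags;
    private $position = 0;

    public function __construct(array $tags)
    {
        $this->tags = array_values($tags);
    }

    // Approach 1: declare return types. Narrowing an untyped return is
    // legal, so these compile on PHP 7.0+ (void needs 7.1+).
    public function count(): int { return count($this->tags); }
    public function key(): int { return $this->position; }
    public function next(): void { $this->position++; }
    public function rewind(): void { $this->position = 0; }
    public function valid(): bool { return array_key_exists($this->position, $this->tags); }

    // Approach 2: the accurate type here is mixed (PHP 8.0+), so leave it
    // untyped and silence the 8.1 deprecation. PHP 7.x sees a comment.
    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->tags[$this->position];
    }
}
```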

Stricter type internal functions

Many of PHP's older standard library functions (which are implemented in C and therefore already behave subtly differently from user-space code) have historically silently accepted null arguments and applied some one-off "it made sense at the time" logic to avoid throwing an error. Just as with undefined variables, this is a convenience feature from PHP's very early days that is not, it turns out, very convenient. In fact, it's quite inconvenient when PHP tells you a value has a string length of 0 when it isn't even a string, causing bugs elsewhere in the code that you could have caught earlier. Catching that is the entire value of types, but sometimes those older functions weren't smart about it.

In PHP 8.1, those functions were changed to trigger a deprecation when used with null. This also affected a not-small amount of code.

But again, there are many mitigating factors:

  1. Once again, it's a deprecation. See above.
  2. Typed parameters and returns have been around since PHP 7.0. Code that's been gradually adding proper types over the last seven years has probably caught and fixed a lot of null errors already, to the point that such code paths had already been eliminated.
  3. Passing null to strlen() is a bug. Period. If code out there was doing so, it was already broken and buggy. The developers just didn't realize it because they weren't relying on 7-year-old features of the language to catch bugs for them. The noose on bugs has slowly tightened, so... yeah, newer PHP versions do a better job of telling you ahead of time that your code has a bug so you can fix it. Fix your bugs.

Once again, though, this doesn't necessitate a full rewrite. strlen((string)$buggy_might_be_null) will make the deprecation go away, and instead you'll be passing an empty string. That still means your code likely has a bug in it, but PHP won't tell you about it.
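The difference between silencing the deprecation and fixing the bug is easiest to see side by side. Both functions are hypothetical. Which behavior is right depends on what null is supposed to mean in a given code path.

```php
<?php
// Silences PHP 8.1's deprecation: (string) null is '', so null becomes
// length 0. Whatever bug produced the null is now invisible.
function lengthCast(?string $value): int
{
    return strlen((string) $value);
}

// Treats null as the error it usually is, at the point where it appears.
function lengthChecked(?string $value): int
{
    if ($value === null) {
        throw new InvalidArgumentException('Expected a string, got null.');
    }
    return strlen($value);
}
```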

Resource to object conversion

PHP has an ancient variable type called resource that is unlike anything else. It's mostly useless, un-introspectable, and breaks the type system in exciting ways. Certain things being resources (like file references, some database connections, etc.) make certain improvements to PHP impossible. In PHP 8.0, many of those were converted to be objects. Not all of them were, due almost entirely to lack of people and time. Another batch was converted in PHP 8.1.

Most use cases won't notice, but there is code that has historically needed to check is_resource($var) or get_resource_type($var) or similar for variou

#php #community