When 'perfect' code fails

Overview

We all strive to write good, correct software. But what happens when a function that looks "perfect", a simple, one-line equality check, ends up creating a massive security hole? In this post, I'm going to share a story about a very nasty bug we found in our Next.js application. A function that should have returned true or false was always returning true.

It's a cautionary tale about how modern framework "magic" can sometimes lead to very surprising problems.

Prelude

IMPORTANT

The following story is based on tech that is around 1 year old where we used Next.js 14.1.3 and React 18. Future versions fixed the following problem but more on that later.

I was recently watching a talk by Andrew Kelly about the Zig programming language where his thesis is that "software should be perfect".

He defines it as:

no bugs
no exceptions
it works every time

Further, he enumerates ways software can fail:

Hardware failure
Cosmic rays
Incorrect software

And we as programmers should strive to write as much correct software as we can.

Starting with the following definition for perfect software:

For every possible input, the software yields the correct output.

As an example, we see the following C snippet:

bool perform_not(bool x) {
  return !x;
}

He reiterates,

This is clearly perfect software!

Granted, this one is a very simple example, but he is not wrong.

And as I was watching this exact segment, I remembered a really nasty bug we had in our code

Perfect code

Let's look at the following Javascript snippet:

const isOwner = (userMail: string, ownerMail: string) => {
  return userMail === ownerMail;
};

I would argue that the following function is also perfect code based on the definition above. We have two inputs that we compare and return a boolean based on the result. This is nearly as perfect as a function can be. No (complex) state, no (complex) branching, no (complex) mutation, etc.

And yet, this one almost broke our security measures thanks to a subtle interaction between Next.js (React) and Javascript.

The problem

As we implemented a feature where we ask if a user is the owner of a resource:

if (isOwner(userMail, resource.ownerMail)) {
  // grant access
}

...we quickly realized in our staging environment that something was dangerously off.

The problem? Our security check was passing for everyone. Everyone could access every possible resource.

All tests were green (yes, we even tested this simple function), but still, the function wasn't doing what we expected.

After a little digging and testing, we couldn't believe our eyes: the function always returned true!

What happened?

The snippet above was called as a server function. This is React's new way of calling server side code from the client side.

Let's look at the function again:

const isOwner = (userMail: string, ownerMail: string) => {
  return userMail === ownerMail;
};

Keen eyes and seasoned users of server functions might spot the devil right now, but let me explain for those who don't.

See, server functions are inherently asynchronous, and in the React documentation, they tell you that:

Because the underlying network calls are always asynchronous, 'use server' can only be used on async functions.

Okay, fair, but I've got a few problems with that:

It is written as if this only applies to functions you mark with use server and not the whole file.
Why were there no warnings? It is very easy to forget or miss this crucial information.
Why did this even run in the first place?

When we called the isOwner function, it returned a Promise even though our function signature did not specify it as an async function. Our synchronous-looking function was invisibly converted into an async function. This meant it no longer returned a boolean, it returned a Promise.

And in Javascript, what is the value of a Promise object inside an if statement?

if (Promise<void>) {
  // always true
}

It's truthy!

Our security check if (isOwner(...)) was no longer checking if (true) or if (false). It was checking if (new Promise(...)), which always evaluates to true, granting access to everyone, every time. Great!

NOTE

I have a small Next.js example on GitHub that illustrates the problem if you want to check it out. The main branch is on 14.1.3 (the bad one) with a vitest, but there is also the latest 14.2.33 version that now yields errors in the file but you can still build it without errors. I tested this also with the latest 15.x and the new 16.x version and with those we also get build time errors.

Learnings

The fix is easy: just declare the function async and await it. Then you'll get the correct value. But it still baffles me that something like that is (or was) even remotely possible.

We wrote a function that looked perfect and synchronous. But the framework was doing magic behind the scenes. Because it was a "server function," Next.js automatically made it async and returned a Promise.

And in Javascript, a Promise object inside an if statement is always true.

That's the whole bug. Our simple if (isOwner(...)) was the same as writing if (true), which is why it let everyone in.

It just shows that all this framework "magic" that's supposed to help us can also create some really weird problems. We were lucky we caught this, and it's a good thing the newer Next.js versions warn you about this now or even refuse to build.

NOTE

It seems as if we were not the only ones with that problem. A Reddit thread from about a year ago mentioned the exact problem we had.