A Better Notation of Derivatives

I’m hoping this is enough of a safe space where I can admit a terrible secret: for most of my undergraduate career I couldn’t explain what exactly the ‘d’ meant in the $\text{d}x$ from differential calculus. One thing I knew for sure is that it’s not a real number. I knew this because mathematicians give you this funny look of remorse, anger and depression anytime you mention “multiplying by $\text{d}x$ on both sides of the equation.” Answering this question will take us on a little journey, so buckle up.

Some History

Modern calculus is usually dated to Gottfried Wilhelm Leibniz and Isaac Newton, but the idea of a tangent line describing the derivative of a function goes all the way back to the Greeks. Even though Archimedes used the concept of an infinitesimal, most introductory calculus courses skip the formalism of the infinitesimal.

To see why, let’s set the scene. We have two variables called $x$ and $y=f(x)$. A perfectly reasonable thing to ask is the following: if I change the value of $x$ by some really small amount, how much will $y$ change as a result? Of course, anyone who’s learned calculus would jump in to say “the derivative” here, and they would be absolutely right. For the more rusty readers, remember that the derivative is defined like so:

\frac{\text{d}y}{\text{d}x} = f'(x) = \lim_{h\to 0}\frac{f(x + h) - f(x)}{h}
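For readers who want to see the limit in action, here is a minimal numerical sketch. It assumes the sample function $f(x) = \sin x$ at $x = 1$ (my choice, nothing special about it), and the helper name `difference_quotient` is purely illustrative.

```python
import math


def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


# Assumed example: f(x) = sin(x) at x = 1, whose exact derivative is cos(1).
x0 = 1.0
exact = math.cos(x0)

errors = []
for k in range(1, 7):
    h = 10.0 ** (-k)  # h = 0.1, 0.01, ..., 0.000001
    approx = difference_quotient(math.sin, x0, h)
    errors.append(abs(approx - exact))
    print(f"h = {h:.0e}   quotient = {approx:.8f}   error = {errors[-1]:.2e}")
```

As $h$ shrinks, the printed quotient settles toward $\cos(1)$ and the error column shrinks with it, which is all the limit in the definition is claiming.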

So if we wanted to use this result to get an expression for $\text{d}y$, the most intuitive answer would be to calculate the derivative function and then multiply both sides of the equation by $\text{d}x$, right? There’s only one problem with that though: we never defined the algebra that $\text{d}x$ follows. In fact, we have no idea what $\text{d}x$ even is! How would we even know how to carry out this magic multiplication? We only know how to do algebra on real numbers. If that’s the case though, then why do we write it as $\frac{\text{d}y}{\text{d}x}$? Well, this is actually just a notation convention: Leibniz’s notation, to be exact. As an alternative, Newton used to put dots over the variables to signify the tangent slope, which is still used frequently in classical mechanics literature.

The right way

The first thing we need to do is clean up the notation. We haven’t yet justified why the directional derivative function should be written as $\frac{\text{d}y}{\text{d}x}$, so let’s write it a different way. To signify the slope of the tangent line of the function $f(x)$ in the direction of $x$, we will write $\partial_x f = \lim_{h\to 0}\frac{f(x + h) - f(x)}{h}$, and for higher-order derivatives we add an exponent to the symbol, as in $\partial_x^2 f$. (For people who have done multi-variable calculus: no, using that symbol is definitely not a coincidence, and they are in fact the same thing.)

Now that we’ve got notation out of the way, we can finally tackle the main question. To get an expression for $\text{d}y$ we need to define what ‘d’ means in that expression. Notice that $x$ was an independent variable while $y$ was a dependent variable. So, geometrically speaking, the space we are working with is one-dimensional (i.e. we are only dealing with things that exist on the number line). The easiest model to start with is a linear one, so let’s do exactly that. Let’s define the exterior derivative (at a fixed point $x$) through the best possible linear approximation, i.e. the model $\text{d}y \approx m\,\text{d}x$ for some number $m$. If we extend this idea to every possible fixed point $x$, then the number $m$ is replaced by a function $m(x)$, so that $\text{d}y \approx m(x)\,\text{d}x$. Under this model, the best value of $m$ at any fixed point is the slope of the tangent line there. So the function that best fits this model is just the directional derivative, and we conclude that in the limit as $\text{d}x$ shrinks to zero we get $\text{d}y = (\partial_x f)\text{d}x$.
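To make this concrete, here is a small worked example, assuming the sample function $f(x) = x^2$ (my choice for illustration). The exact change in $y$ splits into a part that is linear in $\text{d}x$ and a leftover:

```latex
\begin{align*}
f(x + \text{d}x) - f(x) &= (x + \text{d}x)^2 - x^2 \\
&= 2x\,\text{d}x + (\text{d}x)^2
\end{align*}
```

The linear part has slope $m = 2x = \partial_x f$. The leftover $(\text{d}x)^2$, even after dividing by $\text{d}x$, still goes to zero as $\text{d}x$ shrinks, so for this function the model gives $\text{d}y = 2x\,\text{d}x$.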

It is reasonable to doubt the use of a linear approximation (and why it would work), so let me explain. Recall that a function $f(x)$ that has a derivative at a point is continuous there, and let’s also assume that its derivative is continuous. When we take a linear approximation, we do it over a small neighborhood of points around the fixed value of $x$. If the function and its derivative are both continuous, then as we make the neighborhood smaller and smaller, the linear approximation gets better and better. It can be rigorously shown that, in the limit, the error from the linear approximation goes to zero faster than the width of the neighborhood does. This means that we can always make the neighborhood small enough that the error term is insignificant in any calculation, which makes a linear approximation a very reasonable choice.
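A quick numerical sketch of that claim, assuming the sample function $f(x) = e^x$ at $x = 1$ (chosen only because it is its own derivative, which keeps the code short). The helper name `linearization_error` is hypothetical.

```python
import math


def linearization_error(f, df, x, dx):
    """Gap between the true change f(x + dx) - f(x) and the linear model df(x) * dx."""
    return abs((f(x + dx) - f(x)) - df(x) * dx)


# Assumed example: f(x) = exp(x), which is its own derivative, at x = 1.
x0 = 1.0

ratios = []
for k in range(1, 6):
    dx = 10.0 ** (-k)  # neighborhood width: 0.1, 0.01, ..., 0.00001
    err = linearization_error(math.exp, math.exp, x0, dx)
    ratios.append(err / dx)  # error measured relative to the step size
    print(f"dx = {dx:.0e}   error = {err:.3e}   error / dx = {ratios[-1]:.3e}")
```

The column to watch is `error / dx`: it keeps shrinking as the neighborhood shrinks, so the error is vanishing faster than the step itself.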

And the point is…?

Okay, it feels like we haven’t made any progress yet, because if we look at our new equation, $\text{d}y = (\partial_x f)\text{d}x$, we have just confirmed that the “dirty” method gives us the correct result. So what gives? What makes it so “dirty”?

When everything falls apart

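This is all well and good, but what happens when we move to second-order derivatives (i.e. the derivative of a derivative)? Let’s find out. If we treat $\frac{\text{d}y}{\text{d}x}$ as a genuine quotient of two quantities and apply ‘d’ to it using the quotient rule, we have that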

\begin{align*} \text{d}\left(\frac{\text{d}y}{\text{d}x}\right) &= \frac{\text{d}x(\text{d}(\text{d}y)) - \text{d}y(\text{d}(\text{d}x))}{(\text{d}x)^2}\\ &=\frac{\text{d}x\text{d}^2 y - \text{d}y \text{d}^2 x}{\text{d} x^2}\\ &=\frac{\text{d} x}{\text{d} x}\frac{\text{d}^2 y}{\text{d} x} - \frac{\text{d} y}{\text{d} x}\frac{\text{d}^2 x}{\text{d} x}\\ &=\frac{\text{d}^2 y}{\text{d} x} - \frac{\text{d} y}{\text{d} x}\frac{\text{d}^2 x}{\text{d} x} \end{align*}

Dividing both sides by $\text{d}x$ gives us the following relationship for the second derivative:

\partial_x^2y = \frac{\text{d}^2 y}{\text{d} x^2} - \frac{\text{d} y}{\text{d} x}\frac{\text{d}^2 x}{\text{d} x^2} = \frac{\text{d}^2 y}{\text{d} x^2} - \partial_xy\frac{\text{d}^2 x}{\text{d} x^2}

Aha! So things do change in higher orders. Notice that the derivative operator (on the left-hand side) does NOT equal the famous notation $\frac{\text{d}^2{y}}{\text{d}{x}^2}$. You might be wondering why the last term isn’t zero, but that’s simply because we’re very used to the Leibniz notation. That last term is not describing the second derivative of $x$ with respect to itself (which is certainly zero!). Instead, it is a true algebraic quotient: the exterior derivative applied twice to $x$, $\text{d}^2 x = \text{d}(\text{d}x)$, divided by the square of the exterior derivative, $(\text{d}x)^2$.
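One way to see that the last term can really be nonzero is a small worked example. It assumes something the discussion above doesn’t spell out: that $x$ itself depends on another variable. Here I take a hypothetical parameter $t \neq 0$ with $x = t^2$ and $y = x^3 = t^6$, and treat $t$ as the independent variable, so that $\text{d}^2 t = 0$. Under those assumptions:

```latex
\begin{align*}
\text{d}x &= 2t\,\text{d}t, \qquad \text{d}^2 x = 2\,(\text{d}t)^2 \\
\text{d}y &= 6t^5\,\text{d}t, \qquad \text{d}^2 y = 30t^4\,(\text{d}t)^2 \\
\frac{\text{d}^2 y}{\text{d}x^2} - \frac{\text{d}y}{\text{d}x}\frac{\text{d}^2 x}{\text{d}x^2}
&= \frac{30t^4}{4t^2} - 3t^4 \cdot \frac{2}{4t^2}
= \frac{15}{2}t^2 - \frac{3}{2}t^2 = 6t^2 = 6x
\end{align*}
```

This agrees with $\partial_x^2 y = 6x$ for $y = x^3$. The quotient $\frac{\text{d}^2 y}{\text{d}x^2}$ on its own would give $\frac{15}{2}x$, which is not the second derivative. In this sketch, the correction term is what makes up the difference.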

Moral of the story

So everything is fine and dandy in first-order land but fails for higher-order derivatives. This, in a nutshell, is why we don’t consider the derivative notation as algebraically correct. In a perfect world, our notation would make things more intuitive rather than less. But alas, Leibniz notation is very popular in most non-pure math fields. But hey! At least you now know why mathematicians give you that stink-eye look any time we “divide both sides by $\text{d}x$.”