Lagrange multipliers are more than mere ghost variables that help to solve constrained optimization problems...

Lagrange multipliers technique, quick recap

Constrained optimization
Image credit: By Nexcis (Own work) [Public domain], via Wikimedia Commons
When you want to maximize (or minimize) a multivariable function subject to the constraint that another multivariable function equals a constant, , follow these steps:
  • Step 1: Introduce a new variable start color greenE, lambda, end color greenE, and define a new function L\mathcal{L} as follows:
    This function L\mathcal{L} is called the "Lagrangian", and the new variable start color greenE, lambda, end color greenE is referred to as a "Lagrange multiplier"
  • Step 2: Set the gradient of L\mathcal{L} equal to the zero vector.
    In other words, find the critical points of L\mathcal{L}.
  • Step 3: Consider each solution, which will look something like . Plug each one into f. Or rather, first remove the start color greenE, lambda, end color greenE, start subscript, 0, end subscript component, then plug it into f, since f does not have start color greenE, lambda, end color greenE as an input. Whichever one gives the greatest (or smallest) value is the maximum (or minimum) point your are seeking.

Budgetary constraints, revisited

The last article covering examples of the Lagrange multiplier technique included the following problem.
  • Problem: Suppose you are running a factory, producing some sort of widget that requires steel as a raw material. Your costs are predominantly human labor, which is dollar sign, 20 per hour for your workers, and the steel itself, which runs for dollar sign, 170 per ton. Suppose your revenue R is loosely modeled by the equation
    R(h,s)=200h2/3s1/3\begin{aligned} \quad R(h, s) = 200 h^{2/3} s^{1/3} \end{aligned}
    Where
    • h represents hours of labor
    • s represents tons of steel
    If your budget is dollar sign, 20, comma, 000, what is the maximum possible revenue?
You can get a feel for this problem using the following interactive diagram, which let's you see which values of left parenthesis, h, comma, s, right parenthesis yield a given revenue (blue curve) and which values satisfy the constraint (red line).
The full details of the solution can be found in the last article. For our purposes here, you just need to know what happens in principle as we follow the steps of the Lagrange multiplier technique.
  • We start by writing the Lagrangian L(h,s,λ)\mathcal{L}(h, s, \lambda) based on the function start color blueE, R, left parenthesis, h, comma, s, right parenthesis, end color blueE and the constraint start color redE, 20, h, plus, 170, s, equals, 20, comma, 000, end color redE.
    L(h,s,λ)=200h2/3s1/3λ(20h+170s20,000)\begin{aligned} \quad \mathcal{L}(h, s, \lambda) = \blueE{200h^{2/3}s^{1/3}}-\lambda(\redE{20h+170s-20{,}000}) \end{aligned}
  • Then we find the critical points of L\mathcal{L}, meaning the solutions to
    L(h,s,λ)=0\begin{aligned} \quad \nabla \mathcal{L}(h, s, \lambda) = 0 \end{aligned}
  • There might be several solutions left parenthesis, h, comma, s, comma, lambda, right parenthesis to this equation,
    (h0,s0,λ0)(h1,s1,λ1)(h2,s2,λ2)\begin{aligned} \quad (h_0, &s_0, \lambda_0) \\ (h_1, &s_1, \lambda_1) \\ (h_2, &s_2, \lambda_2) \\ &\vdots \end{aligned}
    so for each one you plug in the h and s components to the revenue function R, left parenthesis, h, comma, s, right parenthesis to see which one actually corresponds with the maximum.
It's common to write this maximizing critical point as left parenthesis, h, start superscript, times, end superscript, comma, s, start superscript, times, end superscript, comma, lambda, start superscript, times, end superscript, right parenthesis, using asterisk superscripts to indicate that this is a solution. This means h, start superscript, times, end superscript and s, start superscript, times, end superscript represent the hours of labor and tons of steel you should allocate to maximize revenue subject to your budget. But how can we interpret the Lagrange multiplier lambda, start superscript, times, end superscript that comes with these maximizing values? This is the core question of the article.
It turns out that lambda, start superscript, times, end superscript tells us how much more money we can make by changing our budget.
Let's get a feel for what it means to change the budget. The following tool is similar to the one above, but now the red line representing which points left parenthesis, h, comma, s, right parenthesis satisfy the budget constraint will shift as you let the budget vary around dollar sign, 20, comma, 000. This budget is represented with the variable b.
For each value of the budget b, try to maximize R while ensuring that the curves still touch each other. Notice that the maximum R-value you can achieve changes as b changes. We are interested in studying the specifics of that change.
Let M, start superscript, times, end superscript represent the maximum revenue you achieve. In the next interactive diagram, the only variable you can change is b, and you can see how the value of M, start superscript, times, end superscript depends on b.
In other words, this maximum revenue M, start superscript, times, end superscript is a function of the budget b, so we write it as
M(b)\begin{aligned} \quad M^*(b) \end{aligned}
We can now express a truly wonderful fact: The Lagrange multiplier lambda, start superscript, times, end superscript, left parenthesis, b, right parenthesis gives the derivative of M, start superscript, times, end superscript:
dMdb(b)=λ(b)\begin{aligned} \quad \dfrac{dM}{db}(b) = \lambda^*(b) \end{aligned}
In terms of the interactive diagram above, this means lambda, start superscript, times, end superscript, left parenthesis, b, right parenthesis tells you the rate of change of the black dot representing M, start superscript, times, end superscript as you move around the yellow dot representing b.
Showing why this is true is a bit tricky, but first, let's take a moment to interpret it. For example, if we found that lambda, start superscript, times, end superscript, left parenthesis, b, right parenthesis, equals, 2, point, 59, it would mean each additional dollar you spend over your budget would yield another dollar sign, 2, point, 59 in revenue. Conversely, decreasing your budget by a dollar will cost you that much in lost revenue.
This interpretation of lambda, start superscript, times, end superscript comes up commonly enough in economics to deserve a name: "Shadow price". It is the money gained by loosening the constraint by a single dollar, or conversely the price of strengthening the constraint by one dollar.

Generally speaking

Let's generalize what we just did with the budget example and see why it's true. Spelling out the full result is actually quite a mouthful, but it should be made clear by holding the following mantra in the back of your mind: "How does the solution change as the constraint changes?".
We start with the usual Lagrange multiplier setup. There is a function we want to maximize,
f(x,y)\begin{aligned} \quad f(x, y) \end{aligned}
and a constraint,
g(x,y)=c\begin{aligned} \quad g(x, y) = c \end{aligned}
We start by writing the Lagrangian,
L(x,y,λ)=f(x,y)λ(g(x,y)c).\begin{aligned} \quad {\mathcal{L}(x, y, \lambda) = f(x, y) - \lambda (g(x, y)-c)}. \end{aligned}
Let left parenthesis, x, start superscript, times, end superscript, comma, y, start superscript, times, end superscript, comma, lambda, start superscript, times, end superscript, right parenthesis be the critical point of L\mathcal{L}, which solves our constrained optimization problem. In other words,
L(x,y,λ)=0 \nabla \mathcal{L}(x^*, y^*, \lambda^*) = 0
And left parenthesis, x, start superscript, times, end superscript, comma, y, start superscript, times, end superscript, right parenthesis maximizes f (subject to the constraint).
When we start to think of c as a variable, we must account for the fact that the solution left parenthesis, x, start superscript, times, end superscript, comma, y, start superscript, times, end superscript, comma, lambda, start superscript, times, end superscript, right parenthesis changes as the constraint c changes. To do this, we start writing each component as a function of c:
x(c)y(c)λ(c)\begin{aligned} \quad x^*(c) \\ y^*(c) \\ \lambda^*(c) \\ \end{aligned}
In other words, when the constraint equals some value c, the solution triplet to the Lagrange multiplier problem is left parenthesis, x, start superscript, times, end superscript, left parenthesis, c, right parenthesis, comma, y, start superscript, times, end superscript, left parenthesis, c, right parenthesis, comma, lambda, start superscript, times, end superscript, left parenthesis, c, right parenthesis, right parenthesis.
We now let M, start superscript, times, end superscript, left parenthesis, c, right parenthesis represent the (constrained) maximum value of f as a function of c, which can be written in terms of f, x, start superscript, times, end superscript, left parenthesis, c, right parenthesis and y, start superscript, times, end superscript, left parenthesis, c, right parenthesis as follows:
M, start superscript, times, end superscript, left parenthesis, c, right parenthesis, equals, f, left parenthesis, x, start superscript, times, end superscript, left parenthesis, c, right parenthesis, comma, y, start superscript, times, end superscript, left parenthesis, c, right parenthesis, right parenthesis
The core result we wish to show is that
start fraction, d, M, start superscript, times, end superscript, divided by, d, c, end fraction, equals, lambda, start superscript, times, end superscript, left parenthesis, c, right parenthesis
This says that the Lagrange multiplier lambda, start superscript, times, end superscript gives the rate of change of the solution to the constrained maximization problem as the constraint varies.

Want to outsmart your teacher?

Proving this result could be an algebraic nightmare, since there is no explicit formula for the functions x, start superscript, times, end superscript, left parenthesis, c, right parenthesis, y, start superscript, times, end superscript, left parenthesis, c, right parenthesis, lambda, start superscript, times, end superscript, left parenthesis, c, right parenthesis or M, start superscript, times, end superscript, left parenthesis, c, right parenthesis. This means you would have to start with the defining property of x, start superscript, times, end superscript, y, start superscript, times, end superscript and lambda, start superscript, times, end superscript, namely that L(x,y,λ)=0\nabla \mathcal{L}(x^*, y^*, \lambda^*)=0, and reason your way towards start fraction, d, M, start superscript, times, end superscript, divided by, d, c, end fraction. This is not at all straight forward (try it!).
There is a fun story, in which a professor was asked what the harshest truth he ever learned from a student was. He recalled a class he taught when he went through a long and algebraically heavy proof, only to be shown by a student that there is a much simpler approach. The lesson, he said, was that he was not as smart as he thought he was.
The result he was talking about just so happens to be what we are now trying to prove. Although the student's approach is not quite so simple as the story makes it out to be, it is still a clean way to view the problem. More importantly, it is easier to remember than other proofs, so I'll spell it out in full here. As happens so often in math, a little insight can save us from excessive algebra.

The insight

The underlying insight is that evaluating the Lagrangian itself at a solution left parenthesis, x, start superscript, times, end superscript, comma, y, start superscript, times, end superscript, comma, lambda, start superscript, times, end superscript, right parenthesis will give the maximum value M, start superscript, times, end superscript. This is because the "g, left parenthesis, x, comma, y, right parenthesis, minus, c" term in the Lagrangian goes to zero (since a solution must satisfy the constraint), so we have
L(x,y,λ)=f(x,y)λ(g(x,y)c)=f(x,y)+0=M\begin{aligned} \quad \mathcal{L}(x^*, y^*, \lambda^*) &= f(x^*, y^*) - \lambda^*(g(x^*, y^*)-c) \\ &= f(x^*, y^*) + 0 \\ &= M^* \end{aligned}
Given that we want to find start fraction, d, M, start superscript, times, end superscript, divided by, d, start color redE, c, end color redE, end fraction, this suggests that we should find a way to treat L\mathcal{L} as a function of start color redE, c, end color redE. Then we might be able to relate the derivative we want to a derivative of L\mathcal{L} with respect to start color redE, c, end color redE.

The followthrough

Start by treating L\mathcal{L} as a function of four variable instead of three, since start color redE, c, end color redE is now modeled as a changing value:
L(x,y,λ,c)=f(x,y)λ(g(x,y)c).\begin{aligned} \quad {\mathcal{L}(x, y, \lambda, \redE{c}) = f(x, y) - \lambda (g(x, y)-\redE{c})}. \end{aligned}
Reflection question: When L\mathcal{L} is written as a four-variable function like this, what is Lc\dfrac{\partial \mathcal{L}}{\partial \redE{c}}?
Wybierz 1 odpowiedź:
Wybierz 1 odpowiedź:

Lc=c(f(x,y)λ(g(x,y)c))=λ(1)=λ\begin{aligned} \quad \dfrac{\partial \mathcal{L}}{\partial \redE{c}} = \dfrac{\partial}{\partial \redE{c}} (f(x, y) - \lambda (g(x, y)-\redE{c})) = -\lambda\cdot(-1) = \lambda \end{aligned}
This partial derivative is promising, since our goal is to show that start fraction, d, M, start superscript, times, end superscript, divided by, d, start color redE, c, end color redE, end fraction, equals, lambda, start superscript, times, end superscript, and we know that M=LM^* = \mathcal{L} at solutions. However, we still have work to do.
To encode the fact that we only care about the value of L\mathcal{L} at a solutions left parenthesis, x, start superscript, times, end superscript, comma, y, start superscript, times, end superscript, comma, lambda, start superscript, times, end superscript, right parenthesis for a given value of start color redE, start color redE, c, end color redE, end color redE, we replace start color blueE, x, end color blueE, comma, start color greenE, y, end color greenE and start color goldE, lambda, end color goldE with start color blueE, x, start superscript, times, end superscript, left parenthesis, start color redE, c, end color redE, right parenthesis, end color blueE, comma, start color greenE, y, start superscript, times, end superscript, left parenthesis, start color redE, c, end color redE, right parenthesis, end color greenE and start color goldE, lambda, start superscript, times, end superscript, left parenthesis, start color redE, c, end color redE, right parenthesis, end color goldE. These are functions of start color redE, c, end color redE which correspond to the solution of the Lagrangian problem for a given choice of the "constant" start color redE, c, end color redE.
This let's us write M, start superscript, times, end superscript as a function of start color redE, c, end color redE as follows:
M(c)=L(x(c),y(c),λ(c),c)\begin{aligned} \quad M^*(\redE{c}) = \mathcal{L}(\blueE{x^*(\redE{c})}, \greenE{y^*(\redE{c})}, \goldE{\lambda^*(\redE{c})}, {\redE{c}}) \end{aligned}
Even though this expression has only one variable, start color redE, c, end color redE, there is a four-variable function L\mathcal{L} as an intermediary. Therefore, to take its (ordinary) derivative with respect to c, we use the multivariable chain rule:
dMdc=ddcL(x(c),y(c),λ(c),c)=Lxdxdc+Lydydc+Lλdλdc+Lcdcdc\begin{aligned} \quad \dfrac{dM^*}{d\redE{c}} &= \dfrac{d}{d\redE{c}}\mathcal{L}(\blueE{x^*}(\redE{c}), \greenE{y^*}(\redE{c}), \goldE{\lambda^*}(\redE{c}), \redE{c}) \\ \\ &= \dfrac{\partial \mathcal{L}}{\blueE{\partial x}} \dfrac{d\blueE{x^*}}{d\redE{c}} + \dfrac{\partial \mathcal{L}}{\greenE{\partial y}} \dfrac{d\greenE{y^*}}{d\redE{c}} + \dfrac{\partial \mathcal{L}}{\goldE{\partial \lambda}} \dfrac{d\goldE{\lambda^*}}{d\redE{c}} + \dfrac{\partial \mathcal{L}}{\partial \redE{c}} \dfrac{d\redE{c}}{d\redE{c}} \end{aligned}
Note, each partial derivative in the expression above should be evaluated at left parenthesis, start color blueE, x, start superscript, times, end superscript, end color blueE, left parenthesis, start color redE, c, end color redE, right parenthesis, comma, start color greenE, y, start superscript, times, end superscript, end color greenE, left parenthesis, start color redE, c, end color redE, right parenthesis, comma, start color goldE, lambda, start superscript, times, end superscript, end color goldE, left parenthesis, start color redE, c, end color redE, right parenthesis, comma, start color redE, c, end color redE, right parenthesis, but writing that would make the expression more messy than it already is.
This might seem like a lot, but remember where the terms start color blueE, x, start superscript, times, end superscript, end color blueE, start color greenE, y, start superscript, times, end superscript, end color greenE and start color goldE, lambda, start superscript, times, end superscript, end color goldE each came from. Each partial derivative Lx\dfrac{\partial \mathcal{L}}{\blueE{\partial x}}, Ly\dfrac{\partial \mathcal{L}}{\greenE{\partial y}}, and Lλ\dfrac{\partial \mathcal{L}}{\goldE{\partial \lambda}} is zero when evaluated at left parenthesis, start color blueE, x, start superscript, times, end superscript, end color blueE, comma, start color greenE, y, start superscript, times, end superscript, end color greenE, comma, start color goldE, lambda, start superscript, times, end superscript, end color goldE, right parenthesis. That's how a solution left parenthesis, start color blueE, x, start superscript, times, end superscript, end color blueE, comma, start color greenE, y, start superscript, times, end superscript, end color greenE, comma, start color goldE, lambda, start superscript, times, end superscript, end color goldE, right parenthesis is defined! This means the first three terms go to zero.
Moreover, since start fraction, d, start color redE, c, end color redE, divided by, d, start color redE, c, end color redE, end fraction, equals, 1, the entire expression simplifies to
dMdc=Lc\begin{aligned} \quad \dfrac{dM^*}{d\redE{c}} = \dfrac{\partial \mathcal{L}}{\partial \redE{c}} \end{aligned}
It's important to notice that the reason for this simplification relies on the special properties of solution points left parenthesis, start color blueE, x, start superscript, times, end superscript, end color blueE, comma, start color greenE, y, start superscript, times, end superscript, end color greenE, comma, start color goldE, lambda, start superscript, times, end superscript, end color goldE, right parenthesis. Otherwise, working out the full derivative based on the multivariable chain rule could have been a nightmare!
For the sake of notational cleanliness, we left out the inputs to these derivatives, but let's write them in.
dMdc(c)=Lc(x(c),y(c),λ(c),c)\begin{aligned} \quad \dfrac{dM^*}{d\redE{c}}(\redE{c}) = \dfrac{\partial \mathcal{L}}{\partial \redE{c}}(\blueE{x^*}(\redE{c}), \greenE{y^*}(\redE{c}), \goldE{\lambda^*}(\redE{c}), \redE{c}) \end{aligned}
Since we saw in the reflection question above that Lc=λ\dfrac{\partial \mathcal{L}}{\partial \redE{c}} = \lambda, this means
Zrobione!
The anecdote about professor and student originally come from this Quora post.