Examples of the Lagrangian and Lagrange multiplier technique in action.

Lagrange multiplier technique, quick recap

Constrained optimization
Image credit: By Nexcis (Own work) [Public domain], via Wikimedia Commons
When you want to maximize (or minimize) a multivariable function subject to the constraint that another multivariable function equals a constant, , follow these steps:
  • Step 1: Introduce a new variable start color greenE, lambda, end color greenE, and define a new function L\mathcal{L} as follows:
    This function L\mathcal{L} is called the "Lagrangian", and the new variable start color greenE, lambda, end color greenE is referred to as a "Lagrange multiplier"
  • Step 2: Set the gradient of L\mathcal{L} equal to the zero vector.
    In other words, find the critical points of L\mathcal{L}.
  • Step 3: Consider each solution, which will look something like . Plug each one into f. Or rather, first remove the start color greenE, lambda, end color greenE, start subscript, 0, end subscript component, then plug it into f, since f does not have start color greenE, lambda, end color greenE as an input. Whichever one gives the greatest (or smallest) value is the maximum (or minimum) point your are seeking.

Example 1: Budgetary constraints


Suppose you are running a factory, producing some sort of widget that requires steel as a raw material. Your costs are predominantly human labor, which is dollar sign, 20 per hour for your workers, and the steel itself, which runs for dollar sign, 170 per ton. Suppose your revenue R is loosely modeled by the following equation:
R, left parenthesis, h, comma, s, right parenthesis, equals, 200, h, start superscript, 2, slash, 3, end superscript, s, start superscript, 1, slash, 3, end superscript
  • h represents hours of labor
  • s represents tons of steel
If your budget is dollar sign, 20, comma, 000, what is the maximum possible revenue?


The dollar sign, 20 per hour labor costs and dollar sign, 170 per ton steel costs tell us that the total cost of production, in terms of h and s, is
20h+170s\begin{aligned} \quad 20h + 170s \end{aligned}
Therefore the budget of dollar sign, 20, comma, 000 can be translated to the constraint
20h+170s=20,000\begin{aligned} \quad \redE{20h + 170s = 20{,}000} \end{aligned}
Before we dive into the computation, you can get a feel for this problem using the following interactive diagram. You can see which values of left parenthesis, h, comma, s, right parenthesis yield a given revenue (blue curve) and which values satisfy the constraint (red line).
Since we need to maximize a function start color blueE, R, left parenthesis, h, comma, s, right parenthesis, end color blueE, subject to a constraint, start color redE, 20, h, plus, 170, s, equals, 20, comma, 000, end color redE, we begin by writing the Lagrangian function for this setup:
L(h,s,λ)=200h2/3s1/3λ(20h+170s20,000) \mathcal{L}(h, s, \lambda) = \blueE{200h^{2/3}s^{1/3}}-\lambda(\redE{20h+170s-20{,}000})
Next, set the gradient L\nabla \mathcal{L} equal to the 0 vector. This is the same as setting each partial derivative equal to 0. First, we handle the partial derivative with respect to start color blueE, h, end color blueE.
0=Lh0=h(200h2/3s1/3λ(20h+170s20,000))0=20023h1/3s1/320λ\begin{aligned} \quad 0 &= \dfrac{\partial \mathcal{L}}{\partial \blueE{h}} \\ \\ 0 &= \dfrac{\partial}{\partial \blueE{h}}(200\blueE{h}^{2/3}s^{1/3}-\lambda(20\blueE{h}+170s-20{,}000)) \\ \\ 0 &= 200 \cdot \dfrac{2}{3}\blueE{h}^{-1/3}s^{1/3} - 20\lambda \\ \end{aligned}
Next, we handle the partial derivative with respect to start color greenE, s, end color greenE.
0=Ls0=s(200h2/3s1/3λ(20h+170s20,000))0=20013h2/3s2/3170λ\begin{aligned} \quad 0 &= \dfrac{\partial \mathcal{L}}{\partial \greenE{s}} \\ \\ 0 &= \dfrac{\partial}{\partial \greenE{s}}(200h^{2/3}\greenE{s}^{1/3}-\lambda(20h+170\greenE{s}-20{,}000)) \\ \\ 0 &= 200 \cdot \dfrac{1}{3}h^{2/3}\greenE{s}^{-2/3} - 170\lambda \end{aligned}
Finally we set the partial derivative with respect to start color goldE, lambda, end color goldE equal to 0, which as always is just the same thing as the constraint. In practice, you can of course just write the constraint itself, but I'll write out the partial derivative here just to make things clear.
0=Lλ0=λ(200h2/3s1/3λ(20h+170s20,000))0=20h170s+20,00020h+170s=20,000\begin{aligned} \quad 0 &= \dfrac{\partial \mathcal{L}}{\partial \goldE{\lambda}} \\ \\ 0 &= \dfrac{\partial}{\partial \goldE{\lambda}}(200h^{2/3}s^{1/3}-\goldE{\lambda}(20h+170s-20{,}000)) \\ \\ 0 &= -20h-170s+20{,}000 \\ \\ 20h &+ 170s = 20{,}000 \end{aligned}
Putting it together, the system of equations we need to solve is
0=20023h1/3s1/320λ0=20013h2/3s2/3170λ20h+170s=20,000\begin{aligned} \quad 0 &= 200 \cdot \dfrac{2}{3}{h}^{-1/3}s^{1/3} - 20\lambda \\ \\ 0 &= 200 \cdot \dfrac{1}{3}h^{2/3}{s}^{-2/3} - 170\lambda \\ \\ 20h &+ 170s = 20{,}000 \end{aligned}
In practice, you should almost always use a computer once you get to a system of equations like this. Especially because the equation will likely be more complicated than these in real applications. Once you do, you'll find that the answer is
h=2,0003666.667s=2,0005139.2157λ=8,00045932.593\begin{aligned} \quad h &= \dfrac{2{,}000}{3} \approx 666.667 \\ \\ s &= \dfrac{2{,}000}{51} \approx 39.2157 \\ \\ \lambda &= \sqrt[3]{\dfrac{8{,}000}{459}} \approx 2.593 \\ \\ \end{aligned}
This means you should employ about 667 hours of labor, and purchase 39 tons of steel, which will give a maximum revenue of
The interpretation of this constant lambda, equals, 2, point, 593 is left to the next article

Example 2: Maximizing dot product

Problem: Let the three-dimensional vector v, with, vector, on top be defined as follows.
v=[231]\begin{aligned} \quad \vec{\textbf{v}} = \left[ \begin{array}{c} 2 \\ 3 \\ 1 \end{array} \right] \end{aligned}
Consider every possible unit vector u^\hat{\textbf{u}} in three-dimensional space. For which one is the dot product u^v\hat{\textbf{u}} \cdot \vec{\textbf{v}} the greatest?
The diagram below is two-dimensional, but not much changes in the intuition as we move to three dimensions.
Unit vector dot product
Two-dimensional analogy to the three-dimensional problem we have. Which unit vector u^\hat{\textbf{u}} maximizes the dot product u^v\hat{\textbf{u}} \cdot \vec{\textbf{v}}?
If you are fluent with dot products, you may already know the answer. It's one of those mathematical facts worth remembering. If you don't know the answer, all the better! Because we will now find and prove the result using the Lagrange multiplier method.
First, we need to spell out how exactly this is a constrained optimization problem. Write the coordinates of our unit vectors as x, y and z:
u^=[xyz]\begin{aligned} \quad \hat{\textbf{u}} = \left[ \begin{array}{c} x \\ y \\ z \end{array} \right] \end{aligned}
The fact that u^\hat{\textbf{u}} is a unit vector means its magnitude is 1:
u^=x2+y2+z2=1x2+y2+z2=1\begin{aligned} \quad ||\hat{\textbf{u}}|| = \sqrt{x^2 + y^2 + z^2} &= 1 \\ &\Downarrow \\ \redE{x^2 + y^2 + z^2} &\redE{= 1} \end{aligned}
This is our constraint.
Maximizing u^v\hat{\textbf{u}} \cdot \vec{\textbf{v}} means maximizing the following quantity:
[xyz][231]=2x+3y+z \left[ \begin{array}{c} x \\ y \\ z \end{array} \right] \cdot \left[ \begin{array}{c} 2 \\ 3 \\ 1 \end{array} \right] = \blueE{2x + 3y + z}
The Lagrangian, with respect to this function and the constraint above, is
L(x,y,z,λ)=2x+3y+zλ(x2+y2+z21). \mathcal{L}(x, y, z, \lambda) = 2x + 3y + z - \lambda(x^2 + y^2 + z^2 - 1).
We now solve for L=0\nabla \mathcal{L} = \textbf{0} by setting each partial derivative of this expression equal to 0.
x(2x+3y+zλ(x2+y2+z21))=2λ2x=0y(2x+3y+zλ(x2+y2+z21))=3λ2y=0z(2x+3y+zλ(x2+y2+z21))=1λ2z=0\begin{aligned} \dfrac{\partial}{\partial \greenE{x}}(2\greenE{x} + 3y + z - \lambda(\greenE{x}^2 + y^2 + z^2 - 1)) &= 2 - \lambda 2\greenE{x} = 0 \\ \dfrac{\partial}{\partial \goldE{y}}(2x + 3\goldE{y} + z - \lambda(x^2 + \goldE{y}^2 + z^2 - 1)) &= 3 - \lambda 2\goldE{y} = 0 \\ \dfrac{\partial}{\partial \maroonE{z}}(2x + 3y + \maroonE{z} - \lambda(x^2 + y^2 + \maroonE{z}^2 - 1)) &= 1 - \lambda 2\maroonE{z} = 0 \\ \end{aligned}
Remember, setting the partial derivative with respect to lambda equal to 0 just restates the constraint.
λ(2x+3y+zλ(x2+y2+z21))=x2y2z2+1=0\begin{aligned} \dfrac{\partial}{\partial \redE{\lambda}}(2x + 3y + z - \redE{\lambda}(x^2 + y^2 + z^2 - 1)) &= -x^2-y^2-z^2 + 1 = 0 \\ \end{aligned}
Solving for start color greenE, x, end color greenE, start color goldE, y, end color goldE and start color maroonE, z, end color maroonE in the first three equations above, we get
x=212λy=312λz=112λ\begin{aligned} \quad \greenE{x} &= 2\cdot \dfrac{1}{2\lambda} \\ \goldE{y} &= 3\cdot \dfrac{1}{2\lambda} \\ \maroonE{z} &= 1\cdot \dfrac{1}{2\lambda} \end{aligned}
Ah, what beautiful symmetry. Each of these expressions has the same start fraction, 1, divided by, 2, lambda, end fraction factor, and the coefficients 2, 3 and 1 match up with the coordinates of v, with, vector, on top. Being good math students as we are, we won't let good symmetry go to waste. In this case, combining the three equations above into a single vector equation, we can relate u^\hat{\textbf{u}} and v, with, vector, on top as follows:
u^=[xyz]=12λ[231]=12λv\begin{aligned} \quad \hat{\textbf{u}} = \left[ \begin{array}{c} x \\ y \\ z \end{array} \right] = \dfrac{1}{2\lambda} \left[ \begin{array}{c} 2 \\ 3 \\ 1 \end{array} \right] = \dfrac{1}{2\lambda}\vec{\textbf{v}} \end{aligned}
Two-dimensional analogy showing the two unit vectors which maximize and minimize the quantity u^v\hat{\textbf{u}} \cdot \vec{\textbf{v}}.
Therefore u^\hat{\textbf{u}} is proportional to v, with, vector, on top! Geometrically, this means u^\hat{\textbf{u}} points in the same direction as v, with, vector, on top. There are two unit vectors proportional v, with, vector, on top,
  • One which points in the same direction, this is the vector that start color greenE, m, a, x, i, m, i, z, e, s, end color greenE u^v\hat{\textbf{u}} \cdot \vec{\textbf{v}}.
  • One which points in the opposite direction. This one start color redE, m, i, n, i, m, i, z, e, s, end color redE u^v\hat{\textbf{u}} \cdot \vec{\textbf{v}}.
We can write these two unit vectors by normalizing v, with, vector, on top, which just means dividing v, with, vector, on top by its magnitude:
u^max=vvu^min=vv\begin{aligned} \quad \greenE{\hat{\textbf{u}}_{\text{max}}} &= \dfrac{\vec{\textbf{v}}}{||\vec{\textbf{v}}||} \\ \\ \redE{\hat{\textbf{u}}_{\text{min}}} &= -\dfrac{\vec{\textbf{v}}}{||\vec{\textbf{v}}||} \end{aligned}
The magnitude vertical bar, vertical bar, v, with, vector, on top, vertical bar, vertical bar is square root of, 2, start superscript, 2, end superscript, plus, 3, start superscript, 2, end superscript, plus, 1, start superscript, 2, end superscript, end square root, equals, square root of, 14, end square root, so we can write the maximizing unit vector u^max\greenE{\hat{\textbf{u}}_{\text{max}}} explicitly as like this:
u^max=[2/143/141/14] \greenE{\hat{\textbf{u}}_{\text{max}}} = \left[ \begin{array}{c} 2 / \sqrt{14} \\ 3 / \sqrt{14} \\ 1 / \sqrt{14} \end{array} \right]

Just skip the Lagrangian

If you read the last article, you'll recall that the whole point of the Lagrangian L\mathcal{L} is that setting L=0\nabla \mathcal{L} = 0 encodes the two properties a constrained maximum must satisfy:
  • Gradient alignment between the target function and the constraint function,
    f(x,y)=λg(x,y)\begin{aligned} \quad \nabla \blueE{f(x, y)} &= \lambda \nabla \redE{g(x, y)} \\ \end{aligned}
  • The constraint itself,
    g(x,y)=c\begin{aligned} \quad \redE{g(x, y) = c} \end{aligned}
When working through examples, you might wonder why we bother writing out the Lagrangian at all. Wouldn't it be easier to just start with these two equations rather than re-establishing them from L=0\nabla \mathcal{L}=0 every time? The short answer is yes, it would be easier. If you find yourself solving a constrained optimization problem by hand, and you remember the idea of gradient alignment, feel free to go for it without worrying about the Lagrangian.
In practice, it's often a computer solving these problems, not a human. Given that there are many highly optimized programs for finding when the gradient of a given function is 0, it's both clean and useful to encapsulate our problem into the equation L=0\nabla \mathcal{L} = 0.
Furthermore, the Lagrangian itself, as well as several functions deriving from it, arise frequently in the theoretical study of optimization. In this light, reasoning about the single object L\mathcal{L} rather than multiple conditions makes it easier to see the connection between high-level​ ideas. Not to mention, it's quicker to write down on a blackboard.
In either case, whatever your future relationship with constrained optimization might be, it is good to be able to think about the Lagrangian itself and what it does. The examples above illustrate how it works, and hopefully help to drive home the point that L=0\nabla \mathcal{L} = 0 encapsulates both del, f, equals, lambda, del, g and g, left parenthesis, x, comma, y, right parenthesis, equals, c in a single equation.