We will illustrate the application of Lagrangian mechanics by analyzing the movement of a mathematical pendulum - a problem we have already discussed in the Physics 4 Teens - Mechanics - Pendulum, Spring - Pendulum using the Newtonian approach.
We recommend reviewing the lecture mentioned above and refreshing the Newtonian method of deriving the main equation of motion of the pendulum: α"(t) = −(g/l)·sin(α(t))
which was obtained from properly determining the force F that moves a pendulum as a vector sum of the gravity force directed vertically down and the tension of an unstretchable thread that keeps an object at the free end of a thread on a constant distance from the fixed end of a thread.
The two forces involved in formation of a resulting force F, gravity P=m·g and tension of a thread T, had to be combined using the rules for addition of vectors, which required some thinking.
Let's apply Lagrangian mechanics to this problem using the angle α between the thread and the vertical as the one and only parameter that determines the position of an object.
This way of identifying a position is more convenient than Cartesian coordinates with the origin at the fixed end of the thread, because both Cartesian coordinates can easily be derived from α: x = l·sin(α), y = −l·cos(α)
The Lagrangian is the difference between kinetic and potential energies. L(α(t),α'(t)) =
= Ekin(α'(t)) − Epot(α(t))
Kinetic energy depends on mass M and linear speed of an object along its circular trajectory v=l·α'(t) Ekin = ½M·v² =
= ½M·l²·[α'(t)]²
Potential energy depends on a mass of an object M, its height over the ground h(t) and an acceleration of free fall g. Epot(t) = M·g·h(t)
If the origin of our coordinates, the fixed end of a thread, is at height H over the ground, h = H − l·cos(α)
and, therefore, Epot(t) = M·g·[H−l·cos(α(t))]
Now we can construct the Euler-Lagrange equation (∂/∂α)L(α(t),α'(t)) =
= (d/dt)(∂/∂α')L(α(t),α'(t))
Calculate left and right sides separately. (∂/∂α)L(α(t),α'(t)) =
= (∂/∂α)[Ekin(α'(t)) − Epot(α(t))] =
= (∂/∂α)[−Epot(α(t))] =
= (∂/∂α)[−M·g·[H−l·cos(α(t))]] =
= −M·g·l·sin(α(t))
Calculate the right side as well: (d/dt)(∂/∂α')L(α(t),α'(t)) =
= (d/dt)[M·l²·α'(t)] = M·l²·α"(t)
The Euler-Lagrange equation is therefore −M·g·l·sin(α(t)) = M·l²·α"(t)
or α"(t) = −(g/l)·sin(α(t))
which is exactly the same equation as the one obtained using Newtonian mechanics.
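As a quick numeric sanity check of this equation (a sketch with assumed values g=9.81, l=1 and a standard RK4 integrator), a small-amplitude swing should have a period close to the classic 2π·√(l/g):

```python
import math

# Integrate the pendulum equation a''(t) = -(g/l)*sin(a(t)) with RK4 and
# compare the measured period for a small amplitude with 2*pi*sqrt(l/g).
g, l, dt = 9.81, 1.0, 0.0005

def deriv(a, w):
    return w, -(g / l) * math.sin(a)   # a' = w, w' = -(g/l)*sin(a)

def rk4(a, w):
    k1 = deriv(a, w)
    k2 = deriv(a + 0.5 * dt * k1[0], w + 0.5 * dt * k1[1])
    k3 = deriv(a + 0.5 * dt * k2[0], w + 0.5 * dt * k2[1])
    k4 = deriv(a + dt * k3[0], w + dt * k3[1])
    return (a + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            w + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

a, w = 0.05, 0.0          # small deflection, released from rest
t, downward = 0.0, []
while t < 5.0:
    prev = a
    a, w = rk4(a, w)
    t += dt
    if prev > 0.0 >= a:   # zero crossing going down: once per full period
        downward.append(t)
period = downward[1] - downward[0]
print(period, 2 * math.pi * math.sqrt(l / g))
```

For the amplitude 0.05 rad the measured period differs from 2π·√(l/g) by a small fraction of a percent, as expected from the small-angle approximation sin(α) ≈ α.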
If you are not yet convinced that Lagrangian mechanics is simpler than the Newtonian approach for those familiar with calculus of partial derivatives, consider the next example.
Spring Pendulum
A weightless spring replaces an unstretchable thread of the previous problem.
The spring and an object on its end are inside a weightless frictionless tube that maintains a straight form, so the object has two degrees of freedom: radial, as it moves inside the tube stretching and squeezing the spring, and pseudo-circular, as it moves together with the tube in a pendulum-like motion.
The problem of specifying the motion of an object is much more complex here because the spring tension is changing not only with an angle α(t) but also because of the movement of an object within a tube.
However, using Lagrangian mechanics, this problem can be analyzed with much less effort, and the corresponding differential equations can be constructed relatively easily.
As before, let's calculate the kinetic and potential energies of an object.
The object's kinetic energy can be calculated as a sum of its radial movement's kinetic energy inside the tube and kinetic energy of its pseudo-circular movement perpendicularly to the tube.
The reason is simple. The object's linear velocity vector can be represented as a sum of two mutually perpendicular vectors: one directed along the tube and another perpendicular to it. v = v|| + v⊥
Since kinetic energy depends on a square of the linear speed, according to Pythagorean Theorem v² = v||² + v⊥²
from which follows that Ekin = ½M·v² =
= ½M·v||² + ½M·v⊥²
Distance of an object from the fixed point of oscillation l is variable and depends on time: l=l(t).
Therefore, v||(t)=l'(t).
Perpendicular to l component of an object speed is v⊥(t)=l(t)·α'(t).
Therefore, kinetic energy of an object is Ekin = ½M·[l'(t)²+l(t)²·α'(t)²]
The object's potential energy can be calculated as a sum of its potential energy due to gravity and potential energy of a stretched or a squeezed spring.
If the fixed point of oscillation is at the height H above the ground, an object is at height h(t)=H−l(t)·cos(α(t)) above the ground, and potential energy of an object related to its position in the gravitational field is Egrav = M·g·[H−l(t)·cos(α(t))]
Potential energy of a spring depends on the degree of its stretching or squeezing.
Assume, the length of a spring in a neutral state is l0. Then the length of stretching or squeezing at time t is l(t)−l0.
Therefore, potential energy of an object related to a spring is Espr = ½k·[l(t)−l0]²
where k is a coefficient of elasticity of a spring.
Total potential energy of an object is Epot = Egrav + Espr =
= M·g·[H−l(t)·cos(α(t))] +
+ ½k·[l(t)−l0]²
All that remains is to write the Euler-Lagrange equation for this Lagrangian.
The problem is, we are familiar only with the Euler-Lagrange equation for a system with one degree of freedom, like x(t). Now we have two degrees of freedom - l(t) and α(t).
Fortunately, the Euler-Lagrange equation can be specified for each degree of freedom independently, which will be proven in the next lecture.
Therefore, we can write two independent Euler-Lagrange equations (skipping (t) for brevity): (∂/∂l)L(l,l',α,α') =
= (d/dt)(∂/∂l')L(l,l',α,α')
and (∂/∂α)L(l,l',α,α') =
= (d/dt)(∂/∂α')L(l,l',α,α')
The first equation is M·l·α'²+M·g·cos(α)−k·(l−l0) =
= M·l"
The second equation is −M·g·l·sin(α) = (d/dt)[M·l²·α'] =
= M·l²·α" + 2·M·l·l'·α'
or, dividing both sides by M·l, −g·sin(α) = l·α" + 2·l'·α'
which reduces to the equation −(g/l)·sin(α) = α" of the case above with an unstretchable thread when the length l is constant (l'=0); for a spring the extra term 2·l'·α' couples the angular motion with the radial one.
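A numeric sketch of this system (assumed values M, k, l0 below). Note that differentiating the term M·l²·α' by time produces both M·l²·α" and an extra 2·M·l·l'·α', so the angular equation carries a coupling term 2·l'·α'; conservation of the total energy serves as a check of the integration:

```python
import math

# Spring pendulum with generalized coordinates l (length) and a (angle).
# Euler-Lagrange equations, resolved for the second derivatives:
#   l'' = l*a'^2 + g*cos(a) - (k/M)*(l - l0)
#   a'' = -(g*sin(a) + 2*l'*a') / l
M, g, k, l0 = 1.0, 9.81, 50.0, 1.0
dt = 0.0005

def deriv(s):
    l, ld, a, ad = s
    ldd = l * ad ** 2 + g * math.cos(a) - (k / M) * (l - l0)
    add = -(g * math.sin(a) + 2.0 * ld * ad) / l
    return (ld, ldd, ad, add)

def rk4(s):
    k1 = deriv(s)
    k2 = deriv(tuple(x + 0.5 * dt * d for x, d in zip(s, k1)))
    k3 = deriv(tuple(x + 0.5 * dt * d for x, d in zip(s, k2)))
    k4 = deriv(tuple(x + dt * d for x, d in zip(s, k3)))
    return tuple(x + dt * (p + 2 * q + 2 * r + u) / 6
                 for x, p, q, r, u in zip(s, k1, k2, k3, k4))

def energy(s):
    l, ld, a, ad = s
    kin = 0.5 * M * (ld ** 2 + (l * ad) ** 2)
    pot = -M * g * l * math.cos(a) + 0.5 * k * (l - l0) ** 2  # constant H dropped
    return kin + pot

state = (1.2, 0.0, 0.5, 0.0)      # stretched spring, deflected, released at rest
E0 = energy(state)
for _ in range(20000):            # integrate for 10 seconds
    state = rk4(state)
print(abs(energy(state) - E0))    # total energy drift stays tiny
```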
First of all, let's stipulate that the Laws of Newton are based on experiment; they are not derived from some more fundamental theories.
Lagrangian mechanics presents a different approach to analyze the motion than Newtonian mechanics.
In many cases it presents a simpler, more universal way to describe the motion of a mechanical system than Newtonian one.
Let's start with an example where both methodologies lead to the same result.
Spring Oscillation
Consider an ideal spring with one end fixed and a point-mass attached to another end.
The oscillations will occur along the length of a spring that coincides with X-axis.
Position of a point-mass on the spring's end will be described by its X-coordinate x(t) as a function of time t, with the initial position at time t=0 being the origin of the X-coordinate, that is x(0)=0.
According to Hooke's Law, the force F of a spring applied to an object attached to its end is proportional to the length x by which this spring is stretched or squeezed from its neutral position, and is always directed towards the neutral point x=0 of no stretching or squeezing. F = −k·x
where k is a coefficient of elasticity that characterizes physical properties of a spring.
According to Newton's Second Law, the acceleration a of an object is proportional to the force F applied to it F = m·a
where m is the object's mass being a coefficient of proportionality.
A linear acceleration a(t), as a function of time, is a derivative of a linear speed v(t) by time t a(t) = dv(t)/dt = v'(t)
A linear speed v(t) is, in turn, a derivative of a position of an object x(t) by time v(t) = dx(t)/dt = x'(t)
Therefore, an acceleration is a second derivative of a position by time a(t) = d²x(t)/dt² = x"(t)
Equating the value of force by Hooke's Law to that of Newton's Second Law, we get a differential equation that defines a motion of the object −k·x(t) = m·a(t)
or, equivalently, −k·x(t) = m·x"(t)
The solution to this second order differential equation is the trajectory of our object.
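The solution is the familiar harmonic oscillation; for an object released from rest at x(0)=x0 (an assumed initial condition chosen for this illustration) it is x(t) = x0·cos(√(k/m)·t). A minimal numeric sketch with assumed m, k, x0 confirms it:

```python
import math

# Integrate -k*x = m*x'' with a semi-implicit Euler step and compare the
# result with the exact solution x(t) = x0*cos(omega*t), omega = sqrt(k/m),
# for an object released from rest at x0.
m, k, x0, dt = 1.0, 4.0, 0.1, 1e-4
omega = math.sqrt(k / m)

x, v, t = x0, 0.0, 0.0
while t < 1.0:
    v += dt * (-k * x / m)   # acceleration a = F/m = -k*x/m
    x += dt * v              # semi-implicit Euler step
    t += dt
print(x, x0 * math.cos(omega * t))   # numeric vs exact solution
```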
Let's approach the same problem from another side.
An object attached to a spring's end that has mass m and linear speed v has kinetic energy that is equal to Ekin = ½m·v²
Since speed v(t), as a function of time t, is just a derivative of position x(t) by time, we can express kinetic energy in terms of this derivative: Ekin = ½m·[x'(t)]²
NOTICE: Ekin depends explicitly only on speed x'(t) and d/dt[∂(Ekin)/∂x'] =
= d/dt[m·x'(t)] = m·x"(t) = F
A stretched or a squeezed spring has potential energy equal to the amount of work needed to stretch or squeeze it against the force of its elasticity (you can refer to a lecture Physics 4 Teens - Energy - Potential Energy - Spring on UNIZOR.COM).
Thus, a potential energy of a spring squeezed or stretched by the length x(t), as a function of time t, equals to Epot = ½k·[x(t)]²
where k is the same coefficient of elasticity as above that characterizes the physical properties of a spring.
NOTICE: Epot depends explicitly only on position x(t) and ∂(−Epot)/∂x = −k·x(t) = F
Based on two NOTICEs above, it is IMPORTANT to see that ∂(−Epot)/∂x =
= d/dt[∂(Ekin)/∂x'] = F
Since Epot depends explicitly only on position x(t), not on speed x'(t), and Ekin depends explicitly only on speed x'(t), not on position x(t), ∂(Ekin−Epot)/∂x =
= ∂(−Epot)/∂x =
= d/dt[∂(Ekin)/∂x'] =
= d/dt[∂(Ekin−Epot)/∂x']
At this point it's essential to recall the Euler-Lagrange equation (you can refer to a lecture Physics+ 4 All - Variations - Euler-Lagrange on UNIZOR.COM) - a differential equation of the second order that defines a function f0(x) that minimizes or maximizes a functional Φ[f(x)] = ∫[a,b]F[x,f(x),f '(x)]dx
where F[...] is some known smooth real function of three arguments - real variable x, real value of function f(x) and real value of derivative f '(x).
This Euler-Lagrange differential equation looks like this: (∂/∂f)F [x,f0(x),f '0(x)] =
= (d/dx)(∂/∂f ')F [x,f0(x),f '0(x)]
Let's change more abstract symbols x and f(x) to those applicable to our task.
The argument will be time t instead of abstract x. The function will be a position x(t) instead of abstract f0(x).
Now the Euler-Lagrange equation looks like (∂/∂x)F [t,x(t),x'(t)] =
= (d/dt)(∂/∂x')F [t,x(t),x'(t)]
Compare this to the equation above that equates the partial derivative of Ekin−Epot by x with the time derivative of its partial derivative by x'.
Obviously, L=Ekin−Epot satisfies the Euler-Lagrange equation (∂/∂x)L[t,x(t),x'(t)] =
= (d/dt)(∂/∂x')L[t,x(t),x'(t)]
which in this case is exactly the same as the equation obtained from the Newton's Second Law −k·x(t) = m·x"(t)
The expression L[x(t),x'(t)] = Ekin(x') − Epot(x)
is called the Lagrangian.
Consider an object moving along some trajectory x(t) from the moment of time t1 to the moment of time t2.
At any moment it has certain kinetic and potential energy, so we can construct a Lagrangian L(t) = Ekin(x'(t)) − Epot(x(t))
Consider an integral of this Lagrangian by time S = ∫[t1,t2]L(t)·dt
This integral is called the action.
The trajectory that minimizes or maximizes this action is a solution to the Euler-Lagrange equation (∂/∂x)L[x(t),x'(t)] =
= (d/dt)(∂/∂x')L[x(t),x'(t)]
which has the same solution as Newtonian F=m·a.
Therefore, the trajectory obtained using the Lagrangian approach is the same as the one from Newtonian mechanics. BUT IN SOME CASES IT MIGHT BE MUCH MORE CONVENIENT.
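The stationarity of the action can be illustrated numerically. Assume m=k=1, so the true trajectory of the spring released from x(0)=1 is x(t)=cos(t); perturb it on [0, π/2] by ε·sin(2t), which vanishes at both endpoints, and check that dS/dε ≈ 0 at ε=0 (a sketch with an assumed perturbation shape):

```python
import math

# Discretize the action S = integral of (Ekin - Epot) over [0, pi/2] for the
# perturbed trajectory x(t) = cos(t) + eps*sin(2t) and verify that the
# derivative of S by eps vanishes at eps = 0 (stationary action).
N = 20000
t2 = math.pi / 2

def action(eps):
    dt = t2 / N
    S = 0.0
    for i in range(N):
        t = (i + 0.5) * dt                       # midpoint rule
        x = math.cos(t) + eps * math.sin(2 * t)
        v = -math.sin(t) + 2 * eps * math.cos(2 * t)
        S += (0.5 * v * v - 0.5 * x * x) * dt    # Lagrangian L = Ekin - Epot
    return S

h = 1e-4
dS = (action(h) - action(-h)) / (2 * h)          # derivative by eps at eps = 0
print(dS)                                        # close to zero
```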
The equivalence of a differential equation obtained from the Newton's Second Law and the Euler-Lagrange equation is not just a coincidence peculiar for springs.
In general, kinetic energy always depends on mass and speed Ekin = ½m·v²
In general, derivative of Ekin by speed v is a momentum of motion p p = m·v = ∂/∂v(Ekin) =
= ∂/∂v(½mv²)
In general, derivative of momentum p by time t is the force F dp/dt = d(m·v)/dt = m·a = F
In general, potential energy is, actually, an amount of work W.
Since dW=F·dx, the derivative of work by x is the force; correspondingly, the derivative of potential energy by the coordinate, taken with a minus sign, gives the force as well: F = −∂Epot/∂x.
So, the Newton's Second Law and Euler-Lagrange equation are equivalent.
Why do we need both?
Practical mechanical problems are rarely as simple as we are taught at high school.
It appears that the more complicated problems with more than one object involved are easier to solve using Lagrangian L=Ekin−Epot than to deal with complicated forces and their interaction constructing the equations of F=m·a type.
The next lectures will be dedicated to a few important physical problems and their solutions using Euler-Lagrange equation.
The approach to choose a path along which any system progresses (light propagates, planet moves around the Sun etc.) based on minimizing some numeric function defined for each path appears to be very valuable in Physics, and it helps to solve certain tasks faster and more efficiently than using only the classic Newton's Laws.
Before generalizing this idea, let's consider a specific problem suggested by Johann Bernoulli in 1696.
It's called the Brachistochrone problem (from Greek 'brachistos' + 'chronus' = 'short' + 'time') and is formulated as follows.
Consider two points A and B in the uniform gravitational field (like near the surface of the Earth) with force of gravity directed vertically down. These points are positioned on different heights and not on the same vertical.
A small object should slide from the top point A(a,A) to the lower point B(b,B) along some frictionless supporting track.
We use a standard Cartesian reference frame with Y-coordinates increasing upwards, and X-coordinates increasing from left to right on a picture above.
The vector of gravity force is directed down along Y-axis.
Therefore, the Y-coordinate A is greater than B, and the X-coordinate a is less than b.
The supporting track can go straight from A to B or take some curved form.
The straight brown line of descent in the picture above is shorter, but the curved blue or purple lines, while longer, allow an object to gain speed faster, and the resulting time of descent might still be shorter than for a straight line.
The problem is to determine the shape of a supporting track to minimize the time of sliding.
Mathematically speaking, we have to consider all smooth functions f(x) on a segment [a,b] that satisfy the conditions: f(a) = A and f(b) = B
Then, out of all these functions, we have to find the one that represents the curve of fastest descent from A(a,A) to B(b,B).
This simply formulated problem is far from having a simple solution.
The best mathematicians of the 17th century worked on this problem and solved it using different methodologies.
Let's solve it using the apparatus developed for finding a minimum of a functional - the Euler-Lagrange equation. This methodology was discussed in the previous lectures of this course.
We have to express the time T of moving from point A to point B as a functional of a trajectory represented by function f(x): T = Φ[f(x)]
and find a function y=f0(x) that minimizes this functional.
Hopefully, our functional will look like Φ[f(x)] = ∫[a,b]F[x,f(x),f '(x)]dx
where F[...] is some known smooth real function of three arguments - real variable x, real value of function f(x) and real value of derivative f '(x)
and we will be able to apply Euler-Lagrange equation to find y=f0(x) as its solution.
The picture above illustrates a trajectory of a movement of an object in a uniform gravitational field along a supporting curved track described by a function y=f(x).
The object's weight (the force of gravity vector) is P=m·g, where m is its mass and g is an acceleration of a free falling in the gravitational field.
Besides the gravitational force, a reaction of a supporting curved track vector of force R acts on this object - the force always directed perpendicularly to a tangential line to a curve.
Both the force of gravity P and the reaction force of a curved track R result in the force vector F moving an object along a trajectory and directed along a tangential line to a curved track.
Consider a segment of a trajectory from x to x+dx, where dx is an infinitesimal increment of argument x.
This segment has a length ds and its value satisfies the Pythagorean Theorem (ds)² = (dx)² + (dy)²
where dy=d(f(x))=f '(x)·dx
so (ds)²=[1+(f '(x))²]·(dx)²
and ds = √(1+(f '(x))²)·dx
Assume, at point x the linear sliding speed of an object along its trajectory is v(x).
Then the time an object spends passing a segment ds equals to ds/v(x) = √(1+(f '(x))²)·dx / v(x)
To find speed of an object v(x), recall the Conservation of Energy Law.
Potential energy of an object depends on its mass and the height over some zero level.
Assume, the zero level of potential energy is at y=0.
Then the initial potential energy Ua of our object at the beginning of its motion is Ua = m·g·A
where m is an object's mass, g is an acceleration of free falling and A is its initial Y-coordinate.
Its kinetic energy Ka at the beginning is zero because its speed along a trajectory is zero at that point.
Then the total initial mechanical energy of an object (potential + kinetic) is Ea = m·g·A
When our object moved along a curve from its initial position at (a,A) to position (x,f(x)), its potential energy diminished and kinetic energy grew by the same amount because of the Energy Conservation Law.
New potential energy equals to Ux = m·g·f(x)
New kinetic energy equals to Kx =m·v²(x)/2.
The decrease in potential energy Ua−Ux should be compensated by an increase in kinetic energy Kx.
From the Energy Conservation Law the total energy should remain the same
Ea = Ex
which leads us to an equation m·g·[A−f(x)] = m·v²(x)/2
Therefore, v(x) = √(2g·[A−f(x)])
Now the time an object spends passing a segment ds equals to dT(x) = √(1+(f '(x))²) / √(2g·[A−f(x)]) · dx
Integrating this by x from a to b gives a total time of moving along a trajectory - the functional we need to minimize Φ[f(x)] = ∫[a,b]dT(x)
or Φ[f(x)] = ∫[a,b] √(1+(f '(x))²) / √(2g·[A−f(x)]) dx
At this point we can drop 2g from the denominator, as this does not change the function-argument f0(x) that minimizes functional Φ[f(x)].
So, our task is to minimize a functional Φ[f(x)] = ∫[a,b] √(1+(f '(x))²) / √(A−f(x)) dx
Recall from the previous lecture that for a given functional Φ[f(x)] = ∫[a,b]F[x,f(x),f '(x)]dx
the function f0(x) that minimizes or maximizes it should satisfy the Euler-Lagrange differential equation (∂/∂f)F [x,f0(x),f '0(x)] −
− (d/dx)(∂/∂f ')F [x,f0(x),f '0(x)] = 0
Let's construct this equation for our case.
To shorten formulas, let's temporarily use h(x) instead of f '(x) and
omit (x) from both f(x) and h(x).
Using this substitution, our functional looks like Φ[f(x)] = ∫[a,b] √(1+h²) / √(A−f) dx
From this follows that the expression under the integral is F[x,f,h] = √(1+h²) / √(A−f)
In terms of these functions, the Euler-Lagrange equation looks like (∂/∂f)F [x,f,h] =
= (d/dx)(∂/∂h)F [x,f,h]
Right side of the equation is (d/dx)(∂/∂h)F [x,f,h] =
= (d/dx)(∂/∂h)[√(1+h²) / √(A−f)] =
= (d/dx)[h / (√(1+h²)·√(A−f))] =
= h' / (√(1+h²)·√(A−f)) −
− h·2h·h' / (2·√((1+h²)³)·√(A−f)) +
+ h·f ' / (2·√(1+h²)·√((A−f)³))
Note that we've agreed to substitute h(x) for f '(x) in the numerator of the last term.
Equating left and right sides of the Euler-Lagrange equation, multiplying both sides by 2·√((1+h²)³)·√((A−f)³) and opening the parentheses leads to a simpler equation 1 + h² − 2h'·(A−f) = 0
Returning back to original symbols, replace f with f(x) and h with f '(x) getting a second order differential equation for function y=f(x) 1+[f '(x)]²−2f "(x)·[A−f(x)] = 0
or, equivalently, 1+y'²−2y"·(A−y) = 0
The solution y=f0(x) to this second order differential equation is the function that minimizes the functional Φ[f(x)].
A not so obvious transformation can reduce this second order differential equation to the first order one.
Integration is easy if a subject of an integration is a derivative of some function
∫ s'(x)·dx = s(x) + C
where C is some constant.
The left side of the equation above is not a derivative of any function, but let's see what happens if we multiply it by −y'. −y'·[1+y'²−2y"·(A−y)] =
= −y'·(1+y'²)+2y'·y"·(A−y)
Notice that we can substitute −y' with (A−y)' and 2y'·y" with (1+y'²)' to get (A−y)'·(1+y'²) +
+ (A−y)·(1+y'²)'
which is a derivative of a product of (A−y) by (1+y'²).
Therefore, if y=f0(x) is a solution to Euler-Lagrange equation 1+y'²−2y"·(A−y) = 0,
it's also a solution to the equation −y'·[1+y'²−2y"·(A−y)] =
[(A−y)·(1+y'²)]' = 0
or,
{[A−f0(x)]·[1+f0'(x)²]}' = 0
Integrating this, we get the first order differential equation
[A−f0(x)]·[1+f0'(x)²] = C
or, in a shorter notation, (A−y)·(1+y'²) = C
where C is some constant.
We can solve this differential equation for function y=f0(x) as follows.
Let's resolve this differential equation for y':
dy/dx = ±√[(C−A+y) / (A−y)]
From physical considerations, our function must decrease as x increases. Therefore, its derivative must be negative, and we will choose the minus sign in all transformations below.
This differential equation can be transformed to integrate separately by x and y:
−√[(A−y) / (C−A+y)]·dy = dx
We can integrate now both sides separately.
Without getting into the details of integration, we can just write the result of the integration by y of the left side.
−∫√[(A−y) / (C−A+y)]·dy =
= −C·arctan√[(C−A+y) / (A−y)] −
− √((A−y)·(C−A+y)) + D
where C and D are some constants.
Integration of dx produces x+E, where E is yet another constant, but we can combine it into constant D that appears in integration by dy.
Therefore, our function y=f0(x) that minimizes the object's time of sliding down can be expressed as
x = −C·arctan√[(C−A+y) / (A−y)] −
− √((A−y)·(C−A+y)) + D
where C and D are some constants.
It is not really expressed as y being a function of x, but rather the other way around; still, that should not stop us from exploring this curve.
As we see, the function depends on two constants C and D. At the same time we have two initial conditions: f0(a) = A and f0(b) = B
Substituting x=a, y=A into the equation above will produce one equation for C and D.
Substituting x=b, y=B into this equation will produce another equation for C and D.
These two equations determine the values of C and D needed to specify the full solution to a problem.
As an example, we used points A(1,5) and B(4,1) and calculated approximate values C=5.8 and D=10.1.
Here is the graph that, in particular, crosses points A(1,5) and B(4,1).
At the end of this discussion about brachistochrone it's appropriate to mention that the curve we have found as a solution to the Euler-Lagrange equation is a cycloid - a trajectory of a point on a circle that is rolling along a straight line.
We have not discussed this because our main purpose was just to show how to use the Euler-Lagrange equation to find an extremum of a functional.
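Both results can be verified numerically (a sketch assuming a cycloid generated by a circle of radius r rolling below the starting level y=A): the cycloid satisfies the first-order equation (A−y)·(1+y'²) = C with C = 2r, and descent along it is faster than along the straight chord between the same points:

```python
import math

# Cycloid through the starting point (a, A), generated by a rolling circle
# of radius r: x(s) = a + r*(s - sin s), y(s) = A - r*(1 - cos s).
A, a, r = 5.0, 1.0, 1.0
C = 2 * r

def point(s):
    return a + r * (s - math.sin(s)), A - r * (1 - math.cos(s))

# 1) Check (A - y)*(1 + y'^2) == C at a sample parameter value.
s = 1.3
x, y = point(s)
yprime = (-r * math.sin(s)) / (r * (1 - math.cos(s)))   # dy/dx = (dy/ds)/(dx/ds)
print((A - y) * (1 + yprime ** 2), C)                   # the two numbers agree

# 2) Compare descent times T = sum of ds/v with v = sqrt(2*g*(A - y)).
g, N = 9.81, 20000

def descent_time(path, s_end):
    T, (px, py) = 0.0, path(0.0)
    for i in range(1, N + 1):
        qx, qy = path(s_end * i / N)
        seg = math.hypot(qx - px, qy - py)
        ymid = 0.5 * (py + qy)
        v = math.sqrt(max(2 * g * (A - ymid), 1e-12))
        T += seg / v
        px, py = qx, qy
    return T

s_end = math.pi                      # cycloid from the cusp to its lowest point
bx, by = point(s_end)                # endpoint B reached by this cycloid
chord = lambda u: (a + u * (bx - a), A + u * (by - A))
print(descent_time(point, s_end), descent_time(chord, 1.0))
```

The cycloid time comes out close to π·√(r/g), visibly smaller than the time along the straight chord.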
Based on theoretical knowledge of functionals, their extremums and calculus of variations (method of finding these extremums using directional derivatives), let's see what is the result of application of this method in many cases occurring in Physics.
The spectrum of functions, that these functionals are defined on, in many cases is reduced to smooth (sufficiently differentiable) real functions defined on some segments [a,b] with fixed values at the ends of this segment: f(a)=A; f(b)=B.
For example, to find the best (in some sense) trajectory, we have to know the starting and ending points of this trajectory and search for the best one among functions with fixed values at the beginning (start) point and at the ending (finish) point of movement.
Many problems in Physics are related to finding extremums of specific type of functionals of the above mentioned functions: Φ[f(x)] = ∫[a,b]F[x,f(x),f '(x)]dx
where F[...] is some known smooth real function of three arguments - real variable x, real value of function f(x) and real value of derivative f '(x).
This function F[...] is derived from known laws of Physics.
The function f(x), the argument to a functional Φ[f(x)], is the function from a class of smooth functions defined on some segment with fixed values at the ends described above.
Our task is to find function f0(x) where functional Φ[f(x)] has an extremum.
The plan is:
1. Assuming f0(x) is a point where Φ[f(x)] reaches its extremum, increment this function by some Δ(x) to f1(x)=f0(x)+Δ(x). Keep in mind that f1(x) should belong to the same class as f0(x): it should be smooth, defined on the same segment [a,b] and have the same values A and B at the ends of this segment. This means that Δ(x) should be smooth, defined on the same [a,b] and equal to zero at the ends of this segment.
2. Consider all functions of type g(x,t)=f0(x)+t·Δ(x).
This set of functions is parameterized by parameter t and g(x,0)=f0(x).
This set of functions can be considered as filling some neighborhood of function f0(x) on a "line" from f0(x) in a direction defined by increment Δ(x).
3. Since f0(x) is a point where functional Φ[f(x)] has an extremum, changing the value of t closer and closer to zero would result in moving Φ[g(x,t)] closer and closer to Φ[f0(x)], which is an extremum.
Therefore, a derivative of Φ[g(x,t)] by t at t=0 (that is, when g(x,t)=g(x,0)=f0(x)) should be equal to zero.
That gives an equation where f0(x) and Δ(x) participate.
4. Since increment Δ(x) can be chosen relatively freely (as long as it's a smooth function with values of zero on both ends of [a,b]), the equation obtained at step 3 above should be true for any Δ(x), which gives additional condition that might lead to identification of f0(x).
Let's follow the plan step by step.
1. f1(x)=f0(x)+Δ(x)
Δ(a)=Δ(b)=0
2. g(x,t)=f0(x)+t·Δ(x) Φ[g(x,t)] =
= ∫[a,b]F[x,g(x,t),g'(x,t)]dx
(apostrophe is a derivative by variable x)
where g(x,t)=f0(x)+t·Δ(x).
Now the functional Φ[g(x,t)] can be considered as a function of one real argument t that characterizes how close function g(x,t) is to an assumed point of extremum f0(x).
3. The derivative of functional Φ[g(x,t)] considered as a function of one real argument t by parameter t must be equal to zero at t=0 and g(x,0)=f0(x): (d/dt)Φ[g(x,t)]|t=0 = 0
For many practical problems in Physics functional Φ[g(x,t)] is an integral by x presented in the beginning, while differentiation above is by t.
These two variables (x and t) and corresponding operations (integration by x and differentiation by t) are independent. If, instead of integration by x we had a sum by some index i, we would not hesitate to replace a derivative of a sum with a sum of derivatives.
Integration is just a sum of infinitesimal parts and the same rule of interchanging the order of operations can be applied.
Therefore, (d/dt)Φ[g(x,t)] =
= (d/dt)∫[a,b]F[x,g(x,t),g'(x,t)]dx =
= ∫[a,b](d/dt)F[x,g(x,t),g'(x,t)]dx
Since argument of differentiation t is contained inside a function F[...] of multiple arguments, the derivative by t should be taken using partial derivative by each argument (∂F/∂x, ∂F/∂g, ∂F/∂g') multiplied by an inner derivative of this argument by t (correspondingly, dx/dt, dg/dt, dg'/dt):
(d/dt)F[x,g(x,t),g'(x,t)] =
= (∂/∂x)F[x,g(x,t),g'(x,t)]·(dx/dt) +
+ (∂/∂g)F[x,g(x,t),g'(x,t)]·(dg/dt) +
+ (∂/∂g')F[x,g(x,t),g'(x,t)]·(dg'/dt)
Since the first argument of function F[...] is just x and does not depend on t, its derivative by t is zero: dx/dt = 0
Since g(x,t)=f0(x)+t·Δ(x), its derivative by t is dg(x,t)/dt = Δ(x)
Since g'(x,t)=f '0(x)+t·Δ'(x), dg'(x,t)/dt = Δ'(x)
Now the derivative by t looks like (d/dt)F[x,g(x,t),g'(x,t)] =
= (∂/∂g)F[x,g(x,t),g'(x,t)]·Δ(x) +
+ (∂/∂g')F[x,g(x,t),g'(x,t)]·Δ'(x)
Therefore, the expression for the derivative of our functional Φ[g(x,t)] by parameter t is: (d/dt)Φ[g(x,t)] =
= ∫[a,b](∂/∂g)F[x,g(x,t),g'(x,t)]·Δ(x)dx +
+ ∫[a,b](∂/∂g')F[x,g(x,t),g'(x,t)]·Δ'(x)dx
As mentioned above, the condition (d/dt)Φ[g(x,t)]|t=0 = 0
is necessary for f0(x)=g(x,0) to be a function-argument where our functional reaches its extremum.
Substituting t=0 and changing g(x,0) to f0(x), we have an equation 0 =
= ∫[a,b](∂/∂f)F[x,f0(x),f '0(x)]·Δ(x)dx +
+ ∫[a,b](∂/∂f ')F[x,f0(x),f '0(x)]·Δ'(x)dx
To shorten the formulas, let's replace (∂/∂f)F[x,f0(x),f '0(x)]
with F∂f [x,f0(x),f '0(x)]
and (∂/∂f ')F[x,f0(x),f '0(x)]
with F∂f ' [x,f0(x),f '0(x)]
With this substitution the equation above looks like 0 = (d/dt)Φ[f0(x)] =
=∫[a,b] F∂f [x,f0(x),f '0(x)]·Δ(x)dx+
+∫[a,b] F∂f ' [x,f0(x),f '0(x)]·Δ'(x)dx
If only the first integral in the above expression participated in the equation, the requirement that this integral be equal to zero regardless of Δ(x) would force the smooth function under the integral to be zero everywhere on [a,b] (this is the fundamental lemma of the calculus of variations).
Existence of the second integral complicates the picture, but we can change the second integral to contain only Δ(x) instead of Δ'(x) by using the formula of integrating by parts for two functions u(x) and v(x):
∫[a,b]u·dv = u·v|[a,b] − ∫[a,b]v·du
Let's apply this formula for the second integral in the equation above, substituting u(x) = F∂f '[x,f0(x),f '0(x)] v(x) = Δ(x)
Then du(x) = u'(x)·dx =
= (d/dx)F∂f '[x,f0(x),f '0(x)]·dx dv(x) = v'(x)·dx = Δ'(x)·dx
Using these substitutions we transform the second integral in the above equation as follows
∫[a,b] F∂f ' [x,f0(x),f '0(x)]·Δ'(x)dx =
= ∫[a,b] F∂f ' [x,f0(x),f '0(x)]·dΔ(x) =
= F∂f ' [x,f0(x),f '0(x)]·Δ(x)|[a,b] −
− ∫[a,b] Δ(x)·dF∂f ' [x,f0(x),f '0(x)]
The first component of the above expression equals to zero: F∂f ' [x,f0(x),f '0(x)]·Δ(x)|[a,b] = 0
because Δ(a)=Δ(b)=0
So, the final expression for the variation of our functional contains two integrals, both with Δ(x) as a factor: 0 = (d/dt)Φ[f0(x)] =
= ∫[a,b] F∂f [x,f0(x),f '0(x)]·Δ(x)dx −
− ∫[a,b] Δ(x)·dF∂f ' [x,f0(x),f '0(x)] =
= ∫[a,b] h(x)·Δ(x)·dx
where h(x) = F∂f [x,f0(x),f '0(x)] −
− (d/dx)F∂f ' [x,f0(x),f '0(x)]
Since the last integral must be equal to zero for any Δ(x), function h(x) must be equal to zero for any x∈[a,b].
Therefore, the necessary condition for function-argument f0(x) to be a point where functional Φ[f(x)] reaches its extremum is that f0(x) is a solution to a differential equation h(x)=0
or, in terms of original functional Φ[f(x)], F∂f [x,f0(x),f '0(x)] −
− (d/dx)F∂f ' [x,f0(x),f '0(x)] = 0
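A numeric check of this necessary condition for a hypothetical simple integrand F[x,f,f '] = f² + (f ')²: its Euler-Lagrange equation is 2f − (d/dx)(2f ') = 0, i.e. f " = f, solved by f0(x) = sinh(x) on [0,1]. The variation of Φ at f0 should vanish for an increment Δ(x) = x·(1−x) that is zero at both ends, while at a non-solution with the same endpoint values it should not:

```python
import math

# Phi[f] = integral over [0,1] of (f^2 + f'^2); Euler-Lagrange gives f'' = f.
# Compare the variation d/dt of Phi[base + t*Delta] at t = 0 for the extremal
# f0 = sinh(x) and for the straight line with the same endpoint values.
N = 20000

def phi(base, baseprime, t):
    dx = 1.0 / N
    s = 0.0
    for i in range(N):
        x = (i + 0.5) * dx                   # midpoint rule
        f = base(x) + t * x * (1 - x)        # base + t*Delta, Delta = x*(1-x)
        fp = baseprime(x) + t * (1 - 2 * x)  # base' + t*Delta'
        s += (f * f + fp * fp) * dx
    return s

h = 1e-5
def variation(base, baseprime):
    return (phi(base, baseprime, h) - phi(base, baseprime, -h)) / (2 * h)

extremal = variation(math.sinh, math.cosh)          # solves f'' = f
straight = variation(lambda x: x * math.sinh(1.0),  # same endpoints,
                     lambda x: math.sinh(1.0))      # but not a solution
print(extremal, straight)   # the first is (nearly) zero, the second is not
```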
CONCLUSION
Consider a class Ω of all sufficiently differentiable real functions f(x) on segment [a,b] that take fixed values on the ends of this segment: f(a)=A and f(b)=B.
Given functional Φ[f(x)] = ∫[a,b]F[x,f(x),f '(x)]dx
defined for all f(x)∈Ω
and where F[...] is some known sufficiently differentiable real function of three arguments
real variable x∈[a,b],
real value of function f(x)∈Ω
and real value of its derivative f '(x).
If function f0(x) from the same class Ω is where the above functional reaches its extremum (minimum or maximum), then this function should be a solution to the Euler-Lagrange differential equation (∂/∂f)F [x,f0(x),f '0(x)] −
− (d/dx)(∂/∂f ')F [x,f0(x),f '0(x)] = 0
Among all smooth (sufficiently differentiable) functions f(x) defined on segment [a,b] and taking values at endpoints f(a)=A and f(b)=B find the one with the shortest graph between points (a,A) and (b,B).
Solution
First of all the length of a curve representing a graph of a function is a functional with that function as an argument. Let's determine its explicit formula in our case.
The length ds of an infinitesimal segment of a curve that represents a graph of function y=f(x) is ds = [(dx)² + (dy)²]½ =
= [(dx)² + (df(x))²]½ =
= [1 + (df(x)/dx)²]½·(dx) =
= [1 + f '(x)²]½·(dx)
The length of an entire curve would then be represented by the following functional of function f(x): Φ[f(x)] = ∫[a,b][1 + f '(x)²]½dx
We have to minimize this functional within a family of smooth functions defined on segment [a,b] and satisfying initial conditions f(a)=A and f(b)=B
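Before the variational derivation, a quick numeric experiment with this functional (assumed endpoints (0,0) and (1,2)): discretize the length of the graph of the straight line plus a bend eps·sin(π·x), which keeps the endpoint values fixed; any bend only makes the graph longer:

```python
import math

# Discretized length of the graph of f(x) = 2x + eps*sin(pi*x) on [0, 1].
# The sine term vanishes at both endpoints, so f(0) = 0 and f(1) = 2 for any
# eps; eps = 0 is the straight line, which should give the minimal length.
N = 1000

def length(eps):
    dx = 1.0 / N
    total, py = 0.0, 0.0
    for i in range(1, N + 1):
        x = i * dx
        y = 2.0 * x + eps * math.sin(math.pi * x)
        total += math.hypot(dx, y - py)   # length of one polyline segment
        py = y
    return total

straight = length(0.0)
print(straight, math.hypot(1.0, 2.0))   # the straight graph: chord length sqrt(5)
for eps in (0.05, -0.05, 0.3):
    assert length(eps) > straight       # bending increases the length
```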
As explained in the previous lecture, if functional Φ[f(x)] has local minimum at function-argument f0(x), the variation (directional derivative) d/dt Φ[f0(x)+t(f1(x)−f0(x))]
at function-argument f0(x) (that is, for t=0) in the direction from f0(x) to f1(x) should be equal to zero regardless of location of f1(x) in the neighborhood of f0(x).
Assume, f0(x) is a function that minimizes the functional Φ[f(x)] above.
Let f1(x) be another function from the family of functions defined on segment [a,b] and satisfying initial conditions f(a)=A and f(b)=B
Let Δ(x) = f1(x) − f0(x).
It is also defined on segment [a,b] and, according to its definition, satisfies the initial conditions Δ(a)=0 and Δ(b)=0.
Using an assumed point (function-argument) of minimum f0(x) of our functional Φ[f(x)], another point f1(x) that defines the direction of an increment of a function-argument and real parameter t, we can describe a subset of points (function-arguments) linearly dependent on f0(x) and f1(x) as f0(x)+t·(f1(x)−f0(x)) =
= f0(x) + t·Δ(x)
Let's calculate the variation (we will use symbol δ for variation) of functional Φ[f(x)] at any point (function-argument) defined above by function-argument f0(x) minimizing our functional, directional point f1(x) and real parameter t : δ[f0,f1,t]Φ[f(x)] =
= d/dt Φ[f0(x)+t(f1(x)−f0(x))] =
= d/dt Φ[f0(x)+t·Δ(x)] =
(use the formula for a length of a curve)
= d/dt ∫[a,b][1+((f0+t·Δ)')²]½dx
In the above expression we dropped (x) to shorten it.
The derivative indicated by an apostrophe is by argument x of functions f(x) and Δ(x).
Under very broad conditions, when smooth functions are involved, the derivative d/dt and the integral by dx are interchangeable.
So, let's take the derivative d/dt of the expression under the integral first, and then do the integration by dx. The derivative by t of the expression under the integral is d/dt [1+((f0+t·Δ)')²]½ =
= (f0'+t·Δ')·Δ' / [1+(f0'+t·Δ')²]½
Now we can integrate this expression by x on segment [a,b]: δ[f0,f1,t]Φ[f(x)] =
= ∫[a,b]{(f0'+t·Δ') / [1+(f0'+t·Δ')²]½}·Δ'·dx
Let's integrate by parts, using the known formula for two functions u(x) and v(x)
∫[a,b]u·dv = u·v|[a,b] − ∫[a,b]v·du
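As a quick numeric sanity check of this formula (a sketch with arbitrarily chosen u and v, not part of the derivation itself), take u(x)=x and v(x)=sin x on [0,1]:

```python
import math

def integrate(g, a, b, n=100000):
    # Midpoint-rule approximation of the integral of g over [a, b].
    return sum(g(a + (b - a) * (i + 0.5) / n) for i in range(n)) * (b - a) / n

a, b = 0.0, 1.0
u, du = (lambda x: x), (lambda x: 1.0)                    # u(x) = x,     du = 1*dx
v, dv = (lambda x: math.sin(x)), (lambda x: math.cos(x))  # v(x) = sin x, dv = cos x*dx

lhs = integrate(lambda x: u(x) * dv(x), a, b)             # integral of u*dv
rhs = u(b) * v(b) - u(a) * v(a) - integrate(lambda x: v(x) * du(x), a, b)
# lhs and rhs agree: both equal cos(1) + sin(1) - 1
```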
Use it for
u(x) = [f0'(x)+t·Δ'(x)] / [1+(f0'(x)+t·Δ'(x))²]½
v(x) = Δ(x)
and, therefore, dv(x) = dΔ(x) = Δ'(x)·dx
with all participating functions assumed to be sufficiently smooth (differentiable to, at least, second derivative).
Since v(a) = Δ(a) = 0 and v(b) = Δ(b) = 0,
the boundary term of the integration by parts is zero: u·v|[a,b] = u(b)·v(b) − u(a)·v(a) = 0
Now the variation of our functional is δ[f0,f1,t]Φ[f(x)] =
= −∫[a,b]v(x)·du(x,t)
where
u(x,t) = [f0'(x)+t·Δ'(x)] / [1+(f0'(x)+t·Δ'(x))²]½
v(x) = Δ(x)
As we know, the necessary condition for a local minimum of functional Φ[f(x)] at function-argument f0(x) is equality to zero of all its directional derivatives at point f0(x) (that is at t=0).
It means that for any direction defined by function f1(x) or, equivalently, defined by any Δ(x)=f1(x)−f0(x), the derivative by t of functional Φ[f0(x)+t·Δ(x)] should be zero at t=0.
So, in our case of minimizing the length of a curve between two points in space, the proper order of steps would be
(1) calculate the integral above getting variation of a functional δ[f0,f1,t]Φ[f0(x)+t·Δ(x)] which is a functional of three variables:
- real parameter t,
- function f0(x) that is an argument to an assumed minimum of functional Φ[f(x)],
- function Δ(x) that signifies an increment of function f0(x) in the direction of function f1(x);
(2) set t=0 obtaining a directional derivative of functional Φ[f(x)] at assumed minimum function-argument f0(x) and increment Δ(x): δ[f0,f1,t=0]Φ[f0(x)];
(3) equate this derivative to zero and find f0(x) that solves the resulting equation regardless of the direction of the shift towards f1(x).
Integration in step (1) above is by x, while step (2) sets the value of t.
Since x and t are independent variables, we can exchange the order and, first, set t=0 and then do the integration.
This simplifies the integration to the following δ[f0,f1,t=0]Φ[f(x)] =
= −∫[a,b]Δ(x)·du(x,0)
where
u(x,0) = [f0'(x)+0·Δ'(x)] / [1+(f0'(x)+0·Δ'(x))²]½ =
= f0'(x) / [1+(f0'(x))²]½
Therefore, δ[f0,f1,t=0]Φ[f(x)] =
= −∫[a,b]Δ(x)·d{f0'(x) / [1+(f0'(x))²]½}
And the final formula for variation δ[f0,f1,t=0]Φ[f(x)] is
−∫[a,b]Δ(x)·{f0'(x) / [1+(f0'(x))²]½}'·dx
For δ[f0,f1,t=0]Φ[f(x)] to be equal to zero regardless of Δ(x), that is, for the integral above to be equal to zero regardless of the function Δ(x), the derivative u'(x,0) appearing under the integral must be identically equal to zero for all x∈[a,b].
Indeed, if u'(x,0) is not zero at some point x (and, since we deal with smooth functions, in some neighborhood of this point), we can always construct a function Δ(x) that makes the integral above non-zero.
From this it follows that u(x,0)=const.
Therefore,
f0'(x) / [1+(f0'(x))²]½ = u(x,0) = const
from which it easily follows that f0'(x)=const and, therefore, the function f0(x), where our functional has a minimum, is a linear function of x.
All that remains is to find a linear function f0(x) that satisfies initial conditions f0(a)=A and f0(b)=B.
Obviously, there is one and only one such function f0(x) = (B−A)·(x−a)/(b−a) + A
whose graph in (X,Y) Cartesian coordinates is a straight line from (a,A) to (b,B).
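This conclusion can be checked numerically. The sketch below (the particular Δ(x) and endpoint values are illustrative choices) perturbs the straight-line minimizer f0 by t·Δ(x), with Δ vanishing at both endpoints, and observes that the graph length is smallest at t=0:

```python
import math

def length(f, a=0.0, b=1.0, n=4000):
    # Polygonal approximation of the length of the graph of f on [a, b].
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(math.hypot(xs[i + 1] - xs[i], f(xs[i + 1]) - f(xs[i]))
               for i in range(n))

A, B = 1.0, 3.0
f0 = lambda x: (B - A) * x + A            # the straight-line minimizer on [0, 1]
delta = lambda x: math.sin(math.pi * x)   # increment with delta(0) = delta(1) = 0

lengths = {t: length(lambda x, t=t: f0(x) + t * delta(x))
           for t in (-0.2, -0.1, 0.0, 0.1, 0.2)}
# lengths[0.0] is the smallest value in this one-parameter family
```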
In this lecture we continue discussing the problem of finding a local extremum (minimum or maximum) of a functional that we introduced in the previous lectures.
To find a local extremum point x0 of a smooth real function of one argument F(x) we usually do the following.
(a) find a derivative F'(x) of function F(x) by x;
(b) if x0 is a local extremum point, the derivative at this point should be equal to zero, which means that x0 must be a solution of the following equation F'(x) = 0
Let's try to find a local extremum of a functional Φ[f(x)] using the same approach.
The first step presents the first problem: how to take a derivative of a functional Φ[f(x)] by its function-argument f(x)?
The answer is: WE CANNOT.
So, we need another approach. We would like to explain it using an analogy: finding an extremum of a real function of two arguments F(x,y) defined on a two-dimensional XY plane with Cartesian coordinates, that is, finding such a point P0(x0,y0) that the value F(P0)=F(x0,y0) is greater (for a maximum) or smaller (for a minimum) than all other values F(P)=F(x,y), where point P(x,y) lies in a small neighborhood of point P0(x0,y0).
As in case of functionals, we cannot differentiate function F(P) by P because, geometrically, there is no such thing as differentiating by point and, algebraically, we cannot simultaneously differentiate by two coordinates.
Yes, we can apply partial differentiation ∂F(x,y)/∂x by x separately from partial differentiation ∂F(x,y)/∂y by y, which will give a point of extremum in one or another direction. But what about other directions?
Fortunately, there is a theorem stating that, if both partial derivatives are zero at some point, then every directional derivative at that point is zero as well, making it a candidate for an extremum in all directions; but this is not applicable to functionals, and we will not talk about it at this moment.
An approach we choose to find an extremum of function F(P)=F(x,y) defined on a plane and that will be used to find an extremum of functionals is as follows.
Assume, point P0(x0,y0) is a point of a local minimum of function F(P)=F(x,y) (with local maximum it will be analogous).
Choose any other point P(x,y) in a small neighborhood of P0 and draw a straight line between points P0 and P.
Consider a point Q(q1,q2) moving along this line from P to P0 and beyond.
As point Q moves towards an assumed point of minimum P0 along the line from P to P0, the value of F(Q) should diminish. After crossing P0 the value of F(Q) will start increasing.
What's important is that this behavior of function F(Q) (decreasing going towards P0 and increasing after crossing it) should be the same regardless of a choice of point P from a small neighborhood of P0, because P0 is a local minimum in its neighborhood, no matter from which side it's approached.
The trajectory of point Q is a straight line - a one-dimensional space. So, we can parameterize it with a single variable t like this: Q(t) = P0 + t·(P−P0)
In coordinate form: q1(t) = x0 + t·(x−x0) q2(t) = y0 + t·(y−y0)
At t=1 point Q coincides with point P because Q(1)=P0+1·(P−P0)=P.
At t=0 point Q coincides with point P0 because Q(0)=P0+0·(P−P0)=P0.
Now F(Q(t)) can be considered a function of one argument t that is supposed to have a minimum at t=0 when Q(0)=P0.
That means that the derivative d/dt F(Q(t)), as a function of points P0, P and parameter t, must be equal to zero for t=0, that is at point P0 with a chosen direction towards P.
This is great, but what about a different direction defined by a different choice of point P?
If P0 is a true minimum, change of direction should not affect the fact that directional derivative at P0 towards another point P equals to zero.
So, d/dt F(Q(t)) must be equal to zero for t=0 regardless of the position of point P in the small neighborhood of P0.
It's quite appropriate to demonstrate this technique that involves directional derivatives on a simple example.
Consider a function defined on two-dimensional space f(x,y) = (x−1)² + (y−2)²
Let's find a point P0(x0,y0) where it has a local minimum.
Let's step from point P0(x0,y0) to a neighboring one P(x,y) and parameterize all points on a straight line between P0 and P Q(t) = P0 + t·(P−P0) =
= (x0+t·(x−x0), y0+t·(y−y0))
The value of our function f() at point Q(t) is f(Q(t)) = (x0+t·(x−x0)−1)² + (y0+t·(y−y0)−2)²
The directional derivative of this function by t will then be f 't (Q(t)) = 2(x0+t·(x−x0)−1)·(x−x0) +
+ 2(y0+t·(y−y0)−2)·(y−y0)
If P0(x0,y0) is a point of minimum, this directional derivative from P0 towards P for t=0, that is at point P0(x0,y0), should be equal to zero for any point P(x,y).
At t=0 f 't (Q(0)) = 2(x0−1)·(x−x0) + 2(y0−2)·(y−y0)
If P0(x0,y0) is a point of minimum, the expression above must be equal to zero for any x and y, and the only possible values for x0 and y0 are x0=1 and y0=2.
Therefore, point P0(1,2) is a point of minimum.
The same result can be obtained by equating all partial derivatives to zero, as mentioned above. ∂f(x,y)/∂x = 2(x−1) ∂f(x,y)/∂y = 2(y−2)
System of equations ∂f(x,y)/∂x = 0 ∂f(x,y)/∂y = 0
is 2(x−1) = 0 2(y−2) = 0
Its solutions are x = 1 y = 2
Of course, this was obvious from the expression of function f(x,y)=(x−1)²+(y−2)² as it represents a paraboloid z=x²+y² with its vertex (the minimum) shifted to point (1,2).
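The same computation can be verified numerically. The sketch below (helper names are illustrative) estimates the directional derivative at P0=(1,2) by a central difference in t and confirms that it vanishes in every tried direction:

```python
def f(x, y):
    # The example function with its minimum at (1, 2).
    return (x - 1) ** 2 + (y - 2) ** 2

def directional_derivative(P0, P, h=1e-6):
    # d/dt f(P0 + t*(P - P0)) at t = 0, via a central difference in t.
    (x0, y0), (x1, y1) = P0, P
    q = lambda t: f(x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return (q(h) - q(-h)) / (2 * h)

P0 = (1.0, 2.0)
derivs = [directional_derivative(P0, P)
          for P in [(2.0, 2.0), (1.0, 5.0), (-3.0, 0.5)]]
# every entry is (numerically) zero: P0 is a minimum in all directions
```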
Variation of Functionals
Let's follow the above logic that uses directional derivatives and apply it to finding a local minimum of functionals.
To find a local minimum of a functional Φ[f(x)], we should know certain properties of a function-argument f(x) where this minimum takes place.
In the above case of a function defined on two-dimensional space we used the fact that a directional derivative at a point of minimum in any direction is zero.
We do analogously with functionals.
Assume, functional Φ[f(x)] has a local minimum at function-argument f0(x) and takes value Φ[f0(x)] at this function.
Also, assume that we have defined some metric in the space of all functions f(x) where our functional is defined. This metric, or norm, denoted by the symbol ||.||, can be defined in many ways, for example ||f(x)|| = max[a,b]{|f(x)|, |f '(x)|}
or other ways mentioned in previous lectures.
This norm is needed to determine a "distance" between two functions: ||f0(x)−f1(x)||
which, in turn, determines what we mean when saying that one function is in the small neighborhood of another.
Shifting an argument from f0(x) to f1(x) causes change of a functional's value from Φ[f0(x)] to Φ[f1(x)], and we know that, since f0(x) is a point of a local minimum, within a small neighborhood around f0(x) the value Φ[f1(x)] cannot be less than Φ[f0(x)].
More rigorously, there exists a positive δ such that for any f1(x) that satisfies
||f1(x)−f0(x)|| ≤ δ
this is true: Φ[f0(x)] ≤ Φ[f1(x)]
Consider a parameterized family of function-arguments g(t,x) (t is a parameter) defined by a formula g(t,x) = f0(x) + t·[f1(x)−f0(x)]
For t=0 g(t,x)=f0(x).
For t=1 g(t,x)=f1(x).
For t=−1 g(−1,x)=2f0(x)−f1(x), which is a function symmetrical to f1(x) relative to f0(x) in the sense that ½[g(−1,x)+f1(x)] = f0(x).
For all real t function g(t,x) is a linear combination of f0(x) and f1(x) and for each pair of these two functions or, equivalently, for each direction from f0(x) towards f1(x) functional Φ[g(t,x)] =
= Φ[f0(x) + t·(f1(x)−f0(x))]
can be considered a real function of a real argument t.
Let's concentrate on a behavior of real function Φ[g(t,x)] of real argument t where g(t,x) is a parameterized by t linear combination of f0(x) and f1(x) and analyze it using a classical methodology of Calculus.
We can take a derivative of Φ[f0(x) + t·(f1(x)−f0(x))] by t and, since f0(x) is a local minimum, this derivative must be equal to zero at this function-argument f0(x), that is at t=0.
This derivative constitutes a directional derivative or variation of functional Φ[f(x)] at function-argument f0(x) along a direction defined by a location of function-argument f1(x).
Variation of functional Φ[f(x)] is usually denoted as δΦ[f(x)].
If we shift the location of function-argument f1(x) in the neighborhood of f0(x), a similar approach would show that this variation (directional derivative by t) would still be zero at t=0 because functional Φ[f(x)] has minimum at f0(x) regardless of the direction of a shift.
Our conclusion is that, if functional Φ[f(x)] has local minimum at function-argument f0(x), the variation (directional derivative) at function-argument f0(x) in the direction from f0(x) to f1(x) should be equal to zero regardless of location of f1(x) in the neighborhood of f0(x).
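To make this concrete, here is a small numeric sketch (the functional Φ[f]=∫[0,1]f²(x)dx is an illustrative choice, minimized by f0(x)=0, and all names are mine): its variation at f0, estimated by a central difference in t, vanishes for every tried direction f1:

```python
def phi(f, n=2000):
    # Phi[f] = integral over [0,1] of f(x)^2 dx (midpoint rule);
    # this functional has its minimum at the function f0(x) = 0.
    return sum(f((i + 0.5) / n) ** 2 for i in range(n)) / n

f0 = lambda x: 0.0
h = 1e-6
variations = []
for f1 in (lambda x: x, lambda x: x * (1 - x), lambda x: 1.0):
    g = lambda t, f1=f1: phi(lambda x: f0(x) + t * (f1(x) - f0(x)))
    variations.append((g(h) - g(-h)) / (2 * h))  # d/dt Phi[...] at t = 0
# every variation is (numerically) zero, as the theory predicts
```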
Examples of handling local minimum of functionals will be presented in the next lecture.
The following represents additional information not covered in the associated video.
It contains the comparison of properties of functionals and real functions defined in N-dimensional space with Cartesian coordinates (functions of N real arguments) to emphasize common techniques to find points of extremum by using the directional derivatives.
Certain important details of what was explained above are repeated here in more details.
On the first reading these details can be skipped, but it's advisable to eventually go through them.
We assume that our N-dimensional space has Cartesian coordinates and every point P there is defined by its coordinates.
This allows us to do arithmetic operations with points by applying the corresponding operations to their coordinates - addition of points, subtraction and multiplying by real constants are defined through these operations on their corresponding coordinates.
To make a concept of a functional, its minimum and approaches to find this minimum easier to understand, let's draw a parallel between
(a) finding a local minimum of a real function F(P) of one argument P, where P is a point in N-dimensional space with Cartesian coordinates with each such point-argument P mapped by function F(P) to a real number, and
(b) finding a local minimum of a functional Φ[f(x)] of one argument f(x), where f(x) is a real function of a real argument with each such function-argument f(x) mapped by functional Φ[f(x)] to a real number.
Note that function-arguments f(x) of a functional Φ[f(x)] have a lot in common with points in the N-dimensional space that are arguments to functions of N arguments.
Both can be considered as elements of corresponding sets with operations of addition and multiplication by a real number that can be easily defined.
Thus, if two points P and Q in N-dimensional space with Cartesian coordinates are given, the difference Q−P represents the vector from P to Q, and P+t·(Q−P) represents all the points on the line going through P and Q.
In a similar fashion, we can call function-arguments of a functional points in the space of all functions for which this functional is defined (for example, all functions defined on segment [a,b] and differentiable to a second derivative).
With these functions we can also use arithmetic operations of addition, subtraction and multiplication by a real number.
Also, we can use the geometric word line to characterize a set of functions defined by a linear combination f(x)+t·(g(x)−f(x)), where f(x) and g(x) are two functions and t is any real parameter.
This approach will demonstrate that dealing with functionals, in principle, follows the same logic as dealing with regular real functions.
As a side note, wherever we will use limits, differentials or derivatives in this lecture, we assume that the functions we deal with do allow these operations, and all limits, differentials or derivatives exist. Our purpose is to explain the concept, not to present mathematically flawless description with all the bells and whistles of 100% rigorous presentation.
(a1) Distance between points
Let's talk about a distance between two arguments of a function (distance between two points in N-dimensional space).
The arguments of a real function F(P), as points in N-dimensional space, can be represented in Cartesian coordinates (x1,x2,...,xN) with a known concept of a distance between two arguments. This is used to define a neighborhood of some point-argument P - all points Q within certain distance from P.
Thus, a distance between P(x1,x2,...,xN) and Q(y1,y2,...,yN) that we will denote as ||P−Q|| is defined as
||P−Q|| = [Σi∈[1,N](yi−xi)²]½
The definition of a distance will lead to a concept of neighborhood which is essential to define and to find a local minimum of real functions using a derivative.
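In code, this Euclidean distance is a one-liner (a trivial sketch with illustrative sample points):

```python
import math

def distance(P, Q):
    # ||P - Q|| = square root of the sum of squared coordinate differences.
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(P, Q)))

d = distance((1.0, 2.0, 2.0), (4.0, 6.0, 2.0))  # sqrt(9 + 16 + 0) = 5.0
```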
(b1) Distance between functions
Let's talk about a distance between two arguments of a functional (a distance between two real functions).
The arguments of a functional Φ[f(x)], as real functions from some class of functions, also should have this concept of a distance to be able to talk about a local minimum in a neighborhood of some particular function-argument.
This distance can be defined in many ways to quantitatively measure the "closeness" of one function to another. This was discussed in the previous lecture and one of the ways to define this distance was suggested there by using a concept of a scalar product of functions as an integral of their algebraic product.
Let's suggest some other ways to define this distance.
First of all, as in case of real numbers, the distance between functions f(x) and g(x) must be based upon their algebraic difference h(x)=f(x)−g(x).
Secondly, we have to quantify this difference, a function h(x), with a single real value.
There are a few traditional ways to apply a real number (called a norm and denoted ||h(x)||) to a function to signify how close this function is to zero.
Here are some for functions defined on segment [a,b]:
||h(x)|| = max[a,b]|h(x)|
||h(x)|| = max[a,b]{|h(x)|,|h'(x)|}
||h(x)|| = [∫[a,b]h²(x)·dx]½
Let's assume that some norm ||f(x)|| is defined for any function-argument of functional Φ[f(x)].
So, ||h(x)|| is a measure of how close function h(x) is to a function that equals to zero everywhere and ||g(x)−f(x)|| is a measure of how close function g(x) is to function f(x).
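These norms are easy to approximate on a grid. The sketch below (helper names are mine; the derivative is passed in explicitly rather than computed) evaluates the sup-norm, the sup-norm of the function together with its derivative, and the L2 norm for h(x)=x² on [0,1]:

```python
import math

def norms(h, dh, a=0.0, b=1.0, n=1000):
    # Approximate three common norms of h on [a, b] on a uniform grid;
    # dh must be the derivative of h.
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    sup = max(abs(h(x)) for x in xs)                       # max |h(x)|
    sup_c1 = max(max(abs(h(x)), abs(dh(x))) for x in xs)   # max{|h(x)|, |h'(x)|}
    l2 = math.sqrt(sum(h(x) ** 2 for x in xs[:-1]) * (b - a) / n)  # L2 norm
    return sup, sup_c1, l2

sup, sup_c1, l2 = norms(lambda x: x * x, lambda x: 2 * x)
# sup = 1, sup_c1 = 2, l2 is about sqrt(1/5) ~ 0.447
```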
(a2) Increment
Now we will introduce a concept of an increment of an argument to a function F(P) (an increment to a point P in N-dimensional space) and the resulting increment of function F(P) itself.
Let's fix an argument of function F(P): P=P0.
Consider these two points P0(x1,x2,...,xN) and P(y1,y2,...,yN) and their "difference" R(y1−x1,y2−x2,...,yN−xN).
This "difference" R is an increment of an argument of function F() from point P0 to P because in coordinate form P=P0+R.
We will denote it as
ΔP=R=P−P0 - an increment of argument P0.
At the same time, the difference
ΔF(P)=F(P)−F(P0) is an increment of function F() at point P0 when we increment its argument by ΔP to point P=P0+ΔP.
(b2) Increment
Now we will introduce a concept of increment of a function-argument to a functional Φ[f(x)] and an increment of a functional itself.
Let's fix a function-argument of functional Φ[f(x)]: f(x)=f0(x).
If we consider another function-argument f(x), the difference
Δf(x)=f(x)−f0(x) is an increment of function-argument f0(x).
At the same time, the difference
ΔΦ[f(x)]=Φ[f(x)]−Φ[f0(x)] is an increment of functional Φ[f(x)] when we increment its argument from f0(x) by Δf(x) to f(x)=f0(x)+Δf(x).
(a3) Neighborhood
A neighborhood of positive radius δ around point-argument P0 of function F(P) is the set of all arguments P such that the increment ΔP=P−P0 defined above has norm ||ΔP|| not exceeding radius δ.
(b3) Neighborhood
A neighborhood of positive radius δ around a function-argument f0(x) of functional Φ[f(x)] is the set of all function-arguments f(x) such that the increment Δf(x)=f(x)−f0(x) defined above has norm ||Δf(x)|| not exceeding δ.
(a4) Linear Function
Recall that multiplication of a point in N-dimensional space by a real number and addition of points are done on a coordinate basis, that is each coordinate is multiplied and corresponding coordinates are added.
Function F(P) is linear if for any of its point-arguments and any real multiplier k the following is true: F(k·P) = k·F(P) and F(P1+P2) = F(P1) + F(P2)
(b4) Linear Functional
Functional Φ[f(x)] is linear if for any function-arguments and any real multiplier k the following is true: Φ[k·f(x)] = k·Φ[f(x)] and Φ[f1(x)+f2(x)] =
= Φ[f1(x)] + Φ[f2(x)]
(a5) Continuous Function
Function F(P) is continuous at point P0 if a small increment of an argument from P0 to some neighboring point P causes a small increment of the value of function from F(P0) to F(P).
More precisely, function F(P) is continuous at point P0 if for any positive function increment ε there exists a positive δ such that if ||ΔP|| = ||P−P0|| ≤ δ then
|ΔF(P)| = |F(P)−F(P0)| ≤ ε.
(b5) Continuous Functional
Functional Φ[f(x)] is continuous at point f0(x) if a small increment of a function-argument from f0(x) to some neighboring function-argument f(x) causes a small increment of the value of functional from Φ[f0(x)] to Φ[f(x)].
More precisely, functional Φ[f(x)] is continuous at point f0(x) if for any positive functional increment ε there exists a positive δ such that if ||Δf(x)|| = ||f(x)−f0(x)|| ≤ δ then
|ΔΦ[f(x)]| =
= | Φ[f(x)]−Φ[f0(x)] | ≤ ε.
(a6) Differentiation of Functions
To find a local minimum of a function F(P), we should know certain properties of a point where this minimum takes place. Then, using these properties, we will be able to find a point of a local minimum.
In case of a function of one argument (that is, if dimension N=1) we know that a derivative of a function at a point of local minimum equals to zero. So, we take a derivative, equate it to zero and solve the equation.
With greater dimensions of a space where our function is defined this approach would not work, because we cannot take a derivative by a few arguments at the same time.
However, we can do something clever to overcome this problem.
Assume, function F(P) has a local minimum at point P0 and takes value F(P0) at this point.
Shifting an argument from P0 to P1 causes change of a function value from F(P0) to F(P1), and we know that, since P0 is a point of a local minimum, within a small neighborhood around P0 the value F(P1) cannot be less than F(P0).
More rigorously, there exists a positive δ such that for any P1 that satisfies
||P1−P0|| ≤ δ
this is true: F(P0) ≤ F(P1)
Consider a straight line between P0 and P1 in our N-dimensional space.
Its points Q can be parameterized as Q(t) = P0 + t·(P1−P0)
For t=0: Q(0)=P0.
For t=1: Q(1)=P1.
For t=−1: Q(−1) is a point symmetrical to P1 relative to point P0.
For all other t, point Q(t) lies somewhere on the line that goes through P0 and P1.
If we concentrate on a behavior of function F(Q(t))=F(P0+t·(P1−P0))
where Q(t) is a point on a line going through P0 and P1, it can be considered as a function of only one variable t and, therefore, can be analyzed using a classical methodology of Calculus.
We can take a derivative of F(P0+t·(P1−P0)) by t and, since P0 is a local minimum, this derivative must be equal to zero at this point P0, that is at t=0.
This derivative constitutes a directional derivative of function F(P) at point P0 along a direction defined by a location of point P1.
What's more interesting is that, if we shift the location of point P1 in the neighborhood of P0, a similar approach would show that this directional derivative by t would still be zero at t=0 because function F(P) has minimum at P0 regardless of the direction of a shift.
Our conclusion is that, if function F(P) has local minimum at point P0, the directional derivative at point P0 in the direction from P0 to P1 should be equal to zero regardless of location of P1 in the neighborhood of P0.
Let's see how it works if, instead of points P0 and P1 in N-dimensional space, we use Cartesian coordinates.
Note: This is a method that can be used for functions of N arguments, but not for functionals, where we will use only directional derivatives.
See item (b6) below.
Let P0(...x0i...), P1(...x1i...) and Q(...qi...), where i∈[1,N]
and qi = x0i+t·(x1i−x0i)
The directional derivative of F(Q(t)) = F(q1,...,qN) by t, using the chain rule, equals
Σi∈[1,N][∂F(q1,...,qN)/∂qi]·(dqi /dt)
or
Σi∈[1,N][∂F(q1,...,qN)/∂qi]·(x1i−x0i)
If P0(...x0i...) is the point of minimum, the above expression for a directional derivative must be zero for t=0, that is, when Q(t)=Q(0)=P0(...x0i...), for any direction defined by point P1(...x1i...).
The only way it can be true is if every ∂F(q1,...,qN)/∂qi equals zero at point P0.
We came to the conclusion that, when a function defined on N-dimensional space has a minimum at some point, all partial derivatives of this function equal zero at this point.
It's quite appropriate to demonstrate this technique that involves directional derivatives on a simple example.
Consider a function defined on two-dimensional space f(x,y) = (x−1)² + (y−2)²
Let's find a point P0(x0,y0) where it has a local minimum.
Let's step from this point to a neighboring one P1(x1,y1) and parameterize all points on a straight line between P0 and P1 Q(t) = P0 + t·(P1−P0) =
= (x0+t·(x1−x0), y0+t·(y1−y0))
The value of our function f() at point Q(t) is f(Q(t)) = (x0+t·(x1−x0)−1)² + (y0+t·(y1−y0)−2)²
The directional derivative of this function by t will then be f 't (Q(t)) = 2(x0+t·(x1−x0)−1)·(x1−x0) +
+ 2(y0+t·(y1−y0)−2)·(y1−y0)
If P0(x0,y0) is a point of minimum, this directional (from P0 towards P1) derivative at t=0 should be equal to zero for any point P1(x1,y1).
At t=0 f 't (Q(0)) = 2(x0−1)·(x1−x0) + 2(y0−2)·(y1−y0)
If P0(x0,y0) is a point of minimum, the expression above must be equal to zero for any x1 and y1, and the only possible values for x0 and y0 are x0=1 and y0=2.
Therefore, point P0(1,2) is a point of minimum.
The same result can be obtained by equating all partial derivatives to zero, as mentioned above. ∂f(x,y)/∂x = 2(x−1) ∂f(x,y)/∂y = 2(y−2)
System of equations ∂f(x,y)/∂x = 0 ∂f(x,y)/∂y = 0
is 2(x−1) = 0 2(y−2) = 0
Its solutions are x = 1 y = 2
Of course, this was obvious from the expression of function f(x,y)=(x−1)²+(y−2)² as it represents a paraboloid z=x²+y² with its vertex (the minimum) shifted to point (1,2).
(b6) Variation of Functionals
Let's follow the above logic that uses directional derivatives and apply it to finding a local minimum of functionals.
To find a local minimum of a functional Φ[f(x)], we should know certain properties of a function-argument f(x) where this minimum takes place.
In the above case of a function defined on N-dimensional space we used the fact that a directional derivative from a point of minimum in any direction is zero.
We do analogously with functionals.
Assume, functional Φ[f(x)] has a local minimum at function-argument f0(x) and takes value Φ[f0(x)] at this function.
Shifting an argument from f0(x) to f1(x) causes change of a functional's value from Φ[f0(x)] to Φ[f1(x)], and we know that, since f0(x) is a point of a local minimum, within a small neighborhood around f0(x) the value Φ[f1(x)] cannot be less than Φ[f0(x)].
More rigorously, there exists a positive δ such that for any f1(x) that satisfies
||f1(x)−f0(x)|| ≤ δ
this is true: Φ[f0(x)] ≤ Φ[f1(x)]
Consider a parameterized family of function-arguments g(t,x) (t is a parameter) defined by a formula g(t,x) = f0(x) + t·[f1(x)−f0(x)]
For t=0 g(t,x)=f0(x).
For t=1 g(t,x)=f1(x).
For t=−1 g(−1,x)=2f0(x)−f1(x), which is a function symmetrical to f1(x) relative to f0(x) in the sense that ½[g(−1,x)+f1(x)] = f0(x).
For all real t function g(t,x) is a linear combination of f0(x) and f1(x) and for each pair of these two functions or, equivalently, for each direction from f0(x) towards f1(x) functional Φ[g(t,x)] =
= Φ[f0(x) + t·(f1(x)−f0(x))]
can be considered a real function of a real argument t.
Let's concentrate on a behavior of real function Φ[g(t,x)] of real argument t where g(t,x) is a parameterized by t linear combination of f0(x) and f1(x) and analyze it using a classical methodology of Calculus.
We can take a derivative of Φ[f0(x) + t·(f1(x)−f0(x))] by t and, since f0(x) is a local minimum, this derivative must be equal to zero at this function-argument f0(x), that is at t=0.
This derivative constitutes a directional derivative or variation of functional Φ[f(x)] at function-argument f0(x) along a direction defined by a location of function-argument f1(x).
If we shift the location of function-argument f1(x) in the neighborhood of f0(x), a similar approach would show that this variation (directional derivative by t) would still be zero at t=0 because functional Φ[f(x)] has minimum at f0(x) regardless of the direction of a shift.
Our conclusion is that, if functional Φ[f(x)] has local minimum at function-argument f0(x), the variation (directional derivative) at function-argument f0(x) in the direction from f0(x) to f1(x) should be equal to zero regardless of location of f1(x) in the neighborhood of f0(x).
Examples of handling local minimum of functionals will be presented in the next lecture.