Min/Max and Variation
In this lecture we continue discussing the problem of finding a local extremum (minimum or maximum) of a functional that we introduced in the previous lectures.
To find a local extremum point x0 of a smooth real function of one argument F(x), we usually do the following:
(a) take the derivative F'(x) of function F(x) with respect to x;
(b) since the derivative at a local extremum point must be equal to zero, find x0 among the solutions of the equation
F'(x) = 0
Let's try to find a local extremum of a functional Φ[f(x)] using the same approach.
The first step presents the first problem: how to take a derivative of a functional Φ[f(x)] with respect to its function-argument f(x)?
The answer is: WE CANNOT.
So, we need another approach, and we would like to explain it using an analogy with finding an extremum of a real function of two arguments F(x,y) defined on a two-dimensional XY plane with Cartesian coordinates, that is, finding a point P0(x0,y0) such that the value F(P0)=F(x0,y0) is greater (for a maximum) or smaller (for a minimum) than all other values F(P)=F(x,y), where point P(x,y) lies in a small neighborhood of point P0(x0,y0).
As in the case of functionals, we cannot differentiate function F(P) with respect to P because, geometrically, there is no such thing as differentiating by a point and, algebraically, we cannot simultaneously differentiate with respect to two coordinates.
True, we can apply partial differentiation with respect to each coordinate separately.
There is a theorem stating that, if both partial derivatives are zero at some point, the directional derivative at this point is zero in every direction, which makes such a point a candidate for an extremum; but this technique is not applicable to functionals, and we will not talk about it at this moment.
The approach we choose to find an extremum of function F(P)=F(x,y) defined on a plane, which will also be used to find an extremum of functionals, is as follows.
Assume, point P0(x0,y0) is a point of a local minimum of function F(P)=F(x,y) (the case of a local maximum is analogous).
Choose any other point P(x,y) in a small neighborhood of P0 and draw a straight line between points P0 and P.
Consider a point Q(q1,q2) moving along this line from P to P0 and beyond.
As point Q moves towards an assumed point of minimum P0 along the line from P to P0, the value of F(Q) should diminish. After crossing P0 the value of F(Q) will start increasing.
What's important is that this behavior of function F(Q) (decreasing while going towards P0 and increasing after crossing it) should be the same regardless of the choice of point P from a small neighborhood of P0, because P0 is a local minimum in its neighborhood, no matter from which side it's approached.
The trajectory of point Q is a straight line - a one-dimensional space. So, we can parameterize it with a single variable t like this:
Q(t) = P0 + t·(P−P0)
In coordinate form:
q1(t) = x0 + t·(x−x0)
q2(t) = y0 + t·(y−y0)
At t=1 point Q coincides with point P because
Q(1)=P0+1·(P−P0)=P.
At t=0 point Q coincides with point P0 because
Q(0)=P0+0·(P−P0)=P0.
Now F(Q(t)) can be considered a function of one argument t that is supposed to have a minimum at t=0 when Q(0)=P0.
That means that the derivative d/dt F(Q(t)), as a function of points P0, P and parameter t, must be equal to zero for t=0, that is at point P0 with a chosen direction towards P.
This is great, but what about a different direction defined by a different choice of point P?
If P0 is a true minimum, a change of direction should not affect the fact that the directional derivative at P0 towards another point P equals zero.
So, d/dt F(Q(t)) must be equal to zero for t=0 regardless of the position of point P in the small neighborhood of P0.
It's quite appropriate to demonstrate this technique that involves directional derivatives on a simple example.
Consider a function defined on two-dimensional space
f(x,y) = (x−1)² + (y−2)²
Let's find a point P0(x0,y0) where it has a local minimum.
Let's step from point P0(x0,y0) to a neighboring one P(x,y) and parameterize all points on a straight line between P0 and P
Q(t) = P0 + t·(P−P0) =
= (x0+t·(x−x0), y0+t·(y−y0))
The value of our function f() at point Q(t) is
f(Q(t)) = (x0+t·(x−x0)−1)² + (y0+t·(y−y0)−2)²
The directional derivative of this function with respect to t will then be
f 't (Q(t)) = 2(x0+t·(x−x0)−1)·(x−x0) +
+ 2(y0+t·(y−y0)−2)·(y−y0)
If P0(x0,y0) is a point of minimum, this directional derivative from P0 towards P for t=0, that is at point P0(x0,y0), should be equal to zero for any point P(x,y).
At t=0
f 't (Q(0)) = 2(x0−1)·(x−x0) + 2(y0−2)·(y−y0)
If P0(x0,y0) is a point of minimum, the expression above must be equal to zero for any x and y, which is possible only if both coefficients 2(x0−1) and 2(y0−2) equal zero; hence, the only possible values for x0 and y0 are x0=1 and y0=2.
Therefore, point P0(1,2) is a point of minimum.
The same result can be obtained by equating all partial derivatives to zero, as mentioned above.
∂f(x,y)/∂x = 2(x−1)
∂f(x,y)/∂y = 2(y−2)
System of equations
∂f(x,y)/∂x = 0
∂f(x,y)/∂y = 0
is
2(x−1) = 0
2(y−2) = 0
Its solutions are
x = 1
y = 2
Of course, this was obvious from the expression of function f(x,y)=(x−1)²+(y−2)² as it represents a paraboloid z=x²+y² with its vertex (the minimum) shifted to point (1,2).
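This behavior is easy to check numerically. Below is a small Python sketch (an illustration added here, not part of the original lecture) that approximates d/dt f(Q(t)) at t=0 by a finite difference and confirms it vanishes at P0(1,2) for several choices of the direction point P, while at a point that is not a minimum it does not.

import numpy as np

# f(x,y) = (x-1)^2 + (y-2)^2, the function from the example above
def f(p):
    x, y = p
    return (x - 1)**2 + (y - 2)**2

# Approximate d/dt f(Q(t)) at t=0 for Q(t) = P0 + t*(P - P0)
# by a central finite difference (step h is an arbitrary small number).
def directional_derivative_at_t0(p0, p, h=1e-6):
    q = lambda t: p0 + t * (p - p0)
    return (f(q(h)) - f(q(-h))) / (2 * h)

p0 = np.array([1.0, 2.0])  # the claimed point of minimum
for p in (np.array([1.5, 2.0]), np.array([1.0, 3.0]), np.array([0.3, 1.1])):
    print(directional_derivative_at_t0(p0, p))  # ~0 in every direction

p_bad = np.array([0.0, 0.0])  # not a minimum: the derivative is not zero
print(directional_derivative_at_t0(p_bad, np.array([1.0, 1.0])))  # ~ -6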
Variation of Functionals
Let's follow the above logic that uses directional derivatives and apply it to finding a local minimum of functionals.
To find a local minimum of a functional Φ[f(x)], we should know certain properties of a function-argument f(x) where this minimum takes place.
In the above case of a function defined on a plane, that property was the equality of the directional derivative to zero at the point of minimum, whatever the direction.
We do analogously with functionals.
Assume, functional Φ[f(x)] has a local minimum at function-argument f0(x) and takes value Φ[f0(x)] at this function.
Also, assume that we have defined some metric in the space of all functions f(x) where our functional is defined. This metric, or norm, denoted ||·||, can be defined in many ways, like
||f(x)|| = max[a,b]{|f(x)|, |f '(x)|}
or other ways mentioned in previous lectures.
This norm is needed to determine a "distance" between two functions:
||f0(x)−f1(x)||
which, in turn, determines what we mean when saying that one function is in the small neighborhood of another.
Shifting an argument from f0(x) to f1(x) causes a change of the functional's value from Φ[f0(x)] to Φ[f1(x)], and we know that, since f0(x) is a point of a local minimum, within a small neighborhood around f0(x) the value Φ[f1(x)] cannot be less than Φ[f0(x)].
More rigorously, there exists a positive δ such that for any f1(x) that satisfies
||f1(x)−f0(x)|| ≤ δ
this is true: Φ[f0(x)] ≤ Φ[f1(x)]
Consider a parameterized family of function-arguments g(t,x) (t is a parameter) defined by a formula
g(t,x) = f0(x) + t·[f1(x)−f0(x)]
For t=0 g(t,x)=f0(x).
For t=1 g(t,x)=f1(x).
For t=−1 g(t,x)=2f0(x)−f1(x), which is a function symmetric to f1(x) with respect to f0(x) in the sense that
½[g(−1,x)+f1(x)] = f0(x).
For all real t function g(t,x) is a linear combination of f0(x) and f1(x) and for each pair of these two functions or, equivalently, for each direction from f0(x) towards f1(x) functional
Φ[g(t,x)] =
= Φ[f0(x) + t·(f1(x)−f0(x))]
can be considered a real function of a real argument t.
Let's concentrate on the behavior of the real function Φ[g(t,x)] of real argument t, where g(t,x) is a linear combination of f0(x) and f1(x) parameterized by t, and analyze it using the classical methodology of Calculus.
We can take a derivative of Φ[f0(x) + t·(f1(x)−f0(x))] with respect to t and, since f0(x) is a local minimum, this derivative must be equal to zero at this function-argument f0(x), that is at t=0.
This derivative constitutes a directional derivative or variation of functional Φ[f(x)] at function-argument f0(x) along a direction defined by a location of function-argument f1(x).
Variation of functional Φ[f(x)] is usually denoted as
δ Φ[f(x)].
Our conclusion is that, if functional Φ[f(x)] has a local minimum at function-argument f0(x), the variation (directional derivative) at function-argument f0(x) in the direction from f0(x) to f1(x) should be equal to zero regardless of the location of f1(x) in the neighborhood of f0(x).
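Before moving on, here is a numerical sanity check of this conclusion (a Python sketch added for illustration; the functional Φ[f]=∫f²(x)dx over [0,1] and its obvious minimum f0(x)=0 are my choice of example): the derivative of Φ[g(t,x)] with respect to t at t=0 is zero for every direction f1(x).

import numpy as np

xs = np.linspace(0.0, 1.0, 1001)  # grid on [0,1] for numerical integration

def integrate(ys):
    # trapezoid rule over the fixed grid xs
    return float(np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs)))

def phi(f):
    # Phi[f] = integral over [0,1] of f(x)^2 dx; minimized by f0(x)=0
    return integrate(f(xs)**2)

def variation_at_f0(f0, f1, h=1e-6):
    # central finite difference of t -> Phi[f0 + t*(f1 - f0)] at t=0
    g = lambda t: (lambda x: f0(x) + t * (f1(x) - f0(x)))
    return (phi(g(h)) - phi(g(-h))) / (2 * h)

f0 = lambda x: 0.0 * x                       # the claimed point of minimum
for f1 in (np.sin, np.exp, lambda x: x**2):  # several "directions" f1(x)
    print(variation_at_f0(f0, f1))           # ~0 for every direction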
Examples of handling local minimum of functionals will be presented in the next lecture.

The following represents additional information not covered in the associated video.
It contains the comparison of properties of functionals and real functions defined in N-dimensional space with Cartesian coordinates (functions of N real arguments) to emphasize common techniques to find points of extremum by using the directional derivatives.
Certain important details of what was explained above are repeated here in more detail.
On the first reading these details can be skipped, but it's advisable to eventually go through them.
We assume that our N-dimensional space has Cartesian coordinates, so every point is represented by an ordered set of N real numbers.
This allows us to do arithmetic operations with points by applying the corresponding operations to their coordinates - addition of points, subtraction and multiplying by real constants are defined through these operations on their corresponding coordinates.
To make a concept of a functional, its minimum and approaches to find this minimum easier to understand, let's draw a parallel between
(a) finding a local minimum of a real function F(P) of one argument P, where P is a point in N-dimensional space, and
(b) finding a local minimum of a functional Φ[f(x)] of one argument f(x), where f(x) is a real function of a real argument with each such function-argument f(x) mapped by functional Φ[f(x)] to a real number.
Note that function-arguments f(x) of a functional Φ[f(x)] have a lot in common with points in the N-dimensional space mentioned above.
Both can be considered as elements of corresponding sets with operations of addition and multiplication by a real number that can be easily defined.
Thus, if two points P and Q in N-dimensional space are given by their coordinates, their sum P+Q, their difference P−Q and the product k·P of a point by a real number k are all defined through the corresponding operations on coordinates.
In a similar fashion, we can call function-arguments of a functional points in the space of all functions for which this functional is defined (for example, all functions defined on segment [a,b] and differentiable to a second derivative).
With these functions we can also use arithmetic operations of addition, subtraction and multiplication by a real number.
Also, we can use the geometric word line to characterize a set of functions defined by a linear combination f(x)+t·(g(x)−f(x)), where f(x) and g(x) are two functions and t is any real parameter.
This approach will demonstrate that dealing with functionals, in principle, follows the same logic as dealing with regular real functions.
As a side note, wherever we will use limits, differentials or derivatives in this lecture, we assume that the functions we deal with do allow these operations, and all limits, differentials or derivatives exist. Our purpose is to explain the concept, not to present mathematically flawless description with all the bells and whistles of 100% rigorous presentation.
(a1) Distance between points
Let's talk about a distance between two arguments of a function (a distance between two points in N-dimensional space).
The arguments of a real function F(P), as points in N-dimensional space with Cartesian coordinates, have a naturally defined Euclidean distance between them.
Thus, a distance between P(x1,x2,...,xN) and Q(y1,y2,...,yN) that we will denote as ||P−Q|| is defined as
||P−Q|| = [Σi∈[1,N](yi−xi)²]½
The definition of a distance will lead to a concept of neighborhood which is essential to define and to find a local minimum of real functions using a derivative.
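In code this distance is a one-liner; here is a small Python snippet (an illustration added here, not part of the original text) computing it directly from the coordinates:

import math

# Euclidean distance ||P-Q|| between two points of N-dimensional space
def distance(p, q):
    assert len(p) == len(q)
    return math.sqrt(sum((yi - xi)**2 for xi, yi in zip(p, q)))

print(distance((0, 0, 0), (1, 2, 2)))  # 3.0, since 1 + 4 + 4 = 9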
(b1) Distance between functions
Let's talk about a distance between two arguments of a functional (a distance between two real functions).
The arguments of a functional Φ[f(x)], as real functions from some class of functions, also should have this concept of a distance to be able to talk about a local minimum in a neighborhood of some particular function-argument.
This distance can be defined in many ways to quantitatively measure the "closeness" of one function to another. This was discussed in the previous lecture and one of the ways to define this distance was suggested there by using a concept of a scalar product of functions as an integral of their algebraic product.
Let's suggest some other ways to define this distance.
First of all, as in the case of real numbers, the distance between functions f(x) and g(x) must be based upon their algebraic difference h(x)=f(x)−g(x).
Secondly, we have to quantify this difference, a function h(x), with a single real value.
There are a few traditional ways to assign a real number (called a norm and denoted ||h(x)||) to a function to signify how close this function is to zero.
Here are some for functions defined on segment [a,b]:
||h(x)|| = max[a,b]|h(x)|
||h(x)|| = max[a,b]{|h(x)|,|h'(x)|}
||h(x)|| = [∫[a,b]h²(x)·dx]½
Let's assume that some norm ||f(x)|| is defined for any function-argument of functional Φ[f(x)].
So, ||h(x)|| is a measure of how close function h(x) is to a function that equals zero everywhere, and ||g(x)−f(x)|| is a measure of how close function g(x) is to function f(x).
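Here is a small Python sketch (added for illustration; the sample function h(x)=sin(x) on [0,π], the grid size and the finite-difference derivative are my choices) that evaluates the three norms listed above:

import numpy as np

a, b = 0.0, np.pi
xs = np.linspace(a, b, 10001)
h = np.sin(xs)            # the function whose norm we measure
dh = np.gradient(h, xs)   # finite-difference approximation of h'(x)

norm_max = np.max(np.abs(h))                           # max |h(x)| over [a,b]
norm_c1  = max(np.max(np.abs(h)), np.max(np.abs(dh)))  # max {|h(x)|, |h'(x)|}
norm_l2  = np.sqrt(np.sum((h[1:]**2 + h[:-1]**2) / 2 * np.diff(xs)))  # [∫h²dx]^½

print(norm_max, norm_c1, norm_l2)  # ≈ 1.0, 1.0, and sqrt(π/2) ≈ 1.2533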
(a2) Increment
Now we will introduce a concept of an increment of an argument to a function F(P) (an increment to a point P in N-dimensional space) and an increment of the function itself.
Let's fix an argument of function F(P): P=P0.
Consider these two points P0(x1,x2,...,xN) and P(y1,y2,...,yN) and their "difference" R(y1−x1,y2−x2,...,yN−xN).
This "difference" R is an increment of an argument of function f() from point P0 to P because in coordinate form P=P0+R.
We will denote it as
ΔP=R=P−P0 - an increment of argument P0.
At the same time, the difference
ΔF(P)=F(P)−F(P0) is an increment of a function f() at point P0 when we increment an argument by ΔP to point P=P0+ΔP.
(b2) Increment
Now we will introduce a concept of increment of a function-argument to a functional Φ[f(x)] and an increment of a functional itself.
Let's fix a function-argument of functional Φ[f(x)]: f(x)=f0(x).
If we consider another function-argument f(x), the difference
Δf(x)=f(x)−f0(x) is an increment of function-argument f0(x).
At the same time, the difference
ΔΦ[f(x)]=Φ[f(x)]−Φ[f0(x)] is an increment of functional Φ[f(x)] at f0(x), when we increment its argument from f0(x) by Δf(x) to f(x)=f0(x)+Δf(x).
(a3) Neighborhood
A neighborhood of positive radius δ around point-argument P0 of function F(P) is a set of all arguments P such that the increment ΔP=P−P0 from P0 to P, defined above, has the norm not exceeding δ:
||ΔP|| = ||P−P0|| ≤ δ.
(b3) Neighborhood
A neighborhood of positive radius δ around a function-argument f0(x) of functional Φ[f(x)] is a set of all function-arguments f(x) such that the increment Δf(x)=f(x)−f0(x) from f0(x) to f(x), defined above, has the norm not exceeding δ:
||Δf(x)|| = ||f(x)−f0(x)|| ≤ δ.
(a4) Linear Function
Recall that multiplication of a point in N-dimensional space by a real number and addition of two points are defined through the corresponding operations on their coordinates.
Function F(P) is linear if for any of its point-arguments and any real multiplier k the following is true:
F(k·P) = k·F(P) and
F(P1+P2) = F(P1) + F(P2)
(b4) Linear Functional
Functional Φ[f(x)] is linear if for any function-arguments and any real multiplier k the following is true:
Φ[k·f(x)] = k·Φ[f(x)] and
Φ[f1(x)+f2(x)] =
= Φ[f1(x)] + Φ[f2(x)]
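For example, the functional Φ[f(x)] = ∫[a,b]f(x)·dx is linear: an integral of a sum equals the sum of integrals, and a constant multiplier can be factored out of an integral.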
(a5) Continuous Function
Function F(P) is continuous at point P0 if a small increment of an argument from P0 to some neighboring point P causes a small increment of the value of function from F(P0) to F(P).
More precisely, function F(P) is continuous at point P0 if for any positive function increment ε there exists a positive δ such that, if
||ΔP|| = ||P−P0|| ≤ δ, then
|ΔF(P)| = |F(P)−F(P0)| ≤ ε.
(b5) Continuous Functional
Functional Φ[f(x)] is continuous at point f0(x) if a small increment of a function-argument from f0(x) to some neighboring function-argument f(x) causes a small increment of the value of functional from Φ[f0(x)] to Φ[f(x)].
More precisely, functional Φ[f(x)] is continuous at point f0(x) if for any positive functional increment ε there exists a positive δ such that, if
||Δf(x)|| = ||f(x)−f0(x)|| ≤ δ, then
|ΔΦ[f(x)]| = |Φ[f(x)]−Φ[f0(x)]| ≤ ε.
(a6) Differentiation of Functions
To find a local minimum of a function F(P), we should know certain properties of a point where this minimum takes place. Then, using these properties, we will be able to find a point of a local minimum.
In the case of a function of one argument (that is, if dimension N=1) we know that the derivative of a function at a point of local minimum equals zero. So, we take the derivative, equate it to zero and solve the equation.
With greater dimensions of a space where our function is defined this approach would not work, because we cannot take a derivative by a few arguments at the same time.
However, we can do something clever to overcome this problem.
Assume, function F(P) has a local minimum at point P0 and takes value F(P0) at this point.
Shifting an argument from P0 to P1 causes a change of the function's value from F(P0) to F(P1), and we know that, since P0 is a point of a local minimum, within a small neighborhood around P0 the value F(P1) cannot be less than F(P0).
More rigorously, there exists a positive δ such that for any P1 that satisfies ||P1−P0|| ≤ δ
this is true: F(P0) ≤ F(P1)
Consider a straight line between P0 and P1 in our N-dimensional space.
Its points Q can be parameterized as
Q(t) = P0 + t·(P1−P0)
For t=0 Q(t)=Q(0)=P0.
For t=1 Q(t)=Q(1)=P1.
For t=−1 Q(−1) is a point symmetric to P1 with respect to point P0.
For all other t point Q(t) lies somewhere on a line that goes through P0 and P1.
If we concentrate on a behavior of function
F(Q(t))=F(P0+t·(P1−P0))
where Q(t) is a point on a line going through P0 and P1, it can be considered as a function of only one variable t and, therefore, can be analyzed using a classical methodology of Calculus.
We can take a derivative of F(P0+t·(P1−P0)) with respect to t and, since P0 is a local minimum, this derivative must be equal to zero at this point P0, that is at t=0.
This derivative constitutes a directional derivative of function F(P) at point P0 along a direction defined by a location of point P1.
What's more interesting is that, if we shift the location of point P1 in the neighborhood of P0, a similar approach would show that this directional derivative by t would still be zero at t=0 because function F(P) has minimum at P0 regardless of the direction of a shift.
Our conclusion is that, if function F(P) has local minimum at point P0, the directional derivative at point P0 in the direction from P0 to P1 should be equal to zero regardless of location of P1 in the neighborhood of P0.
Let's see how it works if, instead of points P0 and P1 in N-dimensional space, we operate with their coordinates.
Note: This is a method that can be used for functions of N arguments, but not for functionals, where we will use only directional derivatives.
See item (b6) below.
Let
P0(...x0i...) - i∈[1,N]
P1(...x1i...) - i∈[1,N]
Q(...qi...) - i∈[1,N]
where qi = x0i+t·(x1i−x0i)
The directional derivative of
F(Q(t)) = F(q1,...qN) with respect to t, using the chain rule, equals
Σi∈[1,N][∂F(q1,...qN)/∂qi]·(dqi /dt)
or
Σi∈[1,N][∂F(q1,...qN)/∂qi]·(x1i−x0i)
If P0(...x0i...) is the point of minimum, the above expression for a directional derivative must be zero for t=0, that is when Q(t)=Q(0)=P0(...x0i...), for any direction defined by point P1(...x1i...).
The only way it can be true is if every partial derivative ∂F(q1,...qN)/∂qi equals zero at point P0.
We came to a conclusion that, when a function defined on N-dimensional space has a local minimum at some point, all its partial derivatives (and, consequently, all directional derivatives) at this point equal zero.
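In coordinates this is just the dot product of the gradient with the direction vector P1−P0. A small Python sketch (the three-dimensional test function is my own example, added for illustration) shows that at a point where all partial derivatives vanish the directional derivative is zero in every direction:

import numpy as np

# F(x,y,z) = (x-1)^2 + (y-2)^2 + (z-3)^2; its gradient is computed analytically.
def grad_F(p):
    return 2 * (p - np.array([1.0, 2.0, 3.0]))

# Chain rule: directional derivative at P0 towards P1 is grad F(P0) . (P1 - P0)
def directional_derivative(p0, p1):
    return float(np.dot(grad_F(p0), p1 - p0))

p0 = np.array([1.0, 2.0, 3.0])  # all partial derivatives are zero here
for p1 in (np.array([2.0, 2.0, 3.0]), np.array([0.0, 5.0, 1.0])):
    print(directional_derivative(p0, p1))  # 0.0 for every direction P1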
It's quite appropriate to demonstrate this technique that involves directional derivatives on a simple example.
Consider a function defined on two-dimensional space
f(x,y) = (x−1)² + (y−2)²
Let's find a point P0(x0,y0) where it has a local minimum.
Let's step from this point to a neighboring one P1(x1,y1) and parameterize all points on a straight line between P0 and P1
Q(t) = P0 + t·(P1−P0) =
= (x0+t·(x1−x0), y0+t·(y1−y0))
The value of our function f() at point Q(t) is
f(Q(t)) = (x0+t·(x1−x0)−1)² + (y0+t·(y1−y0)−2)²
The directional derivative of this function with respect to t will then be
f 't (Q(t)) = 2(x0+t·(x1−x0)−1)·(x1−x0) +
+ 2(y0+t·(y1−y0)−2)·(y1−y0)
If P0(x0,y0) is a point of minimum, this directional derivative (from P0 towards P1) at t=0 should be equal to zero for any point P1(x1,y1).
At t=0
f 't (Q(0)) = 2(x0−1)·(x1−x0) + 2(y0−2)·(y1−y0)
If P0(x0,y0) is a point of minimum, the expression above must be equal to zero for any x1 and y1, which is possible only if both coefficients 2(x0−1) and 2(y0−2) equal zero; hence, the only possible values for x0 and y0 are x0=1 and y0=2.
Therefore, point P0(1,2) is a point of minimum.
The same result can be obtained by equating all partial derivatives to zero, as mentioned above.
∂f(x,y)/∂x = 2(x−1)
∂f(x,y)/∂y = 2(y−2)
System of equations
∂f(x,y)/∂x = 0
∂f(x,y)/∂y = 0
is
2(x−1) = 0
2(y−2) = 0
Its solutions are
x = 1
y = 2
Of course, this was obvious from the expression of function f(x,y)=(x−1)²+(y−2)² as it represents a paraboloid z=x²+y² with its vertex (the minimum) shifted to point (1,2).
(b6) Variation of Functionals
Let's follow the above logic that uses directional derivatives and apply it to finding a local minimum of functionals.
To find a local minimum of a functional Φ[f(x)], we should know certain properties of a function-argument f(x) where this minimum takes place.
In the above case of a function defined on N-dimensional space, that property was the equality of all directional (and partial) derivatives to zero at the point of minimum.
We do analogously with functionals.
Assume, functional Φ[f(x)] has a local minimum at function-argument f0(x) and takes value Φ[f0(x)] at this function.
Shifting an argument from f0(x) to f1(x) causes a change of the functional's value from Φ[f0(x)] to Φ[f1(x)], and we know that, since f0(x) is a point of a local minimum, within a small neighborhood around f0(x) the value Φ[f1(x)] cannot be less than Φ[f0(x)].
More rigorously, there exists a positive δ such that for any f1(x) that satisfies
||f1(x)−f0(x)|| ≤ δ
this is true: Φ[f0(x)] ≤ Φ[f1(x)]
Consider a parameterized family of function-arguments g(t,x) (t is a parameter) defined by a formula
g(t,x) = f0(x) + t·[f1(x)−f0(x)]
For t=0 g(t,x)=f0(x).
For t=1 g(t,x)=f1(x).
For t=−1 g(t,x)=2f0(x)−f1(x), which is a function symmetric to f1(x) with respect to f0(x) in the sense that
½[g(−1,x)+f1(x)] = f0(x).
For all real t function g(t,x) is a linear combination of f0(x) and f1(x) and for each pair of these two functions or, equivalently, for each direction from f0(x) towards f1(x) functional
Φ[g(t,x)] =
= Φ[f0(x) + t·(f1(x)−f0(x))]
can be considered a real function of a real argument t.
Let's concentrate on the behavior of the real function Φ[g(t,x)] of real argument t, where g(t,x) is a linear combination of f0(x) and f1(x) parameterized by t, and analyze it using the classical methodology of Calculus.
We can take a derivative of Φ[f0(x) + t·(f1(x)−f0(x))] with respect to t and, since f0(x) is a local minimum, this derivative must be equal to zero at this function-argument f0(x), that is at t=0.
This derivative constitutes a directional derivative or variation of functional Φ[f(x)] at function-argument f0(x) along a direction defined by a location of function-argument f1(x).
If we shift the location of function-argument f1(x) in the neighborhood of f0(x), a similar approach would show that this variation (directional derivative with respect to t) would still be zero at t=0 because functional Φ[f(x)] has minimum at f0(x) regardless of the direction of a shift.

Our conclusion is that, if functional Φ[f(x)] has a local minimum at function-argument f0(x), the variation (directional derivative) at function-argument f0(x) in the direction from f0(x) to f1(x) should be equal to zero regardless of the location of f1(x) in the neighborhood of f0(x).
Examples of handling local minimum of functionals will be presented in the next lecture.