Lagrangian - Definition of Min/Max of Functional
In this lecture we will discuss the concept of a local minimum or maximum of a functional.
Consider an arbitrary function f(x1,...,xN) of N real arguments.
We can always assume that these N real numbers are represented by a point in N-dimensional space.
What is the meaning of a statement that this function has a local minimum at point P(x1,...,xN)?
In plain language it means that within a sufficiently small neighborhood around point P, no matter where we move from point P, the value of our function at that new point will be greater than or equal to f(P).
Let's formalize this definition in a way that will be used to define a local minimum of a functional.
Firstly, for convenience, instead of points defined by their coordinates we will use vectors that originate at the origin of coordinates and end at the corresponding point.
So, vector OP that stretches from the origin of coordinates O to point P will replace the coordinates (x1,...,xN) of that point.
Using this, our function can be viewed as f(OP).
A "sufficiently small neighborhood" of point P(x1,...,xN) (or of vector OP) can be described as all points Q(x1,...,xN) on a sufficiently small distance from point P according to a regular definition of distance in Cartesian coordinates (or as all vectors OQ also originated at the origin of coordinates such that magnitude of a difference vector |OQ−OP|=|PQ| is sufficiently small).
Since we are dealing with vectors now, we can also approach this differently and obtain an equivalent definition of a minimum.
Consider any vector e of unit length.
Now, all vectors OP+t·e, where t is a "sufficiently small" real value and e can be any vector of unit length, describe a "sufficiently small neighborhood" of vector OP.
This representation of "sufficiently small neighborhood" might be more convenient since it depends on a single real value t.
Using the above, we can define a local minimum of a function of N variables as follows.
Vector OP is defined as a local minimum of function f() if there exists a real positive τ such that
f(OP) ≤ f(OP+t·e)
for all t such that 0 ≤ t ≤ τ and
for any unit vector e.
Obviously, a local maximum can be defined analogously by changing "less than or equal" to "greater than or equal" in the above definition.
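To make this definition concrete, here is a minimal Python sketch (my own illustration, not part of the original lecture) that numerically tests the definition for the sample function f(x,y)=x²+y² at the candidate point P=(0,0), sampling random unit directions e and small steps t:

```python
import numpy as np

def f(p):
    # Sample function with an obvious local minimum at the origin.
    return p[0]**2 + p[1]**2

P = np.array([0.0, 0.0])       # candidate point of minimum
tau = 0.1                      # radius of the neighborhood we test

rng = np.random.default_rng(0)
is_min = True
for _ in range(1000):
    e = rng.normal(size=2)
    e /= np.linalg.norm(e)     # random unit direction e
    t = rng.uniform(0.0, tau)  # step 0 <= t <= tau
    if f(P + t * e) < f(P):    # definition requires f(P) <= f(P + t*e)
        is_min = False
        break

print("Looks like a local minimum:", is_min)
```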
For a function of one argument we usually look for a local minimum by solving the equation that sets the function's derivative equal to zero.
This is a very geometric approach to looking for a minimum: on one side of a minimum our function is decreasing, on the other it is increasing, so the derivative changes its sign from negative on the decreasing interval to positive on the increasing one and, therefore, must be equal to zero at the point of minimum itself.
With functions of two or more arguments this geometric logic is not so obvious, but our alternative way of defining a minimum, using a unit vector e originating at the point of minimum and a scalar multiplier t, helps to restore the geometric meaning of a local minimum.
You can imagine that each direction of the unit vector e defines a plane parallel to the Z-axis that passes through point P and vector e; this plane cuts the paraboloid in the picture above along a parabola.
This parabola is the graph of a function of one variable t and, therefore, at the point of minimum P its derivative with respect to t must be equal to zero.
This derivative is called the directional derivative, with vector e being the direction.
Indeed, if the derivative with respect to t of f(OP+t·e) is zero at t=0 for each unit vector e, regardless of its direction, then point P is a good candidate for a local minimum (or maximum).
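For illustration, here is a small Python sketch (the function, point, and direction below are my own assumptions) that computes the directional derivative as the ordinary derivative of t ↦ f(OP+t·e) at t=0 and compares it with the dot product of the gradient and e:

```python
import numpy as np

def f(p):
    # Illustrative function of two variables.
    return p[0]**2 + 3.0 * p[1]**2

def directional_derivative(f, P, e, h=1e-6):
    # Derivative of t -> f(P + t*e) at t = 0 via a central difference.
    return (f(P + h * e) - f(P - h * e)) / (2.0 * h)

P = np.array([1.0, 2.0])                  # point where we differentiate
e = np.array([1.0, 1.0]) / np.sqrt(2.0)   # unit direction e

# Compare with gradient . e, using the known gradient (2x, 6y).
grad = np.array([2.0 * P[0], 6.0 * P[1]])
print(directional_derivative(f, P, e), grad @ e)
```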
This approach allows us to deal with the one-dimensional case many times (once for each direction of unit vector e) instead of dealing once with the more complicated multi-dimensional case.
Of course, the number of possible directions of unit vector e is infinite, but it's not difficult to prove that if the directional derivatives along each and every coordinate axis (the partial derivatives) are zero, the derivative along any other direction will be zero as well.
So, for a function of N variables it's sufficient to check N first partial derivatives of this function, and finding all minimum and maximum points requires solving a system of N partial derivative equations with N unknowns. It might not be simple, but it is doable.
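For example, here is a hedged SymPy sketch (the two-variable function below is purely illustrative) that finds candidate minimum/maximum points by solving the system of partial derivative equations:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 + x*y + y**2 - 3*x      # illustrative function of two variables

# System of partial derivative equations: df/dx = 0, df/dy = 0.
critical_points = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
print(critical_points)           # candidates for local minima/maxima
```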
Let's try to transfer the above definition of a minimum of a function defined on an N-dimensional vector space to a functional defined on a set of functions.
First of all, we will concentrate only on sets of "nice" functions - those defined on some segment [a,b] (including the ends) and differentiable at least up to the second order.
Secondly, to make the analogy with vectors even better, we introduce a scalar product [·] of two of these "nice" functions
[f(x)·g(x)] = ∫[a,b] f(x)·g(x)·dx
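Here is a minimal numeric sketch of this scalar product (the segment [a,b] and the two functions are arbitrary choices for illustration), using standard quadrature from SciPy:

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.0, 1.0                    # segment [a, b]

def scalar_product(f, g):
    # [f . g] = integral over [a, b] of f(x) * g(x) dx
    value, _ = quad(lambda x: f(x) * g(x), a, b)
    return value

print(scalar_product(np.sin, np.cos))   # example: [sin . cos] on [0, 1]
```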
Now our functions behave pretty much like vectors, and we will try to transfer the definition of a minimum from a function defined on an N-dimensional vector space to a functional defined on this set of "nice" functions.
Consider a functional F(f) defined for each function f(x) from the set of "nice" functions described above.
Assume that a function f0(x) is a point where the functional F(f) has a local minimum.
It implies that there is a neighborhood of function f0(x) such that for any function f(x) located within this neighborhood
F(f0(x)) ≤ F(f(x))
The problem with this definition is that we have not defined a concept of "neighborhood" yet.
But that is not difficult provided we have defined a scalar product of two functions.
Recall that the magnitude of a vector can be defined as the square root of its scalar product with itself
|v| = √[v·v]
So, the distance between two points P and Q in N-dimensional space can be expressed as the magnitude of the difference vector |OQ−OP|.
Replacing a function of N arguments with a functional and a vector in N-dimensional space with a function, we can say that a function g(x) lies within a "sufficiently small neighborhood" of a function f(x) if the distance
||g−f|| = √[(g−f)·(g−f)]
is sufficiently small.
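Continuing the vector analogy, this distance between two functions can be computed from the same scalar product; here is a small sketch (the segment and the functions are chosen only for illustration):

```python
import numpy as np
from scipy.integrate import quad

a, b = 0.0, 1.0                    # segment [a, b]

def distance(f, g):
    # ||g - f|| = square root of the scalar product of (g - f) with itself
    value, _ = quad(lambda x: (g(x) - f(x))**2, a, b)
    return np.sqrt(value)

# Example: distance between f(x) = x and g(x) = x^2 on [0, 1]
print(distance(lambda x: x, lambda x: x**2))
```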
As in the case of an N-dimensional vector space, let's consider an alternative definition that will allow us to use differentiation to find a point of local minimum of a functional.
Consider any "nice" function f0(x) in a sense described above where functional F(f) has a local minimum.
Also consider any other "nice" function h(x) that defines a direction we can shift from point f0(x).
The neighborhood of this function f0(x) in the direction h(x) of radius τ consists of all functions
f0(x)+t·h(x)
where 0 ≤ t ≤ τ.
Now we can define the point f0(x) as a local minimum of the functional F if F has a minimum at this point regardless of the choice of direction h(x).
In other words,
Function f0(x) is a local minimum of functional F() if for any direction defined by function h(x) there exists a real positive number τ such that
F(f0(x)) ≤ F(f0(x)+t·h(x))
for all t such that 0 ≤ t ≤ τ.
Function f0(x) is a local maximum of functional F() if for any direction defined by function h(x) there exists a real positive number τ such that
F(f0(x)) ≥ F(f0(x)+t·h(x))
for all t such that 0 ≤ t ≤ τ.
The usefulness of these definitions lies in our ability to differentiate with respect to the parameter t, since this derivative should be zero at points of local minimum or maximum.
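As a preview, here is a hedged numeric sketch with an illustrative functional F(f) = ∫[0,1] f′(x)²·dx (my own choice, not from the lecture), the candidate minimizer f0(x)=x, and the direction h(x)=sin(π·x), which vanishes at both ends of the segment; the derivative of φ(t)=F(f0+t·h) at t=0 comes out numerically zero, as expected at a minimum:

```python
import numpy as np
from scipy.integrate import trapezoid

xs = np.linspace(0.0, 1.0, 2001)      # grid on the segment [0, 1]

def F(f_values):
    # Illustrative functional F(f) = integral over [0,1] of f'(x)^2 dx,
    # computed from sampled values of f on the grid xs.
    df = np.gradient(f_values, xs)
    return trapezoid(df**2, xs)

f0 = xs                               # candidate minimizer f0(x) = x
h = np.sin(np.pi * xs)                # direction h(x), zero at both ends

def phi(t):
    # One-variable function phi(t) = F(f0 + t*h)
    return F(f0 + t * h)

# Central-difference derivative of phi at t = 0; should be close to zero.
eps = 1e-6
print((phi(eps) - phi(-eps)) / (2.0 * eps))
```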
But that is a subject of the next lecture.