Sunday, July 27, 2025

Physics+ MIN/MAX Problem 1: UNIZOR.COM - Physics+ 4 All - Variations

Notes to a video lecture on UNIZOR.COM

Min/Max Variation Problem 1

Problem 1

Among all smooth (sufficiently differentiable) functions f(x) defined on segment [a,b] and taking values at endpoints f(a)=A and f(b)=B find the one with the shortest graph between points (a,A) and (b,B).

Solution

First of all the length of a curve representing a graph of a function is a functional with that function as an argument. Let's determine its explicit formula in our case.

The length ds of an infinitesimal segment of a curve that represents a graph of function y=f(x) is
ds = [(dx)² + (dy)²]½ =
= [(dx)² + (df(x))²]½ =
= [1 + (df(x)/dx)²]½·dx =
= [1 + f '(x)²]½·dx
The length of an entire curve would then be represented by the following functional of function f(x):
Φ[f(x)] = ∫[a,b][1 + f '(x)²]½·dx
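
As a quick numerical illustration (an addition to these notes, not part of the original lecture), the Python sketch below approximates this functional for a couple of sample functions; the helper name curve_length and the sample functions are chosen arbitrarily.

import numpy as np

def curve_length(f, a, b, n=100001):
    # Approximate Phi[f] = integral over [a,b] of sqrt(1 + f'(x)^2) dx
    # by summing the lengths of small chords of the graph
    x = np.linspace(a, b, n)
    y = f(x)
    return np.sum(np.sqrt(np.diff(x)**2 + np.diff(y)**2))

# Length of the graph of f(x) = x^2 on [0,1]: about 1.4789
print(curve_length(lambda x: x**2, 0.0, 1.0))
# Length of the straight line from (0,0) to (1,1): about 1.4142 = sqrt(2)
print(curve_length(lambda x: x, 0.0, 1.0))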

We have to minimize this functional within a family of smooth functions defined on segment [a,b] and satisfying initial conditions
f(a)=A and f(b)=B

As explained in the previous lecture, if functional Φ[f(x)] has local minimum at function-argument f0(x), the variation (directional derivative)
d/dt Φ[f0(x)+t(f1(x)−f0(x))]
at function-argument f0(x) (that is, for t=0) in the direction from f0(x) to f1(x) should be equal to zero regardless of location of f1(x) in the neighborhood of f0(x).

Assume, f0(x) is a function that minimizes the functional Φ[f(x)] above.
Let f1(x) be another function from the family of functions defined on segment [a,b] and satisfying initial conditions
f(a)=A and f(b)=B
Let Δ(x) = f1(x) − f0(x).
It is also defined on segment [a,b] and, according to its definition, satisfies the initial conditions Δ(a)=0 and Δ(b)=0.

Using an assumed point (function-argument) of minimum f0(x) of our functional Φ[f(x)], another point f1(x) that defines the direction of an increment of a function-argument, and a real parameter t, we can describe a subset of points (function-arguments) linearly dependent on f0(x) and f1(x) as
f0(x) + t·(f1(x)−f0(x)) = f0(x) + t·Δ(x)

Let's calculate the variation (we will use symbol δ for it) of functional Φ[f(x)] at any point (function-argument) defined above by the minimizing function-argument f0(x), the direction-defining function-argument f1(x) and real parameter t:
δ[f0,f1,t] Φ[f(x)] =
= d/dt Φ[f0(x)+t(f1(x)−f0(x))] =
= d/dt Φ[f0(x)+t·Δ(x)] =
(use the formula for a length of a curve)
= d/dt ∫[a,b][1+((f0+t·Δ)')²]½·dx
In the above expression we dropped (x) to shorten it.
The derivative indicated by an apostrophe is by argument x of functions f(x) and Δ(x).

Under very broad conditions, when smooth functions are involved, a derivative d/dt and integral by dx are interchangeable.
So, let's take a derivative d/dt from an expression under an integral first, and then we will do the integration by dx.

d/dt [1+((f0(x)+t·Δ(x))')²]½ =
= d/dt [1+(f0'(x)+t·Δ'(x))²]½ =
= 2(f0'(x)+t·Δ'(x))·Δ'(x) / {2·[1+(f0'(x)+t·Δ'(x))²]½} =
= (f0'(x)+t·Δ'(x))·Δ'(x) / [1+(f0'(x)+t·Δ'(x))²]½
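
As a symbolic sanity check of this derivative (added here, not in the original notes), one can reproduce it with sympy; the symbol names f0 and Delta below are just placeholders.

import sympy as sp

x, t = sp.symbols('x t')
f0 = sp.Function('f0')(x)
Delta = sp.Function('Delta')(x)

# Integrand of the length functional along the line f0 + t*Delta
integrand = sp.sqrt(1 + (sp.diff(f0, x) + t*sp.diff(Delta, x))**2)

# d/dt of the integrand; prints an expression equivalent to
# (f0'(x) + t*Delta'(x))*Delta'(x) / sqrt(1 + (f0'(x) + t*Delta'(x))^2)
print(sp.simplify(sp.diff(integrand, t)))
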
Now we can integrate the above expression by x on segment [a,b].

Let's integrate by parts, using the known formula for two functions u(x) and v(x)
∫[a,b]u·dv = u·v|[a,b] − ∫[a,b]v·du

Use it for
u(x) = [f0'(x)+t·Δ'(x)] / [1+(f0'(x)+t·Δ'(x))²]½
v(x) = Δ(x)
and, therefore,
dv(x) = dΔ(x) = Δ'(x)·dx
with all participating functions assumed to be sufficiently smooth (differentiable at least to the second derivative).

Since
v(a) = Δ(a) = 0 and
v(b) = Δ(b) = 0,
the first component of integration is zero
u·v|[a,b]=u(b)·v(b)−u(a)·v(a)=0

Now the variation of our functional is
δ[f0,f1,t] Φ[f(x)] =
= −∫[a,b]v(x)·du(x,t)
where
u(x,t) = [f0'(x)+t·Δ'(x)] / [1+(f0'(x)+t·Δ'(x))²]½
v(x) = Δ(x)

As we know, the necessary condition for a local minimum of functional Φ[f(x)] at function-argument f0(x) is equality to zero of all its directional derivatives at point f0(x) (that is at t=0).
It means that for any direction defined by function f1(x) or, equivalently, defined by any Δ(x)=f1(x)−f0(x), the derivative by t of functional Φ[f0(x)+t·Δ(x)] should be zero at t=0.

So, in our case of minimizing the length of a curve between two points in space, the proper order of steps would be

(1) calculate the integral above getting variation of a functional δ[f0,f1,t]Φ[f0(x)+t·Δ(x)] which is a functional of three variables:
- real parameter t,
- function f0(x) that is an argument to an assumed minimum of functional Φ[f(x)],
- function Δ(x) that signifies an increment of function f0(x) in the direction of function f1(x);
(2) set t=0 obtaining a directional derivative of functional Φ[f(x)] at assumed minimum function-argument f0(x) and increment Δ(x):
δ[f0,f1,t=0]Φ[f0(x)];
(3) equate this expression to zero and find f0(x) that solves this equation regardless of the increment Δ(x), that is regardless of the direction toward f1(x).

Integration in step (1) above is by x, while step (2) sets the value of t.
Since x and t are independent variables, we can exchange the order and, first, set t=0 and then do the integration.

This simplifies the integration to the following
δ[f0,f1,t=0] Φ[f(x)] =
= −∫[a,b]Δ(x)·du(x,0)
where
u(x,0) = [f0'(x)+0·Δ'(x)] / [1+(f0'(x)+0·Δ'(x))²]½ =
= f0'(x) / [1+(f0'(x))²]½
Therefore,
δ[f0,f1,t=0] Φ[f(x)] =
= −∫[a,b]Δ(x)·d{f0'(x) / [1+(f0'(x))²]½}
And the final formula for variation δ[f0,f1,t=0] Φ[f(x)] is
−∫[a,b]Δ(x)·[f0'(x) / [1+f0'(x)²]½]'·dx

For δ[f0,f1,t=0] Φ[f(x)] to be equal to zero for t=0 regardless of Δ(x), or, in other words, for the integral above to be equal to zero at t=0 regardless of function Δ(x), function u'(x,0) (the one in [...]') must be identically equal to zero for all x∈[a,b].

If u'(x,0) is not zero at some point x (and in some neighborhood of this point, since we deal with smooth functions), one can always construct a function Δ(x) that makes the integral above non-zero.

From this it follows that u(x,0)=const.
Therefore,
f0'(x) / [1+f0'(x)²]½ = u(x,0) = const
from which it easily follows that f0'(x)=const and, therefore, the function f0(x), where our functional has a minimum, is a linear function of x.
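
This last step can also be checked symbolically (an addition to the notes): differentiating f0'(x)/[1+f0'(x)²]½ by x gives f0''(x)/[1+f0'(x)²]^(3/2), which is identically zero only when f0''(x)=0, i.e. when f0(x) is linear.

import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')(x)

u = sp.diff(f, x) / sp.sqrt(1 + sp.diff(f, x)**2)
# d/dx of f'(x)/sqrt(1 + f'(x)^2); simplifies to f''(x)/(1 + f'(x)**2)**(3/2),
# so it is identically zero exactly when f''(x) = 0, i.e. when f(x) is linear
print(sp.simplify(sp.diff(u, x)))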

All that remains is to find a linear function f0(x) that satisfies initial conditions f0(a)=A and f0(b)=B.

Obviously, it's the one and only function
f0(x) = (B−A)·(x−a)/(b−a) + A
whose graph in (X,Y) Cartesian coordinates is a straight line from (a,A) to (b,B).
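
To illustrate the result numerically (this sketch is an addition to the notes), one can compare the length of the straight-line solution with the lengths of perturbed curves that satisfy the same boundary conditions; the perturbation shape and the sample values of a, b, A, B are chosen arbitrarily.

import numpy as np

a, b, A, B = 0.0, 2.0, 1.0, 3.0
x = np.linspace(a, b, 200001)
straight = (B - A)*(x - a)/(b - a) + A   # the candidate solution f0(x)

def polyline_length(y):
    # Length of the graph through points (x, y)
    return np.sum(np.sqrt(np.diff(x)**2 + np.diff(y)**2))

for eps in [0.0, 0.1, 0.3]:
    # sin(pi*(x-a)/(b-a)) vanishes at both endpoints,
    # so the boundary conditions f(a)=A, f(b)=B still hold
    y = straight + eps*np.sin(np.pi*(x - a)/(b - a))
    print(eps, polyline_length(y))
# The smallest length (2*sqrt(2), about 2.828) is at eps = 0, the straight line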

Physics+ MIN/MAX & Variation: UNIZOR.COM - Physics+ 4 All - Variations

Notes to a video lecture on UNIZOR.COM

Min/Max and Variation

In this lecture we continue discussing the problem of finding a local extremum (minimum or maximum) of a functional that we introduced in the previous lectures.

To find a local extremum point x0 of a smooth real function of one argument F(x) we usually do the following.
(a) find a derivative F'(x) of function F(x) by x;
(b) if x0 is a local extremum point, the derivative at this point should be equal to zero, which means that x0 must be a solution of the following equation
F'(x) = 0

Let's try to find a local extremum of a functional Φ[f(x)] using the same approach.
The first step presents the first problem: how to take a derivative of a functional Φ[f(x)] by its function-argument f(x)?
The answer is: WE CANNOT.

So, we need another approach, and we would like to explain it using an analogy with finding an extremum of a real function of two arguments F(x,y) defined on a two-dimensional XY plane with Cartesian coordinates, that is finding such a point P0(x0,y0) that the value F(P0)=F(x0,y0) is greater (for maximum) or smaller (for minimum) than all other F(P)=F(x,y), where point P(x,y) is in a small neighborhood of point P0(x0,y0).

As in case of functionals, we cannot differentiate function F(P) by P because, geometrically, there is no such thing as differentiating by point and, algebraically, we cannot simultaneously differentiate by two coordinates.

Yes, we can apply partial differentiation ∂F(x,y)/∂x by x separately from partial differentiation ∂F(x,y)/∂y by y, which will give a point of extremum in one or another direction. But what about other directions?
Fortunately, there is a theorem that, if both partial derivatives are zero at some point, the directional derivative in every other direction is zero there as well (a necessary condition for an extremum in all directions), but this approach is not applicable to functionals, and we will not talk about it at this moment.

An approach we choose to find an extremum of function F(P)=F(x,y) defined on a plane and that will be used to find an extremum of functionals is as follows.

Assume, point P0(x0,y0) is a point of a local minimum of function F(P)=F(x,y) (with local maximum it will be analogous).
Choose any other point P(x,y) in a small neighborhood of P0 and draw a straight line between points P0 and P.
Consider a point Q(q1,q2) moving along this line from P to P0 and beyond.

As point Q moves towards an assumed point of minimum P0 along the line from P to P0, the value of F(Q) should diminish. After crossing P0 the value of F(Q) will start increasing.

What's important is that this behavior of function F(Q) (decreasing going towards P0 and increasing after crossing it) should be the same regardless of a choice of point P from a small neighborhood of P0, because P0 is a local minimum in its neighborhood, no matter from which side it's approached.

The trajectory of point Q is a straight line - a one-dimensional space. So, we can parameterize it with a single variable t like this:
Q(t) = P0 + t·(P−P0)
In coordinate form:
q1(t) = x0 + t·(x−x0)
q2(t) = y0 + t·(y−y0)

At t=1 point Q coincides with point P because
Q(1)=P0+1·(P−P0)=P.
At t=0 point Q coincides with point P0 because
Q(0)=P0+0·(P−P0)=P0.

Now F(Q(t)) can be considered a function of one argument t that is supposed to have a minimum at t=0 when Q(0)=P0.
That means that the derivative d/dt F(Q(t)), as a function of points P0, P and parameter t, must be equal to zero for t=0, that is at point P0 with a chosen direction towards P.

This is great, but what about a different direction defined by a different choice of point P?
If P0 is a true minimum, change of direction should not affect the fact that directional derivative at P0 towards another point P equals to zero.
So, d/dt F(Q(t)) must be equal to zero for t=0 regardless of the position of point P in the small neighborhood of P0.

It's quite appropriate to demonstrate this technique that involves directional derivatives on a simple example.
Consider a function defined on two-dimensional space
f(x,y) = (x−1)² + (y−2)²
Let's find a point P0(x0,y0) where it has a local minimum.
Let's step from point P0(x0,y0) to a neighboring one P(x,y) and parameterize all points on a straight line between P0 and P
Q(t) = P0 + t·(P−P0) =
= (x0+t·(x−x0), y0+t·(y−y0))

The value of our function f() at point Q(t) is
f(Q(t)) = (x0+t·(x−x0)−1)² + (y0+t·(y−y0)−2)²
The directional derivative of this function by t will then be
f 't (Q(t)) = 2(x0+t·(x−x0)−1)·(x−x0) +
+ 2(y0+t·(y−y0)−2)·(y−y0)

If P0(x0,y0) is a point of minimum, this directional derivative from P0 towards P for t=0, that is at point P0(x0,y0), should be equal to zero for any point P(x,y).
At t=0
f 't (Q(0)) = 2(x0−1)·(x−x0) + 2(y0−2)·(y−y0)
If P0(x0,y0) is a point of minimum, the expression above must be equal to zero for any x and y, and the only possible values for x0 and y0 are x0=1 and y0=2.
Therefore, point P0(1,2) is a point of minimum.

The same result can be obtained by equating all partial derivatives to zero, as mentioned above.
∂f(x,y)/∂x = 2(x−1)
∂f(x,y)/∂y = 2(y−2)
System of equations
∂f(x,y)/∂x = 0
∂f(x,y)/∂y = 0
is
2(x−1) = 0
2(y−2) = 0
Its solutions are
x = 1
y = 2

Of course, this was obvious from the expression of function f(x,y)=(x−1)²+(y−2)² as it represents a paraboloid z=x²+y² with its vertex (the minimum) shifted to point (1,2).
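
A quick numerical illustration of this conclusion (an addition to the notes); the helper name and the sample directions below are arbitrary.

def directional_derivative_at_t0(x0, y0, x1, y1):
    # d/dt of f(Q(t)) at t=0, where f(x,y)=(x-1)^2+(y-2)^2 and Q(t)=P0+t*(P1-P0)
    return 2*(x0 - 1)*(x1 - x0) + 2*(y0 - 2)*(y1 - y0)

# At the minimum P0(1,2) the directional derivative is zero for every direction
for (x1, y1) in [(3.0, 0.0), (1.5, 2.5), (-2.0, 7.0)]:
    print(directional_derivative_at_t0(1.0, 2.0, x1, y1))   # prints 0.0 each time

# At any other point, e.g. P0(0,0), some direction gives a non-zero value
print(directional_derivative_at_t0(0.0, 0.0, 1.0, 1.0))     # prints -6.0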

Variation of Functionals

Let's follow the above logic that uses directional derivatives and apply it to finding a local minimum of functionals.

To find a local minimum of a functional Φ[f(x)], we should know certain properties of a function-argument f(x) where this minimum takes place.
In the above case of a function defined on two-dimensional space we used the fact that a directional derivative at a point of minimum in any direction is zero.
We do analogously with functionals.

Assume, functional Φ[f(x)] has a local minimum at function-argument f0(x) and takes value Φ[f0(x)] at this function.

Also, assume that we have defined some metric in the space of all functions f(x) where our functional is defined. This metric, or norm, with symbol ||.||, can be defined in many ways, like
||f(x)|| = max[a,b]{|f(x)|, |f '(x)|}
or other ways mentioned in previous lectures.
This norm is needed to determine a "distance" between two functions:
||f0(x)−f1(x)||
which, in turn, determines what we mean when saying that one function is in the small neighborhood of another.

Shifting an argument from f0(x) to f1(x) causes change of a functional's value from Φ[f0(x)] to Φ[f1(x)], and we know that, since f0(x) is a point of a local minimum, within a small neighborhood around f0(x) the value Φ[f1(x)] cannot be less than Φ[f0(x)].

More rigorously, there exists a positive δ such that for any f1(x) that satisfies
||f1(x)−f0(x)|| ≤ δ
this is true: Φ[f0(x)] ≤ Φ[f1(x)]

Consider a parameterized family of function-arguments g(t,x) (t is a parameter) defined by a formula
g(t,x) = f0(x) + t·[f1(x)−f0(x)]
For t=0 g(t,x)=f0(x).
For t=1 g(t,x)=f1(x).
For t=−1 g(t,x)=2f0(x)−f1(x), which is a function symmetrical to f1(x) relative to f0(x) in the sense that
½[g(−1,x)+f1(x)] = f0(x).

For all real t function g(t,x) is a linear combination of f0(x) and f1(x) and for each pair of these two functions or, equivalently, for each direction from f0(x) towards f1(x) functional
Φ[g(t,x)] =
= Φ[f0(x) + t·(f1(x)−f0(x))]
can be considered a real function of a real argument t.

Let's concentrate on the behavior of the real function Φ[g(t,x)] of real argument t, where g(t,x) is a linear combination of f0(x) and f1(x) parameterized by t, and analyze it using the classical methodology of Calculus.

We can take a derivative of Φ[f0(x) + t·(f1(x)−f0(x))] by t and, since f0(x) is a local minimum, this derivative must be equal to zero at this function-argument f0(x), that is at t=0.

This derivative constitutes a directional derivative or variation of functional Φ[f(x)] at function-argument f0(x) along a direction defined by a location of function-argument f1(x).
Variation of functional Φ[f(x)] is usually denoted as
δ Φ[f(x)].

If we shift the location of function-argument f1(x) in the neighborhood of f0(x), a similar approach would show that this variation (directional derivative by t) would still be zero at t=0 because functional Φ[f(x)] has minimum at f0(x) regardless of the direction of a shift.

Our conclusion is that, if functional Φ[f(x)] has local minimum at function-argument f0(x), the variation (directional derivative) at function-argument f0(x) in the direction from f0(x) to f1(x) should be equal to zero regardless of location of f1(x) in the neighborhood of f0(x).

Examples of handling local minimum of functionals will be presented in the next lecture.

The following represents additional information not covered in the associated video.
It contains the comparison of properties of functionals and real functions defined in N-dimensional space with Cartesian coordinates (functions of N real arguments) to emphasize common techniques to find points of extremum by using the directional derivatives.
Certain important details of what was explained above are repeated here in more detail.
On a first reading these details can be skipped, but it's advisable to eventually go through them.

We assume that our N-dimensional space has Cartesian coordinates and every point P there is defined by its coordinates.
This allows us to do arithmetic operations with points by applying the corresponding operations to their coordinates - addition of points, subtraction and multiplying by real constants are defined through these operations on their corresponding coordinates.

To make a concept of a functional, its minimum and approaches to find this minimum easier to understand, let's draw a parallel between

(a) finding a local minimum of a real function F(P) of one argument P, where P is a point in N-dimensional space with Cartesian coordinates with each such point-argument P mapped by function F(P) to a real number, and

(b) finding a local minimum of a functional Φ[f(x)] of one argument f(x), where f(x) is a real function of a real argument with each such function-argument f(x) mapped by functional Φ[f(x)] to a real number.

Note that function-arguments f(x) of a functional Φ[f(x)] have a lot in common with points in the N-dimensional space that are arguments to functions of N arguments.
Both can be considered as elements of corresponding sets with operations of addition and multiplication by a real number that can be easily defined.
Thus, if two points P and Q in N-dimensional space with Cartesian coordinates are given, the linear combination Q−P represents the vector from P to Q and P+t·(Q−P) represents all the points on a line going through P and Q.

In a similar fashion, we can call function-arguments of a functional points in the space of all functions for which this functional is defined (for example, all functions defined on segment [a,b] and differentiable to a second derivative).
With these functions we can also use arithmetic operations of addition, subtraction and multiplication by a real number.
Also, we can use the geometric word line to characterize a set of functions defined by a linear combination f(x)+t·(g(x)−f(x)), where f(x) and g(x) are two functions and t is any real parameter.

This approach will demonstrate that dealing with functionals, in principle, follows the same logic as dealing with regular real functions.

As a side note, wherever we will use limits, differentials or derivatives in this lecture, we assume that the functions we deal with do allow these operations, and all limits, differentials or derivatives exist. Our purpose is to explain the concept, not to present mathematically flawless description with all the bells and whistles of 100% rigorous presentation.

(a1) Distance between points

Let's talk about a distance between two arguments of a function (distance between two points in N-dimensional space).
The arguments of a real function F(P), as points in N-dimensional space, can be represented in Cartesian coordinates (x1,x2,...,xN) with a known concept of a distance between two arguments. This is used to define a neighborhood of some point-argument P - all points Q within certain distance from P.
Thus, a distance between P(x1,x2,...,xN) and Q(y1,y2,...,yN) that we will denote as ||P−Q|| is defined as
||P−Q|| = [Σi∈[1,N](yi−xi)²]½
The definition of a distance will lead to a concept of neighborhood which is essential to define and to find a local minimum of real functions using a derivative.

(b1) Distance between functions

Let's talk about a distance between two arguments of a functional (a distance between two real functions).
The arguments of a functional Φ[f(x)], as real functions from some class of functions, also should have this concept of a distance to be able to talk about a local minimum in a neighborhood of some particular function-argument.
This distance can be defined in many ways to quantitatively measure the "closeness" of one function to another. This was discussed in the previous lecture and one of the ways to define this distance was suggested there by using a concept of a scalar product of functions as an integral of their algebraic product.
Let's suggest some other ways to define this distance.
First of all, as in case of real numbers, the distance between functions f(x) and g(x) must be based upon their algebraic difference h(x)=f(x)−g(x).
Secondly, we have to quantify this difference, a function h(x), with a single real value.
There are a few traditional ways to assign a real number (called a norm and denoted ||h(x)||) to a function to signify how close this function is to zero.
Here are some for functions defined on segment [a,b]:
||h(x)|| = max[a,b]|h(x)|
||h(x)|| = max[a,b]{|h(x)|,|h'(x)|}
||h(x)|| = ∫[a,b]h²(x)·dx
Let's assume that some norm ||f(x)|| is defined for any function-argument of functional Φ[f(x)].
So, ||h(x)|| is a measure of how close function h(x) is to a function that equals to zero everywhere and ||g(x)−f(x)|| is a measure of how close function g(x) is to function f(x).
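
Here is a small numerical sketch (added to the notes) that evaluates the three norms listed above for a sample function h(x) on [a,b]=[0,1]; the derivative and the integral are approximated on a grid, and the sample h(x) is arbitrary.

import numpy as np

a, b = 0.0, 1.0
x = np.linspace(a, b, 100001)
h = 0.1*np.sin(2*np.pi*x)            # a sample "difference" function h(x)
hp = np.gradient(h, x)               # numerical approximation of h'(x)

norm_sup = np.max(np.abs(h))                                 # max |h(x)| on [a,b]
norm_c1 = max(np.max(np.abs(h)), np.max(np.abs(hp)))         # max of {|h(x)|, |h'(x)|}
norm_int = np.sum(0.5*(h[:-1]**2 + h[1:]**2)*np.diff(x))     # integral of h^2 over [a,b]

print(norm_sup, norm_c1, norm_int)   # about 0.1, 0.628 and 0.005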

(a2) Increment

Now we will introduce a concept of an increment of an argument of a function F(P) (an increment of a point P in N-dimensional space) and the increment of function F(P) itself caused by it.
Let's fix an argument of function F(P): P=P0.
Consider these two points P0(x1,x2,...,xN) and P(y1,y2,...,yN) and their "difference" R(y1−x1,y2−x2,...,yN−xN).
This "difference" R is an increment of an argument of function F() from point P0 to P because in coordinate form P=P0+R.
We will denote it as
ΔP=R=P−P0 - an increment of argument P0.
At the same time, the difference
ΔF(P)=F(P)−F(P0) is an increment of function F() at point P0 when we increment an argument by ΔP to point P=P0+ΔP.

(b2) Increment

Now we will introduce a concept of increment of a function-argument to a functional Φ[f(x)] and an increment of a functional itself.
Let's fix a function-argument of functional Φ[f(x)]: f(x)=f0(x).
If we consider another function-argument f(x), the difference
Δf(x)=f(x)−f0(x) is an increment of function-argument f0(x).
At the same time, the difference
ΔΦ[f(x)]=Φ[f(x)]−Φ[f0(x)] is an increment of functional Φ[f(x)] at f0(x), when we increment its argument from f0(x) by Δf(x) to f(x)=f0(x)+Δf(x).

(a3) Neighborhood

A neighborhood of positive radius δ around point-argument P0 of function F(P) is a set of all arguments P such that a defined above increment ΔP=P−P0 from P0 to P has the norm ||ΔP|| that does not exceed radius δ.

(b3) Neighborhood

A neighborhood of positive radius δ around a function-argument f0(x) of functional Φ[f(x)] is a set of all function-arguments f(x) such that a defined above increment Δf(x)=f(x)−f0(x) from f0(x) to f(x) has the norm ||Δf(x)|| that does not exceed δ.

(a4) Linear Function

Recall that multiplication of a point in N-dimensional space by a real number and addition of points are done on a coordinate basis, that is each coordinate is multiplied and corresponding coordinates are added.
Function F(P) is linear if for any of its point-arguments and any real multiplier k the following is true:
F(k·P) = k·F(P) and
F(P1+P2) = F(P1) + F(P2)

(b4) Linear Functional

Functional Φ[f(x)] is linear if for any function-arguments and any real multiplier k the following is true:
Φ[k·f(x)] = k·Φ[f(x)] and
Φ[f1(x)+f2(x)] =
= Φ[f1(x)] + Φ[f2(x)]

(a5) Continuous Function

Function F(P) is continuous at point P0 if a small increment of an argument from P0 to some neighboring point P causes a small increment of the value of function from F(P0) to F(P).
More precisely, function F(P) is continuous at point P0 if for any positive function increment ε there exists a positive δ such that if
||ΔP|| = ||P−P0|| ≤ δ then
|ΔF(P)| = | F(P)−F(P0) | ≤ ε.

(b5) Continuous Functional

Functional Φ[f(x)] is continuous at point f0(x) if a small increment of a function-argument from f0(x) to some neighboring function-argument f(x) causes a small increment of the value of functional from Φ[f0(x)] to Φ[f(x)].
More precisely, functional Φ[f(x)] is continuous at point f0(x) if for any positive functional increment ε there exists a positive δ such that if
||Δf(x)|| = ||f(x)−f0(x)|| ≤ δ then
|ΔΦ[f(x)]| =
= | Φ[f(x)]−Φ[f0(x)] | ≤ ε.

(a6) Differentiation of Functions

To find a local minimum of a function F(P), we should know certain properties of a point where this minimum takes place. Then, using these properties, we will be able to find a point of a local minimum.

In case of a function of one argument (that is, if dimension N=1) we know that a derivative of a function at a point of local minimum equals to zero. So, we take a derivative, equate it to zero and solve the equation.

With greater dimensions of a space where our function is defined this approach would not work, because we cannot take a derivative by a few arguments at the same time.
However, we can do something clever to overcome this problem.

Assume, function F(P) has a local minimum at point P0 and takes value F(P0) at this point.
Shifting an argument from P0 to P1 causes change of a function value from F(P0) to F(P1), and we know that, since P0 is a point of a local minimum, within a small neighborhood around P0 the value F(P1) cannot be less than F(P0).
More rigorously, there exists a positive δ such that for any P1 that satisfies ||P1−P0|| ≤ δ
this is true: F(P0) ≤ F(P1)

Consider a straight line between P0 and P1 in our N-dimensional space.
Its points Q can be parameterized as
Q(t) = P0 + t·(P1−P0)
For t=0 Q(t)=Q(0)=P0.
For t=1 Q(t)=Q(1)=P1.
For t=−1 Q(−1) is a point symmetrical to P1 relative to point P0.
For all other t point Q(t) lies somewhere on a line that goes through P0 and P1.

If we concentrate on a behavior of function
F(Q(t))=F(P0+t·(P1−P0))
where Q(t) is a point on a line going through P0 and P1, it can be considered as a function of only one variable t and, therefore, can be analyzed using a classical methodology of Calculus.
We can take a derivative of F(P0+t·(P1−P0)) by t and, since P0 is a local minimum, this derivative must be equal to zero at this point P0, that is at t=0.
This derivative constitutes a directional derivative of function F(P) at point P0 along a direction defined by a location of point P1.

What's more interesting is that, if we shift the location of point P1 in the neighborhood of P0, a similar approach would show that this directional derivative by t would still be zero at t=0 because function F(P) has minimum at P0 regardless of the direction of a shift.

Our conclusion is that, if function F(P) has local minimum at point P0, the directional derivative at point P0 in the direction from P0 to P1 should be equal to zero regardless of location of P1 in the neighborhood of P0.

Let's see how it works if, instead of points P0 and P1 in N-dimensional space, we use Cartesian coordinates.

Note: This is a method that can be used for functions of N arguments, but not for functionals, where we will use only directional derivatives.
See item (b6) below.

Let
P0(...x0i...) - i∈[1,N]
P1(...x1i...) - i∈[1,N]
Q(...qi...) - i∈[1,N]
where qi = x0i+t·(x1i−x0i)

Directional derivative of
F(Q(t)) = F(q1,...qN) by t, using the chain rule, equals to
Σi∈[1,N][∂F(q1,...qN)/∂qi]·(dqi /dt)
or
Σi∈[1,N][∂F(q1,...qN)/∂qi]·(x1i−x0i)

If P0(...x0i...) is the point of minimum, the above expression for a directional derivative must be zero for t=0, that is when Q(t)=Q(0)=P0(...x0i...), for any direction defined by point P1(...x1i...).
The only way it can be true is if every ∂F(q1,...qN)/∂qi equals to zero at point P0.

We came to a conclusion that, when a function defined on N-dimensional space has a minimum at some point, all partial derivatives of this function equal to zero at this point.

It's quite appropriate to demonstrate this technique that involves directional derivatives on a simple example.
Consider a function defined on two-dimensional space
f(x,y) = (x−1)² + (y−2)²
Let's find a point P0(x0,y0) where it has a local minimum.
Let's step from this point to a neighboring one P1(x1,y1) and parameterize all points on a straight line between P0 and P1
Q(t) = P0 + t·(P1−P0) =
= (x0+t·(x1−x0), y0+t·(y1−y0))

The value of our function f() at point Q(t) is
F(Q(t)) = (x0+t·(x1−x0)−1)² + (y0+t·(y1−y0)−2)²
The directional derivative of this function by t will then be
f 't (Q(t)) = 2(x0+t·(x1−x0)−1)·(x1−x0) +
+ 2(y0+t·(y1−y0)−2)·(y1−y0)

If P0(x0,y0) is a point of minimum, this directional (from P0 towards P1) derivative at t=0 should be equal to zero for any point P1(x1,y1).
At t=0
f 't (Q(0)) = 2(x0−1)·(x1−x0) + 2(y0−2)·(y1−y0)
If P0(x0,y0) is a point of minimum, the expression above must be equal to zero for any x1 and y1, and the only possible values for x0 and y0 are x0=1 and y0=2.
Therefore, point P0(1,2) is a point of minimum.

The same result can be obtained by equating all partial derivatives to zero, as mentioned above.
∂f(x,y)/∂x = 2(x−1)
∂f(x,y)/∂y = 2(y−2)
System of equations
∂f(x,y)/∂x = 0
∂f(x,y)/∂y = 0
is
2(x−1) = 0
2(y−2) = 0
Its solutions are
x = 1
y = 2

Of course, this was obvious from the expression of function f(x,y)=(x−1)²+(y−2)² as it represents a paraboloid z=x²+y² with its vertex (the minimum) shifted to point (1,2).

(b6) Variation of Functionals

Let's follow the above logic that uses directional derivatives and apply it to finding a local minimum of functionals.

To find a local minimum of a functional Φ[f(x)], we should know certain properties of a function-argument f(x) where this minimum takes place.
In the above case of a function defined on N-dimensional space we used the fact that a directional derivative from a point of minimum in any direction is zero.
We do analogously with functionals.

Assume, functional Φ[f(x)] has a local minimum at function-argument f0(x) and takes value Φ[f0(x)] at this function.

Shifting an argument from f0(x) to f1(x) causes change of a functional's value from Φ[f0(x)] to Φ[f1(x)], and we know that, since f0(x) is a point of a local minimum, within a small neighborhood around f0(x) the value Φ[f1(x)] cannot be less than Φ[f0(x)].

More rigorously, there exists a positive δ such that for any f1(x) that satisfies
||f1(x)−f0(x)|| ≤ δ
this is true: Φ[f0(x)] ≤ Φ[f1(x)]

Consider a parameterized family of function-arguments g(t,x) (t is a parameter) defined by a formula
g(t,x) = f0(x) + t·[f1(x)−f0(x)]
For t=0 g(t,x)=f0(x).
For t=1 g(t,x)=f1(x).
For t=−1 g(t,x)=2f0(x)−f1(x), which is a function symmetrical to f1(x) relative to f0(x) in the sense that
½[g(−1,x)+f1(x)] = f0(x).

For all real t function g(t,x) is a linear combination of f0(x) and f1(x) and for each pair of these two functions or, equivalently, for each direction from f0(x) towards f1(x) functional
Φ[g(t,x)] =
= Φ[f0(x) + t·(f1(x)−f0(x))]
can be considered a real function of a real argument t.

Let's concentrate on the behavior of the real function Φ[g(t,x)] of real argument t, where g(t,x) is a linear combination of f0(x) and f1(x) parameterized by t, and analyze it using the classical methodology of Calculus.

We can take a derivative of Φ[f0(x) + t·(f1(x)−f0(x))] by t and, since f0(x) is a local minimum, this derivative must be equal to zero at this function-argument f0(x), that is at t=0.

This derivative constitutes a directional derivative or variation of functional Φ[f(x)] at function-argument f0(x) along a direction defined by a location of function-argument f1(x).

If we shift the location of function-argument f1(x) in the neighborhood of f0(x), a similar approach would show that this variation (directional derivative by t) would still be zero at t=0 because functional Φ[f(x)] has minimum at f0(x) regardless of the direction of a shift.

Our conclusion is that, if functional Φ[f(x)] has local minimum at function-argument f0(x), the variation (directional derivative) at function-argument f0(x) in the direction from f0(x) to f1(x) should be equal to zero regardless of location of f1(x) in the neighborhood of f0(x).

Examples of handling local minimum of functionals will be presented in the next lecture.

Thursday, July 10, 2025

Physics+ MIN/MAX of Functional: UNIZOR.COM - Physics+ 4 All - Lagrangian

Notes to a video lecture on UNIZOR.COM

Lagrangian - Definition of
Min/Max of Functional


In this lecture we will discuss a concept of a local minimum or maximum of a functional.

Consider an N-dimensional real function f(x1,...,xN) defined on a domain of sets of N real numbers.
We can always assume that these N real numbers are represented by a point in N-dimensional vector space with Cartesian coordinates and point O(0,...0) as the origin.

What is the meaning of a statement that this function has a local minimum at point
P(x1,...,xN)?

In plain language it means that within a sufficiently small neighborhood around point P, no matter where we move from point P, the value of our function at that new point will be greater than or equal to f(P).

Let's formalize this definition in a way that will be used to define a local minimum of a functional.

Firstly, for convenience, we will use vectors originated at the origin of coordinates and ending at some point instead of N-dimensional coordinates of that point.
So, vector OP that stretches from the origin of coordinates O to point P will replace coordinates (x1,...,xN) of that point.
Using this, our function can be viewed as f(OP).

A "sufficiently small neighborhood" of point P(x1,...,xN) (or of vector OP) can be described as all points Q(x1,...,xN) on a sufficiently small distance from point P according to a regular definition of distance in Cartesian coordinates (or as all vectors OQ also originated at the origin of coordinates such that magnitude of a difference vector |OQOP|=|PQ| is sufficiently small).

Since we are dealing with N-dimensional Cartesian space, we know how to determine the distance between two points P and Q or a magnitude of a vector PQ that represents a difference between two vectors OQOP.

We can also approach it differently getting an equivalent definition of a minimum.
Consider any vector e of unit length.
Now, all vectors OP+t·e, where t takes "sufficiently small" real values and e can be any vector of unit length, describe a "sufficiently small neighborhood" of vector OP.

This representation of "sufficiently small neighborhood" might be more convenient since it depends on a single real value t.

Using the above, we can define a local minimum of N-dimensional function f() as follows.
Vector OP is defined as a local minimum of function f() if there exists a real positive τ such that
f(OP) ≤ f(OP+t·e)
for all 0 ≤ t ≤ τ and
for any unit vector e.

Obviously, local maximum can be defined analogously by changing "less than" to "greater than" in the above definition.

For a function of one argument we usually look for a local minimum by solving an equation with a function's derivative equal to zero.
This is a very geometrical approach of looking for minimum because on one side of a minimum our function is decreasing, on another - increasing, so a derivative is changing its sign from negative for decreasing interval to positive for increasing, and, therefore, must be equal to zero at the point of minimum itself.

With functions of two and more arguments this geometric logic is not so obvious, but our alternative way of defining a minimum using unit vector e originated at the point of minimum and scalar multiplier t helps to return to geometrical meaning of a local minimum.

You can imagine that each direction of unit vector e defines a plane parallel to the Z-axis going through point P along unit vector e, cutting the graph of the function (a paraboloid in the lecture's illustration) in a parabola as the intersection.
This parabola is a function of one variable t and, therefore, at point of minimum P must have its derivative by t equal to zero.
This derivative is called directional derivative with vector e being a direction.

Indeed, if a directional derivative by t of f(OP+t·e) is zero at point P for each unit vector e, regardless of its direction, then point P is a good candidate for a local minimum (or maximum).
This approach allows us to deal with one-dimensional case many times (for each direction of unit vector e) instead of once but for more complicated case of multiple dimensions.

Of course, the number of possible directions of unit vector e is infinite, but it's not difficult to prove that, if the directional derivatives along each and every coordinate axis (the partial derivatives) are zero, a derivative along any other direction will be zero as well.

So, for a function of N variables it's sufficient to check N first derivatives of this function, and finding all minimum and maximum points requires solving a system of N partial derivative equations with N unknowns. It might not be simple, but it is doable.

Let's try to transfer the above definition of a minimum of a function defined on N-dimensional vector space to a functional defined on a set of functions.

First of all, we will concentrate only on sets of "nice" functions - those defined on some segment [a,b] (including the ends) and differentiable, at least to a derivative of a second order.

Secondly, to make the analogy with vectors even better, we introduce a scalar product [·] of two of these "nice" functions
[f(x)·g(x)] = ∫[a,b] f(x)·g(x)·dx
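
A minimal numerical sketch of this scalar product (an addition to the notes), using a trapezoid approximation of the integral; the helper name and the sample functions are arbitrary.

import numpy as np

def scalar_product(f, g, a, b, n=100001):
    # [f·g] = integral over [a,b] of f(x)*g(x) dx, trapezoid approximation
    x = np.linspace(a, b, n)
    y = f(x)*g(x)
    return np.sum(0.5*(y[:-1] + y[1:])*np.diff(x))

# Example: the scalar product of x and x^2 on [0,1] equals 1/4
print(scalar_product(lambda x: x, lambda x: x**2, 0.0, 1.0))
# The induced "magnitude" of a function, as used below: sqrt([f·f])
print(np.sqrt(scalar_product(np.sin, np.sin, 0.0, np.pi)))   # sqrt(pi/2), about 1.2533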

Now our functions behave pretty much like vectors and we will try to transfer the definition of a minimum from a function defined on N-dimensional vector space to a functional defined on an infinite set of "nice" functions.

Consider functional F(f) defined for each function f(x) from a set of "nice" functions defined above.
Assume, we define a function f0(x) as a point where the functional F(f) has a local minimum.
It implies that there is a neighborhood of function f0(x) such that for any function f(x) located within this neighborhood
F(f0(x)) ≤ F(f(x))

The problem with this definition is that we have not defined a concept of "neighborhood" yet.
But that is not difficult provided we have defined a scalar product of two functions.

Recall that a magnitude of a vector can be defined as a square root of its scalar product with itself
|v| = √[v·v]
So, the distance between two points P and Q in N-dimensional space, which is the length of vector PQ=OQOP, can be expressed as the magnitude of this vector using its scalar product with itself.

Replacing function with a functional and vector in N-dimensional space with a "nice" function, we can define a neighborhood of a function f as a set of all functions g such that magnitude of a difference between functions
||g−f|| = √[(g−f)·(g−f)]
is sufficiently small.

As in a case of N-dimensional vector space, let's consider an alternative definition that will allow us to use differentiation to find a point of local minimum of a functional.

Consider any "nice" function f0(x) in a sense described above where functional F(f) has a local minimum.
Also consider any other "nice" function h(x) that defines a direction we can shift from point f0(x).
The neighborhood of this function f0(x) in the direction of h(x) of radius τ consists of all functions
f0(x)+t·h(x)
where 0 ≤ t ≤ τ.

Now we can define a point f0(x) as a local minimum of a functional F if F has a minimum at this point regardless of a choice of direction h(x).
In other words,

Function f0(x) is a local minimum of functional F() if for any direction defined by function h(x) there exists a real positive number τ such that
F(f0(x)) ≤ F(f0(x)+t·h(x))
for all 0 ≤ t ≤ τ

Analogously,

Function f0(x) is a local maximum of functional F() if for any direction defined by function h(x) there exists a real positive number τ such that
F(f0(x)) ≥ F(f0(x)+t·h(x))
for all 0 ≤ t ≤ τ

The above definitions simplify a complicated dependency of a functional on an infinite set of argument functions to a much simpler dependency on a single real variable.

The usefulness of these definitions is in our ability to differentiate by parameter t, assuming that derivative should be zero at points of local minimum or maximum.
But that is a subject of the next lecture.

Monday, June 30, 2025

Physics+ Functional, Variation: UNIZOR.COM - Physics+ 4 All - Lagrangian

Notes to a video lecture on UNIZOR.COM

Lagrangian -
Functional and Variation


To introduce concepts of Functional (a noun, not an adjective) and Variation which happen to be very important mathematical tools of Physics, let's consider the following problem.

Imagine yourself on a river bank at point A.
River banks are two parallel straight lines with distance d between them.
You have a motor boat that can go with some constant speed V relative to water.
The river has a uniform current with known speed v which we assume to be less than the speed of a boat V.
You want to cross a river to get to point B exactly opposite to point A, so segment AB is perpendicular to the river's current.

Problem:

How should you navigate your boat from point A to point B to reduce the time to cross the river to a minimum?

It sounds like a typical problem to find a minimum of a function (to minimize time). But this resemblance is only on a surface.

In Calculus we used to find minimum or maximum of a real function of real argument by differentiating it and checking when its first derivative equals to zero.

In our case the problem is much more complex, because we are not dealing with a function (time to cross the river) whose argument is a real number. The argument to our function (time to cross the river) is a trajectory of a boat from point A to point B.
And what is a trajectory of a boat?

A trajectory is a set of positions of a boat which, in its own right, can be a function of some argument (a trajectory can be a function of time, of an angle with segment AB, or of a distance from line AB in the direction of the river's current).
A trajectory is definitely not a single real number.

In our case the trajectory is determined by two velocity vectors:
velocity vector of a boat V and
velocity vector of a river's current v.

The boat's velocity vector, while having a constant magnitude V, can have a variable direction depending on the navigation scenario.
The current's velocity vector has constant direction along a river bank and constant magnitude v.

So, the time to cross the river is not a function in our traditional meaning as a function of real argument, it's "a function of a function", which is called Functional (a noun, not an adjective).
Examples of Functionals as "functions of functions" are
- definite integral of a real function on some interval,
- maximum or minimum of a real function on some interval,
- average value of a real function on some interval,
- length of a curve that represents a graph of a real function on some interval,
etc.

It is impossible to determine minimum or maximum of a Functional by differentiating it by its argument using traditional Calculus, because its argument is not a real number, it's a function (in our case, it's a trajectory as a function of time or some other parameter).
We need new techniques, more advanced Calculus - the Calculus of Variations to accomplish this goal.

We have just introduced two new concepts - a Functional (a noun, not an adjective) as a "function of a function" and Calculus of Variations as a new technique (similar to but more advanced than Calculus) that allows us to find a minimum or maximum of a Functional.
These concepts are very important, and we will devote a few lectures to addressing them from a purely mathematical point of view before starting to use them for problems of Physics.

Before diving into completely new math techniques, let's mention that in some cases, when solving a problem of finding a minimum or maximum of a functional, we can still use the classic approach of Calculus.
This can be done if an argument to a functional (a function in its own right) can be defined by a single parameter. In this case a functional can be viewed as a regular function of that parameter and, as such, can be analyzed by classic Calculus techniques.

Here is an example that is based on a problem above, but with an additional condition about trajectories.
Instead of minimizing the time to cross the river among all possible trajectories, we will consider only a special class of trajectories achieved by a specific scenario of navigation that allows one single real number to define the whole trajectory.

Assume, your navigation strategy is to maintain a constant angle φ between your course and segment AB with positive φ going counterclockwise from segment AB.

Obviously, angle φ should be in the range (−π/2,π/2).

With an angle φ chosen, a boat will reach the opposite side of a river, but not necessarily at point B, in which case the second segment of a boat's trajectory is to go along the opposite bank of a river up or down a stream to get to point B.

The problem now can be stated as follows.
Find the angle φ to minimize traveling time from A to B.
For this problem a functional (time to travel from A to B), which depends on trajectory from A to B, can be considered as a regular function (time to travel from A to B) with real argument (an angle φ).

Solution for Constant Angle φ:

If you maintain this constant angle φ, you can represent the velocity vector of a boat going across a river before it reaches the opposite bank as a sum of two constant vectors
V = V⊥ + V∥
where
V⊥ is a component of the velocity vector directed across the river (perpendicularly to its current) and
V∥ is a component of the velocity vector directed along the river (parallel to its current).
The magnitudes of these vectors are
|V⊥| = V·cos(φ)
|V∥| = V·sin(φ)

The time for a boat to reach the opposite bank across a river is
T(φ) = d / |V⊥| = d / [V·cos(φ)]

In addition to moving in a direction perpendicular to a river's current across a river with always positive speed V⊥=V·cos(φ), a boat will move along a river because of two factors: a river's current v and because of its own component V∥ of velocity.
The resulting speed of a boat in a direction parallel to a river's current is V·sin(φ)−v, which can be positive, zero or negative.

Therefore, when a boat reaches the opposite side of a river, depending on angle φ, it might deviate from point B up or down the current by the distance
h(φ) = [V·sin(φ)−v]·T(φ)
This expression equals to zero if the point of reaching the other bank coincides with point B, our target.
The condition for this is
V·sin(φ)−v=0 or
sin(φ)=v/V or
φ=arcsin(v/V)=φ0.
So if we choose a course with angle φ0=arcsin(v/V), we will hit point B, and no additional movement will be needed.

Positive h(φ) is related to crossing a river upstream from point B when the angle of navigation φ is greater than φ0 and negative h(φ) signifies that we crossed the river downstream of B when the angle of navigation φ is less than φ0.

In both cases, after crossing a river we will have to travel along a river's bank up or down the current to cover this distance h(φ) to get to point B.

Our intuition might tell that an angle φ0 of direct hit of point B at the moment we reach the opposite side of a river, when h(φ)=0, should give the best time because we do not have to cover additional distance from a point we reached the opposite bank to point B.

It's also important that the actual trajectory of a boat in this case will be a single straight segment - segment AB - the shortest distance between the river banks.

In general, the time to reach the other side of a river depends only on component V⊥ of the boat's velocity and it equals to
T(φ) = d / [V·cos(φ)]

If we choose a course with angle φ=φ0 to reach the opposite side exactly at point B, the following equations take place
sin(φ0)=v/V
V²·sin²(φ0) = v²
V²·cos²(φ0) = V²−v²
V·cos(φ0) = √(V²−v²)
T(φ0) = d / √(V²−v²)
This is the total time to get to point B.
If we choose some other angle φ≠φ0, we have to add to the time of crossing a river T(φ) the time to reach point B going up or down a stream along the opposite river bank.

Let's prove now that the course with angle φ=φ0 results in the best travel time from A to B.

The distance from a point where we reach the opposite bank to point B is
h(φ) = [V·sin(φ)−v]·T(φ) =
= [V·sin(φ)−v]·d / [V·cos(φ)]

This distance must be covered by a boat by going down (if h(φ) is positive) or up (if h(φ) is negative) the river's current.
Let's consider these cases separately.

Case h(φ) is positive

This is the case when angle φ is greater than φ0.
Obviously, the timing to reach point B in this case will be worse than if φ=φ0 with h(φ0)=0.
First of all, with an angle φ greater than φ0, the river crossing with speed V⊥(φ)=V·cos(φ) will take longer than with speed V⊥(φ0)=V·cos(φ0) because cos() monotonically decreases for angles from 0 to π/2.
Secondly, in addition to this time, we have to go downstream to reach point B.
So, we should not increase the course angle above φ0.

Case h(φ) ≤ 0 because φ ≤ φ0

This scenario is not so obvious because crossing the river with angle φ smaller than φ0 but greater than 0 takes less time than with angle φ0.
But it adds an extra segment to go upstream after a river is crossed.

The extra distance h(φ) is negative because V·sin(φ) is less than v, which allows a current to carry a boat below point B.
Since the point of crossing the river is below point B, the distance |h(φ)| should be covered by going upstream with speed V−v, which will take time
Th(φ) = |h(φ)| / (V−v)

Using the same expression for h(φ) but reversing its sign to deal with its absolute value, we get the additional time to reach point B after crossing a river
Th(φ) = d·[v−V·sin(φ)] / [V·cos(φ)·(V−v)]
The total travel time from point A to point B in this case is Ttotal(φ)=T(φ)+Th(φ), which after trivial simplification looks like
Ttotal(φ) = d·[1−sin(φ)] / [(V−v)·cos(φ)]
This function is monotonically decreasing by φ because its derivative
Ttotal'(φ) = d·[sin(φ)−1] / [(V−v)·cos²(φ)]
is negative.
Therefore, within this case's range φ ≤ φ0 its minimum is at the largest allowed value of the argument, that is at φ=φ0, where h=0, and the time to get to point B is
TAB = T(φ0) = d / [V·cos(φ0)]

So, the answer to our simplified problem, when we managed to solve it using the classic methodology, is to choose the course of navigation from A to B at angle φ0=arcsin(v/V).
The minimum time of traveling is TAB = d / [V·cos(φ0)] = d / √(V²−v²)
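
To double-check this answer numerically (an addition to the notes), the sketch below computes the total travel time as a function of the course angle φ for sample values of d, V and v and confirms that the minimum is reached at φ0=arcsin(v/V); the downstream correction with speed V+v for the case φ>φ0 is an assumption, since the notes only discuss that case qualitatively.

import numpy as np

d, V, v = 100.0, 5.0, 3.0   # sample river width (m), boat speed and current speed (m/s)

def total_time(phi):
    t_cross = d/(V*np.cos(phi))               # time to reach the opposite bank
    drift = (V*np.sin(phi) - v)*t_cross       # landing point relative to B (positive = upstream)
    if drift >= 0:
        return t_cross + drift/(V + v)        # go downstream with the current (assumed speed V+v)
    return t_cross + (-drift)/(V - v)         # go upstream against the current with speed V-v

phi0 = np.arcsin(v/V)
phis = np.linspace(-1.4, 1.4, 2001)
best = phis[np.argmin([total_time(p) for p in phis])]
print(best, phi0)                                # both about 0.6435 rad
print(total_time(phi0), d/np.sqrt(V**2 - v**2))  # both equal 25.0 s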

As you see, in some cases, when a set of functions that are arguments to a functional can be parameterized by a single real value (like with an angle φ in the above problem), optimization problems can be solved using classic Calculus.

The subject of a few future lectures is the Calculus of Variations, which allows us to solve optimization problems in more complicated cases, when parameterization of the arguments to a functional is not possible.

Friday, June 20, 2025

Physics+ Kepler Third Law: UNIZOR.COM - Physics+ 4 All - Laws of Newton

Notes to a video lecture on UNIZOR.COM

Laws of Newton -
Kepler's Third Law


Kepler's Third Law states that for all objects moving around a fixed source of gravitational field along elliptical orbits the ratio of a square of their period of rotation to a cube of a semi-major axis is the same.

As in the case of the Kepler's First Law, this Third Law has been based on numerous experiments and years of observation.

Based on all the knowledge conveyed in previous lectures on Kepler's Laws, we will derive this Third Law theoretically.

Let's make a simple derivation of Kepler's Third Law in case of a circular orbit.

In this case the velocity vector of an object circulating around a central point is always perpendicular to a position vector from a center to an object.
Since the gravitational force is collinear with a position vector, it is also perpendicular to velocity, which is tangential to a circular orbit. Therefore, the gravitational force has no component along the velocity vector, which makes the magnitude of the velocity vector constant.

Let's introduce the following characteristics of motion:
t - absolute time,
r - radius of a circular orbit of a moving object,
F - vector of gravity,
M - mass of the source of gravitational field,
m - mass of object moving in the gravitational field,
r - position vector from the source of gravitational field to a moving object,
r'=v - velocity vector of a moving object,
r"=v'=a - acceleration vector of a moving object,
T - period of circulation,
ω=2π/T - scalar value of angular velocity,
Here bold letters signify vectors, regular letters signify scalars and magnitudes of corresponding vectors, single and double apostrophes signify first and second derivative by time.

Constant magnitude v of velocity vector means constant angular velocity ω and obvious equality v=r·ω.

Magnitude a of an acceleration vector can be simply found by representing a position vector as a pair of Cartesian coordinates (x,y):
x = r·cos(ωt)
y = r·sin(ωt)
x' = −r·ω·sin(ωt)
y' = r·ω·cos(ωt)
x" = −r·ω²·cos(ωt)=−ω²·x
y" = −r·ω²·sin(ωt)=−ω²·y
and, therefore,
a = r" = −ω²·r
(collinear with r and F)
from which follows
a = |a| = |−ω²·r| = ω²·r

According to Newton's Second Law,
F = m·a
According to the Universal Law of Gravitation,
F = G·M·m/r²
Therefore,
a = ω²·r = G·M/r²
from which follows
ω² = G·M/r³

Since ω=2π/T,
4π²/T² = G·M/r³
T²/r³ = 4π²/(G·M) - constant
End of proof for circular orbit.
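
A small numerical illustration of this result (an addition to the notes), using commonly quoted approximate values of the Sun's gravitational parameter G·M and of the mean orbital radii of Earth and Mars; the numbers are approximate and serve only to show that T²/r³ comes out the same.

import numpy as np

GM_sun = 1.327e20                                # G*M of the Sun, m^3/s^2 (approximate)
radii = {"Earth": 1.496e11, "Mars": 2.279e11}    # mean orbital radii, m (approximate)

for name, r in radii.items():
    T = 2*np.pi*np.sqrt(r**3/GM_sun)             # period from omega^2 = G*M/r^3
    print(name, T/86400/365.25, T**2/r**3)       # period in years and the ratio T^2/r^3

print(4*np.pi**2/GM_sun)                         # the common constant 4*pi^2/(G*M), about 2.97e-19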

Let's prove it in a more complicated general case of any elliptical orbit.
We will use the First and the Second Kepler's Laws as well as the results presented in the previous lecture Planet Orbit Geometry to derive this Third Law.

Recall the Kepler's Second Law (see the lecture Kepler's Second Law in this course).
We have introduced a function A(t) that represents an area of a sector bounded by r(0), r(t) and a trajectory from a planet's position P(0) at time t=0 to its position at any moment of time P(t).
Then the area swept by position vector r(t) during the object's motion from time t1 to t2 equals
ΔA[t1,t2] = A(t2) − A(t1)

Using the above symbols, the Kepler's Second Law can be formulated as
If t2−t1 = t4−t3 then
A(t2)−A(t1) = A(t4)−A(t3)
The above condition is equivalent to a statement that
dA(t)/dt is constant or, equivalently, that A(t) is a linear function of time t with A(0)=0.

In the same lecture we have proven that
dA(t)/dt = ½|L|/m
where L is an angular momentum of a moving object (constant in a central force field) and m is object's mass.

Therefore,

A(t) = t·½|L|/m

Assume, we want to know how much area is swept by position vector r(t) during a period T of a complete round movement of a planet around the Sun.
Obviously, it's
A(T) = T·½|L|/m
At the same time, A(T) is an area of an elliptical orbit of a planet, and we know that the area of an ellipse along which a planet moves equals to
A(T) = π·a·b
where a is semi-major and b is semi-minor axes (see the lecture More on Ellipse in this course).

Therefore,
π·a·b = T·½|L|/m

Let's assume that at time t=0 a planet is at the point furthest from the Sun (aphelion), its initial position vector is r0 and its velocity vector is v0.
At this initial point on an orbit the position vector r0, lying along the major axis of an ellipse, and a tangential to an ellipse vector of velocity v0 are perpendicular to each other.

Therefore, the magnitude of the angular momentum |L| equals to a product of the planet's mass, a magnitude of its position vector (that is, its distance from the Sun) and a magnitude of its velocity vector:
|L| = m·|r0|·|v0| or L/m = r0·v0

Now the above formula for a period T of a planet's rotation around the Sun is
π·a·b = ½T·r0·v0

In the previous lecture Planet Orbit Geometry we have derived the expressions for major and minor axes of an elliptical orbit of a planet in terms of its initial position and velocity at aphelion:
a = r0 / (2−β)
b = r0·√(β·(2−β)) / (2−β)
where β = r0·v0²/(G·M)

Let's substitute these expression into a formula connecting a period T with an area of an ellipse:
π·r0²·√(β·(2−β)) / (2−β)² = ½T·r0·v0

Let's square both sides to get rid of a radical:
π²·r0⁴·β/(2−β)³ = ¼T²·r0²·v0²

Next is just technicality.
Cancel one r0 from both sides
π²·r0³·β/(2−β)³ = ¼T²·r0·v0²

Replace r0³/(2−β)³ with a³ (see formula above)
π²·a³·β = ¼T²·r0·v0²

Replace β with r0·v0²/(G·M) (see formula above)
π²·a³·r0·v0²/(G·M) = ¼T²·r0·v0²

Cancel r0·v0² on both sides
π²·a³/(G·M) = ¼T²

Final result:

T²/a³ = 4π²/(G·M)

The right side is a constant that contains no planet-specific parameters (like initial position and velocity), which means that any planet has the following property (Kepler's Third Law).
The ratio of a square of the period of a planet's rotation around the Sun to a cube of a semi-major axis of its elliptical orbit is a constant that depends only on the mass of the Sun.
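
As a numeric sanity check of this derivation (an addition to the notes), the sketch below takes assumed aphelion values r0 and v0 (roughly Earth's), computes β, the semi-major axis a and semi-minor axis b from the formulas above, obtains the period T from π·a·b = ½T·r0·v0, and compares T²/a³ with 4π²/(G·M).

import numpy as np

GM = 1.327e20        # Sun's gravitational parameter G*M, m^3/s^2 (approximate)
r0 = 1.521e11        # assumed aphelion distance, m (roughly Earth's)
v0 = 2.93e4          # assumed speed at aphelion, m/s (roughly Earth's)

beta = r0*v0**2/GM
a = r0/(2 - beta)                           # semi-major axis
b = r0*np.sqrt(beta*(2 - beta))/(2 - beta)  # semi-minor axis
T = 2*np.pi*a*b/(r0*v0)                     # from pi*a*b = (1/2)*T*r0*v0

print(T**2/a**3)       # about 2.97e-19 s^2/m^3
print(4*np.pi**2/GM)   # the same constant 4*pi^2/(G*M)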