In everyday life, we have a good sense of the quantity of various items in terms of a "single" number. For example, we know what it means to buy 5 pounds of flour and it is more than 3 pounds of flour. But, we don't have a very intuitive sense of quantity and comparison in case of some mathematical objects such as vector and matrix.
For example, how big is a vector like the one below?
[ 98, 1, 23, 6 ]
Is it bigger than the following vector?
[ 2, 97, 12, 13 ]
What about the two matrices below?
It is not very intuitive to compare vectors and matrices, because a matrix is a collection of numbers, not just a single number. We need to define a way to interpret the size (or the quantity) of a collection of numbers, whether it is a vector or it is a matrix.
Norms
The size of a matrix or a vector is defined based on a "norm". You can say that the higher is the norm of matrix, the bigger is the size of the matrix (of course, it is not a precise definition, but you get the idea). Unlike the value of a single number, there is no unique way to interpret the size of a vector or a matrix. There are many different types of norms and each one has its own properties and uses.
Some of the famous norms include the Euclidean norm, l0 norm, l1 norm, Frobenius norm, etc. The following table shows the famous vector norms.
There are also a number of matrix norms. The matrix norms l_0 through l_\infty can be defined in a number of different ways, for example by adding up all the entries, or as a result of multiplying by a vector.
We will explain a couple of vector norms and matrix norms later. But first let's use a more precise mathematical notation of norm. The norm of a matrix X is denoted by ||X||_i where i denotes the type of the norm, e.g. if it is norm 0, we will denote it as ||X||_0, and if it is the Euclidean norm, we will show it as ||X||_2. Note that Euclidean norm is more commonly denoted as ||X||, as it is the one of the most widely used norms, therefore the subscript is usually omitted.
Some of the famous norms include the Euclidean norm, l0 norm, l1 norm, Frobenius norm, etc. The following table shows the famous vector norms.
There are also a number of matrix norms. The matrix norms l_0 through l_\infty can be defined in a number of different ways, for example by adding up all the entries, or as a result of multiplying by a vector.
We will explain a couple of vector norms and matrix norms later. But first let's use a more precise mathematical notation of norm. The norm of a matrix X is denoted by ||X||_i where i denotes the type of the norm, e.g. if it is norm 0, we will denote it as ||X||_0, and if it is the Euclidean norm, we will show it as ||X||_2. Note that Euclidean norm is more commonly denoted as ||X||, as it is the one of the most widely used norms, therefore the subscript is usually omitted.
Vector Norms
First, we define the vector norms and then we will explain the matrix norms.Euclidean Norm
The Euclidean norm (i.e. l_2 norm) of a vector X is defined as:
||X|| = \sqrt{\sum_i{x_i^2}}
l_2 norm is used in many applications, because it is continuous and differentiable.
l_2 norm has a very simple geometric interpretation, shown in the following figure. Here, l_2 norm shows the distance of a 2D point (i.e. a vector with only two elements) from the origin (it is the shortest path).
l_2 norm has a very simple geometric interpretation, shown in the following figure. Here, l_2 norm shows the distance of a 2D point (i.e. a vector with only two elements) from the origin (it is the shortest path).
Another useful illustration of the Euclidean norm is its unit circle, i.e. all the set of all vectors whose Euclidean norm is equal to "1". You will realize that the set of all such vectors lie on a circle with a radius 1.
l_n norms in General
In general, l_n norm can be defined as:
||X||_n = \sqrt[n]{\sum_i{x_i^n}}
These norms might look similar, but they have very different properties, therefore those norms are not interchangeable in most cases.
l_0 Norm
If you look carefully at the above formula, then you will realize that there is a problem ... we mentioned l_0 norm. But how do we define l_0 norm with the presence of \sqrt[0]{..} and x_i^0?! Well, as computing the above norm formula for l_0 is quite tricky, in practice we use the following definition of l_0:
||X||_0 = \# (i | x_i \neq 0)
The above definition tells us that l_0 norm of a vector X is the total number of its non-zero elements. You might think that the above norm is just a mathematical definition with perhaps no real world application. In reality l_0 norm shows up in many optimization problems, however as its computation is difficult, we usually use other norms such as l_1.
l_1 Norm
l_1 norm is defined as:||X||_1 = \sum_i{|x_i|}
l_0 norm is used in many applications, and it has a couple of other names such as the "Manhattan distance". It is used in many regularization and optimization problems instead of l_0 which is more difficult to compute (see this post). l_1 also has a simple geometric interpretation, as shown in the following figure. It shows the distance in terms of "city blocks", hence sometimes it is called "Manhattan distance" or "city block distance".
Similar to Euclidean norm, we can also show its unit circle, i.e. all the set of all vectors whose l_1 norm is equal to "1". You will realize that the set of all such vectors lie on a square (and not a circle) with its sides equal to 1.
l_\infty Norm
l_\infty Norm is defined as:
||X||_\infty = \max{(|x_1|, \cdots, |x_n|)}
It is sometimes called as the Chebyshev norm. The geometric interpretation of its unit circle is a square with sides of 1. It has this particular shape, because in order to have a norm of 1, at least one of the entries has to be 1.