
Singular Value Decomposition Explained: Theory and Code



In summary, we can write the SVD expression for one vector as
\begin{equation}\label{equation-svd-1}
\mathbf{H}\cdot \mathbf{v}_i = \sigma_j\cdot \mathbf{u}_j
\end{equation}
where $i$ and $j$, i.e., the numbers of input and output vectors, can differ from each other. In 2D geometry, a circle on the left of the above figure is transformed into an ellipse. The orientation of the axes is dictated by the vectors $\mathbf{u}_j$ and the lengths of the major and minor axes by $\sigma_j$. These values $\sigma_j$ are known as singular values. The vectors $\mathbf{v}_i$ and $\mathbf{u}_j$ in Eq. (\ref{equation-svd-1}) can then be combined into matrices $\mathbf{V}$ and $\mathbf{U}$, respectively, to produce the final SVD expression.
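As a quick sanity check of this per-vector relation, here is a minimal NumPy sketch; the random complex matrix $\mathbf{H}$ below is just a stand-in for a wireless channel, and its shape and seed are arbitrary.

    import numpy as np

    # Check H v_i = sigma_i u_i column by column for an arbitrary matrix H
    # (a random complex matrix stands in for the wireless channel here).
    rng = np.random.default_rng(0)
    H = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

    # numpy returns U, the singular values, and V^H (the conjugate transpose of V)
    U, s, Vh = np.linalg.svd(H, full_matrices=False)

    for i in range(len(s)):
        v_i = Vh.conj().T[:, i]                    # i-th right singular vector
        assert np.allclose(H @ v_i, s[i] * U[:, i])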




(Figure: singular value decomposition tutorial)



We earlier assumed that the number of Tx antennas $N_T$ is the same as the number of Rx antennas $N_R$ and that each Tx antenna transmits an independent modulation symbol $s_i$. This is why we represented the singular values $\sigma_i$ with the index $i$; otherwise, the number of singular values depends on the minimum of $i$ and $j$. Let us now break free from these assumptions: any number of modulation symbols can be chosen before beamforming at the Tx, and this choice determines the number of virtual pipes utilized. Notice that various linear combinations of the modulation symbols are constructed at each Rx antenna as the signal passes through the wireless channel.


Diversity precoding is a special case of linear precoding in which the only target is to improve reliability and a single modulation symbol is transmitted. The precoder is one of the vectors $\mathbf{v}_i$ we saw before in the SVD of the wireless channel. Specifically, the Tx weights are $\mathbf{v}_i$, where the index $i$ here refers to the vector corresponding to the largest singular value $\sigma_{\max}$. As shown in the figure below, this is like pointing the whole fire hose towards the fattest pipe in the wireless channel. In a corresponding manner, the Rx chooses the vector $\mathbf{u}_j$ as its beamforming weights, where $j$ also corresponds to the index of the largest singular value $\sigma_{\max}$. As shown in the figure below, this is like collecting all the energy from the fattest pipe only and discarding everything coming from elsewhere. In effect, this performs MRC at the Tx and MRC at the Rx simultaneously for a single symbol.
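A small, hedged sketch of this idea in NumPy: with an arbitrary random matrix standing in for the channel, picking the dominant right and left singular vectors as the Tx and Rx weights yields an effective scalar channel whose gain is the largest singular value.

    import numpy as np

    # Diversity precoding sketch: Tx precoder = dominant right singular vector,
    # Rx combiner = dominant left singular vector, effective gain = sigma_max.
    rng = np.random.default_rng(1)
    N_R, N_T = 4, 3
    H = rng.standard_normal((N_R, N_T)) + 1j * rng.standard_normal((N_R, N_T))

    U, s, Vh = np.linalg.svd(H)
    v_max = Vh.conj().T[:, 0]      # Tx beamforming weights (the fattest pipe)
    u_max = U[:, 0]                # Rx combining weights

    effective_gain = u_max.conj().T @ H @ v_max
    assert np.isclose(effective_gain, s[0])   # equals the largest singular value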


This transformer performs linear dimensionality reduction by means of truncated singular value decomposition (SVD). Contrary to PCA, this estimator does not center the data before computing the singular value decomposition. This means it can work with sparse matrices efficiently.
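A minimal usage sketch with scikit-learn's TruncatedSVD; the matrix size, sparsity and number of components below are arbitrary choices for illustration.

    import numpy as np
    from scipy.sparse import random as sparse_random
    from sklearn.decomposition import TruncatedSVD

    # Reduce a sparse 100 x 50 matrix to 5 components without centering it.
    X = sparse_random(100, 50, density=0.01, random_state=42)
    svd = TruncatedSVD(n_components=5, random_state=42)
    X_reduced = svd.fit_transform(X)               # shape (100, 5)

    print(X_reduced.shape)
    print(svd.explained_variance_ratio_.sum())     # variance captured by 5 components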


Singular values are equal to the square roots of the eigenvalues. Since eigenvalues are automatically normalized in the Data Library, they do not directly provide information about the total amount of variance they explain. However, you may calculate the total variance explained by each EOF by squaring the singular values.
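Outside the Data Library, the same idea is easy to illustrate with NumPy: squaring the singular values of a (centered) data matrix and normalizing gives the fraction of total variance attributable to each component. This is a generic sketch, not the Data Library's own procedure.

    import numpy as np

    # Fraction of total variance captured by each component, from squared
    # singular values of a mean-centered data matrix (illustrative data only).
    rng = np.random.default_rng(2)
    X = rng.standard_normal((200, 6))
    X -= X.mean(axis=0)

    s = np.linalg.svd(X, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    print(explained)            # sums to 1.0, sorted in decreasing order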


This is a collaborative tutorial aimed at simplifying a common machine learning method known as singular value decomposition. Learn how these techniques impact computational neuroscience research as well!


Singular value decomposition is a method to factorize an arbitrary \(m \times n\) matrix, \(A\), into two orthonormal matrices \(U\) and \(V\), and a diagonal matrix \(\Sigma\). \(A\) can be written as \(U\Sigma V^T\). The diagonal entries of \(\Sigma\), called singular values, are arranged in decreasing order. The columns of \(U\) and \(V\) are the left and right singular vectors, respectively. Therefore, we can express \(A = U\Sigma V^T\) as a weighted sum of outer products of the corresponding left and right singular vectors, \(A = \sum_i \sigma_i u_i v_i^T\).
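A short NumPy sketch of this factorization and of the equivalent outer-product sum; the matrix below is random and purely illustrative.

    import numpy as np

    # A = U @ diag(s) @ V^T, and equivalently a sum of rank-1 outer products.
    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 3))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
    assert np.allclose(A, A_rebuilt)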


Using SVD, we can approximate \(R\) by \(\sigma_1 u_1 v_1^T\), which is obtained by truncating the sum after the first singular value. This will be a low-rank approximation of \(R\). If \(R\) is fully separable in direction and disparity, only the first singular value will be non-zero, indicating the matrix is of rank 1; \(\sigma_1 u_1 v_1^T\) will then reproduce \(R\) exactly. In general, the closer \(R\) is to being separable, the more dominant the first singular value \(\sigma_1\) will be over the other singular values, and the closer the approximation \(\sigma_1 u_1 v_1^T\) will be to the original matrix \(R\).
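The following sketch builds a synthetic, nearly separable matrix as a stand-in for \(R\) (an outer product of two made-up tuning profiles plus a little noise; none of this comes from real data) and checks how well the rank-1 truncation recovers it.

    import numpy as np

    # Synthetic stand-in for R: an (almost) separable direction x disparity
    # response, i.e. an outer product of two 1-D profiles plus small noise.
    rng = np.random.default_rng(4)
    f = np.exp(-np.linspace(-2, 2, 30) ** 2)       # "direction" profile
    g = np.cos(np.linspace(0, np.pi, 20))          # "disparity" profile
    R = np.outer(f, g) + 0.01 * rng.standard_normal((30, 20))

    U, s, Vt = np.linalg.svd(R)
    R1 = s[0] * np.outer(U[:, 0], Vt[0, :])        # truncate after sigma_1

    print(s[0] / s.sum())                               # dominance of sigma_1
    print(np.linalg.norm(R - R1) / np.linalg.norm(R))   # relative error of rank-1 fit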


You might be wondering what the intuition behind eigen-decomposition is. The eigenvalues represent the amount of variance along each direction, and the eigenvectors represent the linearly independent directions of variation in the data. Sample eigenvectors and eigenvalues are shown below:


Note that \(\Sigma\) here does not refer to the covariance matrix. Here, \(U\) is an \(m \times m\) square orthonormal matrix. The columns of \(U\) form a set of orthonormal (unit-norm) vectors which can be regarded as basis vectors. \(V^T\) is an \(n \times n\) square orthonormal matrix as well. The columns of \(V\) also form a set of orthonormal (unit-norm) vectors which can be regarded as basis vectors. Think of \(U\) and \(V^T\) as matrices which rotate the data. \(\Sigma\) is an \(m \times n\) diagonal rectangular matrix which acts as a scaling matrix. Note that, unlike the eigendecomposition, the SVD is valid for any matrix \(A\), and the singular values on the diagonal of \(\Sigma\) are always non-negative (either zero or positive). You might be wondering why this looks very similar to the eigendecomposition we studied earlier. What is the relation between the two?


The matrix \(U\) (left singular vectors) of \(A\) gives us the eigenvectors of \(AA^T\). Similarly, as you might expect, the matrix \(V\) (right singular vectors) of \(A\) gives us the eigenvectors of \(A^TA\). The non-zero singular values of \(A\) (found on the diagonal entries of \(\Sigma\)) are the square roots of the non-zero eigenvalues of both \(AA^T\) and \(A^TA\).
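This relationship is easy to verify numerically; the sketch below uses an arbitrary random matrix.

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((4, 3))

    U, s, Vt = np.linalg.svd(A)                 # U: 4x4, s: 3 values, Vt: 3x3

    # Squared singular values match the eigenvalues of A^T A.
    assert np.allclose(np.sort(s**2), np.linalg.eigvalsh(A.T @ A))

    # The columns of V are eigenvectors of A^T A: V^T (A^T A) V is diagonal.
    assert np.allclose(Vt @ (A.T @ A) @ Vt.T, np.diag(s**2))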


To understand how this will help in solving \(A\mathbf{x}=0\), we need to understand the concept of the rank of a matrix first. The rank of a matrix \(A\) is defined as the number of linearly independent columns of \(A\); mathematically, it is the dimension of the vector space spanned by the columns of \(A\). The easiest way to find the rank of a matrix is to take the eigen-decomposition (for square matrices) or the SVD (for a matrix of any shape): the number of non-zero eigenvalues or non-zero singular values gives the rank. The rank can be at most the smallest dimension of the matrix, i.e., if \(A \in \mathbb{R}^{m \times n}\) and \(n < m\), then the rank is at most \(n\). The rank is related to the dimension of the nullspace by the rank-nullity theorem as follows: \(\text{rank}(A) + \text{nullity}(A) = n\), where \(n\) is the number of columns of \(A\).
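For example, the rank can be read off from the singular values by counting those above a small tolerance. The sketch below constructs a matrix of known rank and compares against numpy.linalg.matrix_rank; the tolerance used here is only similar in spirit to NumPy's default.

    import numpy as np

    # Rank = number of singular values above a small tolerance.
    rng = np.random.default_rng(6)
    B = rng.standard_normal((5, 2))
    A = B @ rng.standard_normal((2, 4))              # 5x4 matrix of rank 2 by construction

    s = np.linalg.svd(A, compute_uv=False)
    tol = max(A.shape) * np.finfo(float).eps * s[0]  # tolerance similar in spirit to NumPy's
    rank = int(np.sum(s > tol))

    print(rank, np.linalg.matrix_rank(A))            # both report 2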


Let us find the nullspace using SVD. Let the SVD of \(A\) be \(A = U\Sigma V^T\). Ideally, \(\Sigma\) is supposed to be of rank 3 (as we know that we have 3 unknowns), which means that rows 4 to \(N\) of \(\Sigma\) have to be all zeros. But due to noise, the rank will be more than 3. A good solution to the optimization problem is obtained from the rows of \(V^T\) corresponding to the nullspace, i.e., to the (near-)zero rows 4 to \(N\) of \(\Sigma\). The singular values are sorted in descending order, and hence the minimum deviation from the ideal line gives us the best-fit solution. This is the solution corresponding to the smallest singular value.
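A hedged sketch of this recipe: build a matrix whose nullspace (approximately) contains a known vector, then recover that vector as the row of \(V^T\) associated with the smallest singular value. The construction of \(A\) below is synthetic and only for illustration.

    import numpy as np

    # Homogeneous least squares: minimize ||A x|| subject to ||x|| = 1.
    # The minimizer is the right singular vector of the smallest singular value.
    rng = np.random.default_rng(7)
    x_true = np.array([1.0, -2.0, 0.5])
    A = rng.standard_normal((50, 3))
    A -= np.outer(A @ x_true, x_true) / (x_true @ x_true)   # make A @ x_true ~ 0
    A += 1e-3 * rng.standard_normal(A.shape)                 # add a little noise

    U, s, Vt = np.linalg.svd(A)
    x_hat = Vt[-1, :]                                        # last row of V^T
    print(np.abs(x_hat @ x_true) / np.linalg.norm(x_true))   # close to 1 (up to sign)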


Here \(\sigma_{\max}\) and \(\sigma_{\min}\) refer to the maximum and minimum singular values, respectively. Similarly, \(\lambda_{\max}\) and \(\lambda_{\min}\) refer to the maximum and minimum eigenvalues, respectively. If the noiseless version of the problem is \(\hat{\mathbf{x}} = A^\dagger \hat{\mathbf{b}}\) and the noisy version is \(\mathbf{x} = A^\dagger \mathbf{b}\), the relation between the estimates and the condition number is given below:
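For reference, in the square, invertible case a standard form of this bound is the following, with condition number \(\kappa(A) = \sigma_{\max}/\sigma_{\min}\) (which equals \(|\lambda_{\max}|/|\lambda_{\min}|\) when \(A\) is symmetric):

\[
\frac{\lVert \mathbf{x} - \hat{\mathbf{x}} \rVert}{\lVert \hat{\mathbf{x}} \rVert} \;\le\; \kappa(A)\,\frac{\lVert \mathbf{b} - \hat{\mathbf{b}} \rVert}{\lVert \hat{\mathbf{b}} \rVert},
\qquad \kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}.
\]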


The columns of $U$ are eigenvectors of $AA^T$, and the columns of $V$ are eigenvectors of $A^TA$. The $r$ singular values on the diagonal of $\Sigma$ are the square roots of the nonzero eigenvalues of both $AA^T$ and $A^TA$.


The two perspectives are complementary: they are simply different ways of looking at the same thing. The benefit we get from the singular vectors is that we can develop the singular value decomposition (SVD).


Moreover, even though $M$ may be non-square or singular, we know that $M^TM$ is always symmetric, which tells us that it has exactly $m$ real eigenvalues, including multiplicities. The square roots of these eigenvalues indicate how much $M$ stretches space along the various axes of the ellipsoid we get if we transform the unit vectors by it.


We call these values the singular values of $M$. For each singular value $\sigma$, we have used two vectors in its definition: the unit vector $\mathbf{v} \in \mathbb{R}^m$ which we multiplied by $M$, and the vector that resulted from the multiplication. The latter has length $\sigma$, so we can represent it as $\sigma\mathbf{u}$, where $\mathbf{u} \in \mathbb{R}^n$ is also a unit vector. With this, we can make our definition of the singular vectors similar to that of the eigenvectors: if $\sigma$ is a singular value of $M$, then for its two singular vectors $\mathbf{v}$ and $\mathbf{u}$ we have $M\mathbf{v} = \sigma\mathbf{u}$.


We can now show that the singular values and eigenvalues are, in some sense, generalizations of the standard deviation and the variance. To illustrate, assume that our data $X$ is one-dimensional and mean-centered. In this case, we could estimate the variance with the formula $\frac{1}{n}\sum_i x_i^2$.


In short, if we ignore a scaling factor of $\frac{1}{\sqrt{n}}$, the singular values of $X$ are analogous to the standard deviation, and the eigenvalues of the covariance matrix $X^TX$ are analogous to the variance.
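A quick numerical check of this analogy for one-dimensional, mean-centered data; the data below is random and illustrative.

    import numpy as np

    # For 1-D, mean-centered data, the single singular value of X divided by
    # sqrt(n) matches the (population) standard deviation.
    rng = np.random.default_rng(8)
    x = rng.standard_normal(1000) * 2.5
    x -= x.mean()

    X = x.reshape(-1, 1)                       # n x 1 data matrix
    sigma = np.linalg.svd(X, compute_uv=False)[0]

    print(sigma / np.sqrt(len(x)), x.std())    # the two agree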


When we developed eigenvalues and eigenvectors, we saw that they allowed us to decompose symmetric matrices as the product of three simpler matrices: $A = PDP^T$. We can do the same thing with singular values and vectors.
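As a reminder of that earlier decomposition, here is a minimal NumPy sketch of $A = PDP^T$ for a symmetric matrix; the matrix is random and illustrative.

    import numpy as np

    # Symmetric eigendecomposition A = P D P^T, with orthonormal columns in P.
    rng = np.random.default_rng(9)
    B = rng.standard_normal((4, 4))
    A = B + B.T                                # force symmetry

    eigvals, P = np.linalg.eigh(A)             # eigenvectors are the columns of P
    D = np.diag(eigvals)

    assert np.allclose(A, P @ D @ P.T)
    assert np.allclose(P.T @ P, np.eye(4))     # P is orthonormal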


The fact that the eigenvectors are orthogonal comes from the iterative way in which we chose them: each was chosen to be orthogonal to all the previous choices. We did the same for the singular vectors. The assumption that we have $n$ real eigenvalues comes from the spectral theorem.

